update
- results.csv +10 -10
- src/about.py +12 -3
results.csv
CHANGED
@@ -1,13 +1,13 @@
 Rank,Model,#Params (B),Overall,SR,CSR,SQA,OVC
 1,UniSE-MLLM,2.21,55.72,69.63,54.49,43.2,48.26
-2,GME,2.21,48.14,61.62,37.68,37.78,47.98
-3,DSE,4.15,45.21,61.54,37.78,39.24,31.51
-4,ColPali,2.92,43.64,61.73,35.0,35.32,31.04
+2,"<a href=""https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct"">GME</a>",2.21,48.14,61.62,37.68,37.78,47.98
+3,"<a href=""https://huggingface.co/Tevatron/dse-phi3-v1.0"">DSE</a>",4.15,45.21,61.54,37.78,39.24,31.51
+4,"<a href=""https://huggingface.co/vidore/colpali"">ColPali</a>",2.92,43.64,61.73,35.0,35.32,31.04
 5,UniSE-CLIP,0.428,36.41,35.95,43.38,28.13,40.62
-6,MM-Embed,7.57,34.48,25.86,40.93,42.83,32.67
-7,SigLIP,0.878,33.34,38.33,34.48,19.6,40.64
-8,VLM2Vec,4.15,32.19,15.93,48.05,49.42,23.24
-9,E5-V,8.35,25.13,34.11,26.59,5.23,32.85
-10,CLIP,0.428,23.75,18.89,25.39,23.9,30.4
-11,Uni-IR,0.428,19.63,12.35,35.92,29.68,20.06
-12,VISTA,0.196,13.85,5.21,11.29,25.78,16.61
+6,"<a href=""https://huggingface.co/nvidia/MM-Embed"">MM-Embed</a>",7.57,34.48,25.86,40.93,42.83,32.67
+7,"<a href=""https://huggingface.co/google/siglip-so400m-patch14-384"">SigLIP</a>",0.878,33.34,38.33,34.48,19.6,40.64
+8,"<a href=""https://huggingface.co/TIGER-Lab/VLM2Vec-Full"">VLM2Vec</a>",4.15,32.19,15.93,48.05,49.42,23.24
+9,"<a href=""https://huggingface.co/royokong/e5-v"">E5-V</a>",8.35,25.13,34.11,26.59,5.23,32.85
+10,"<a href=""https://huggingface.co/openai/clip-vit-large-patch14"">CLIP</a>",0.428,23.75,18.89,25.39,23.9,30.4
+11,"<a href=""https://huggingface.co/TIGER-Lab/UniIR"">Uni-IR</a>",0.428,19.63,12.35,35.92,29.68,20.06
+12,"<a href=""https://huggingface.co/OpenDriveLab/Vista"">VISTA</a>",0.196,13.85,5.21,11.29,25.78,16.61
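The new Model cells embed HTML anchors inside a quoted CSV field, with inner quotes doubled (`""`) per standard CSV escaping. A minimal stdlib-only sketch (illustrative, not the Space's own code) showing that such a row parses cleanly and the link target can be recovered:

```python
import csv
import io
import re

# One of the new rows from the diff above; inner quotes are doubled ("")
# per CSV quoting rules, so the raw HTML anchor survives parsing.
ROW = ('2,"<a href=""https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct""'
       '>GME</a>",2.21,48.14,61.62,37.68,37.78,47.98')

def parse_model_cell(row: str):
    """Return (model name, link target) from the quoted Model column."""
    fields = next(csv.reader(io.StringIO(row)))
    cell = fields[1]  # '<a href="...">GME</a>' after csv unescaping
    m = re.fullmatch(r'<a href="([^"]+)">([^<]+)</a>', cell)
    return (m.group(2), m.group(1)) if m else (cell, None)

name, url = parse_model_cell(ROW)  # name == "GME"
```

This is why the leaderboard can render clickable model names from a plain CSV: the HTML survives the CSV round trip intact.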
src/about.py
CHANGED
@@ -29,7 +29,7 @@ INTRODUCTION_TEXT = """
 
 More details can be found:
 - Paper: https://arxiv.org/pdf/2502.11431
--
+- Repo: https://github.com/VectorSpaceLab/Vis-IR
 """
 
 # Which evaluations are you running? how can people reproduce what you have?
@@ -39,9 +39,17 @@ LLM_BENCHMARKS_TEXT = f"""
 
 - **Composed Screenshot Retrieval (CSR)** is made up of sq2s triplets. Given a screenshot *s1* and a query *q* conditioned on *s1*, the retrieval model needs to retrieve the relevant screenshot *s2* from the corpus *S*. We define four tasks for this category, including product discovery, news-to-Wiki, knowledge relation, and Wiki-to-product. All tasks in this category are created by human annotators. For each task, annotators are instructed to identify relevant screenshot pairs and write queries to retrieve *s2* based on *s1*.
 
-
+**Screenshot Question Answering (SQA)** comprises sq2a triplets. Given a screenshot s and a question q conditioned on s, the retrieval model needs to retrieve the correct answer a from a candidate corpus A. Each evaluation sample is created in three steps:
+- 1) sample a screenshot *s*.
+- 2) prompt the MLLM to generate a question *q*.
+- 3) prompt the MLLM to generate the answer *a* for *q* based on *s*.
+The following tasks are included in this category: product-QA, news-QA, Wiki-QA, paper-QA, repo-QA.
 
 - **Open-Vocab Classification (OVC)** is performed using evaluation samples of screenshots and their textual class labels. Given a screenshot s and the label class *C*, the retrieval model needs to discriminate the correct label c from *C* based on the embedding similarity. We include the following tasks in this category: product classification, news-topic classification, academic-field classification, knowledge classification. For each task, we employ human labelers to create the label class and assign each screenshot with its correct label.
+**Screenshot Retrieval (SR)** consists of evaluation samples, each comprising a textual query q and its relevant screenshot *s: (q, s)*. The retrieval model needs to precisely retrieve the relevant screenshot for a testing query from a given corpus *S*. Each evaluation sample is created in two steps:
+- 1) sample a screenshot *s*.
+- 2) prompt the LLM to generate a search query based on the caption of screenshot
+We consider seven tasks under this category, including product retrieval, paper retrieval, repo retrieval, news retrieval, chart retrieval, document retrieval, and slide retrieval.
 """
 
 EVALUATION_QUEUE_TEXT = """
@@ -79,7 +87,8 @@ SUBMIT_FORM = """
 ```json
 {
 "Model": "<Model Name>",
-"
+"URL (optional)": "<Model/Repo/Paper URL>"
+"#params": "7.11B",
 "Overall": 30.00,
 "SR": 30.00,
 "CSR": 30.00,
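The SR, CSR, and OVC tasks described in the `about.py` changes all reduce to scoring candidates by embedding similarity and taking the best match. A minimal NumPy sketch of that retrieval step (illustrative only, assuming precomputed embeddings; this is not the leaderboard's actual evaluation code):

```python
import numpy as np

def cosine_retrieve(query: np.ndarray, corpus: np.ndarray) -> int:
    """Index of the corpus row with the highest cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return int(np.argmax(c @ q))

# Toy data: the "relevant" item is a lightly perturbed copy of corpus row 3,
# standing in for a query embedding near its target screenshot's embedding.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(5, 8))               # 5 candidate embeddings
query = corpus[3] + 0.01 * rng.normal(size=8)  # near-duplicate of row 3
best = cosine_retrieve(query, corpus)          # recovers index 3
```

For OVC, `corpus` would hold the label-text embeddings of class set *C*; for SR/CSR, the screenshot embeddings of corpus *S*. The per-task scores in `results.csv` summarize how often this nearest-neighbor step finds the annotated target.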