Update src/about.py
src/about.py +1 -9
src/about.py
CHANGED
@@ -39,17 +39,9 @@ LLM_BENCHMARKS_TEXT = f"""
 
 - **Composed Screenshot Retrieval (CSR)** is made up of sq2s triplets. Given a screenshot *s1* and a query *q* conditioned on *s1*, the retrieval model needs to retrieve the relevant screenshot *s2* from the corpus *S*. We define four tasks for this category, including product discovery, news-to-Wiki, knowledge relation, and Wiki-to-product. All tasks in this category are created by human annotators. For each task, annotators are instructed to identify relevant screenshot pairs and write queries to retrieve *s2* based on *s1*.
 
-**Screenshot Question Answering (SQA)** comprises sq2a triplets. Given a screenshot s and a question q conditioned on s, the retrieval model needs to retrieve the correct answer a from a candidate corpus A. Each evaluation sample is created in three steps:
-- 1) sample a screenshot *s*.
-- 2) prompt the MLLM to generate a question *q*.
-- 3) prompt the MLLM to generate the answer *a* for *q* based on *s*.
-The following tasks are included in this category: product-QA, news-QA, Wiki-QA, paper-QA, repo-QA.
+- **Screenshot Question Answering (SQA)** comprises sq2a triplets. Given a screenshot s and a question q conditioned on s, the retrieval model needs to retrieve the correct answer a from a candidate corpus A. Each evaluation sample is created in three steps: 1) sample a screenshot *s*. 2) prompt the MLLM to generate a question *q*. 3) prompt the MLLM to generate the answer *a* for *q* based on *s*. The following tasks are included in this category: product-QA, news-QA, Wiki-QA, paper-QA, repo-QA.
 
 - **Open-Vocab Classification (OVC)** is performed using evaluation samples of screenshots and their textual class labels. Given a screenshot s and the label class *C*, the retrieval model needs to discriminate the correct label c from *C* based on the embedding similarity. We include the following tasks in this category: product classification, news-topic classification, academic-field classification, knowledge classification. For each task, we employ human labelers to create the label class and assign each screenshot with its correct label.
-**Screenshot Retrieval (SR)** consists of evaluation samples, each comprising a textual query q and its relevant screenshot *s: (q, s)*. The retrieval model needs to precisely retrieve the relevant screenshot for a testing query from a given corpus *S*. Each evaluation sample is created in two steps:
-- 1) sample a screenshot *s*.
-- 2) prompt the LLM to generate a search query based on the caption of screenshot
-We consider seven tasks under this category, including product retrieval, paper retrieval, repo retrieval, news retrieval, chart retrieval, document retrieval, and slide retrieval.
 """
 
 EVALUATION_QUEUE_TEXT = """
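The OVC text in this hunk says the model picks the label *c* from *C* "based on the embedding similarity", and the SQA and CSR tasks likewise retrieve from a candidate corpus. A minimal sketch of that selection step follows, for readers of the description. It assumes cosine similarity as the similarity measure, which the diff does not state, and the function names (`cosine_similarity`, `rank_candidates`, `classify`) and toy vectors are hypothetical, not taken from the repository.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_emb: Sequence[float],
                    candidate_embs: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_emb, c) for c in candidate_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def classify(screenshot_emb: Sequence[float],
             label_embs: Sequence[Sequence[float]],
             labels: Sequence[str]) -> str:
    """OVC-style prediction: the label whose embedding is closest to the screenshot's."""
    return labels[rank_candidates(screenshot_emb, label_embs)[0]]


# Toy embeddings (assumed values; a real run would use the model's encoder outputs).
labels = ["product", "news", "paper"]
label_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
screenshot_emb = [0.9, 0.1, 0.0]

ranking = rank_candidates(screenshot_emb, label_embs)
predicted = classify(screenshot_emb, label_embs, labels)
```

The same `rank_candidates` helper would cover the SQA and CSR cases under this assumption, with answer or screenshot embeddings in place of label embeddings.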