Commit dad4aec (verified), committed by ZiyiXia · 1 Parent(s): 2cfcdca

Update src/about.py

Files changed (1)
  1. src/about.py +1 -9
src/about.py CHANGED
@@ -39,17 +39,9 @@ LLM_BENCHMARKS_TEXT = f"""
 
  - **Composed Screenshot Retrieval (CSR)** is made up of sq2s triplets. Given a screenshot *s1* and a query *q* conditioned on *s1*, the retrieval model needs to retrieve the relevant screenshot *s2* from the corpus *S*. We define four tasks for this category, including product discovery, news-to-Wiki, knowledge relation, and Wiki-to-product. All tasks in this category are created by human annotators. For each task, annotators are instructed to identify relevant screenshot pairs and write queries to retrieve *s2* based on *s1*.
 
- **Screenshot Question Answering (SQA)** comprises sq2a triplets. Given a screenshot s and a question q conditioned on s, the retrieval model needs to retrieve the correct answer a from a candidate corpus A. Each evaluation sample is created in three steps:
- - 1) sample a screenshot *s*.
- - 2) prompt the MLLM to generate a question *q*.
- - 3) prompt the MLLM to generate the answer *a* for *q* based on *s*.
- The following tasks are included in this category: product-QA, news-QA, Wiki-QA, paper-QA, repo-QA.
+ - **Screenshot Question Answering (SQA)** comprises sq2a triplets. Given a screenshot s and a question q conditioned on s, the retrieval model needs to retrieve the correct answer a from a candidate corpus A. Each evaluation sample is created in three steps: 1) sample a screenshot *s*. 2) prompt the MLLM to generate a question *q*. 3) prompt the MLLM to generate the answer *a* for *q* based on *s*. The following tasks are included in this category: product-QA, news-QA, Wiki-QA, paper-QA, repo-QA.
 
  - **Open-Vocab Classification (OVC)** is performed using evaluation samples of screenshots and their textual class labels. Given a screenshot s and the label class *C*, the retrieval model needs to discriminate the correct label c from *C* based on the embedding similarity. We include the following tasks in this category: product classification, news-topic classification, academic-field classification, knowledge classification. For each task, we employ human labelers to create the label class and assign each screenshot with its correct label.
- **Screenshot Retrieval (SR)** consists of evaluation samples, each comprising a textual query q and its relevant screenshot *s: (q, s)*. The retrieval model needs to precisely retrieve the relevant screenshot for a testing query from a given corpus *S*. Each evaluation sample is created in two steps:
- - 1) sample a screenshot *s*.
- - 2) prompt the LLM to generate a search query based on the caption of screenshot
- We consider seven tasks under this category, including product retrieval, paper retrieval, repo retrieval, news retrieval, chart retrieval, document retrieval, and slide retrieval.
  """
 
  EVALUATION_QUEUE_TEXT = """
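
Reviewer note: the OVC description in the diff says the model picks the correct label c from *C* "based on the embedding similarity". Below is a minimal sketch of that idea, assuming cosine similarity over toy vectors. The function names (`cosine`, `classify_by_similarity`) and the example embeddings are hypothetical illustrations; they are not part of src/about.py or of the benchmark's evaluation code, and the benchmark may use a different similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_by_similarity(screenshot_emb, label_embs):
    """Return the label in label_embs whose embedding is most similar to the screenshot."""
    return max(label_embs, key=lambda label: cosine(screenshot_emb, label_embs[label]))


# Hypothetical toy embeddings; a real model would produce these vectors.
label_embs = {
    "news": [1.0, 0.0, 0.0],
    "product": [0.0, 1.0, 0.0],
    "paper": [0.0, 0.0, 1.0],
}
screenshot_emb = [0.1, 0.9, 0.2]

predicted = classify_by_similarity(screenshot_emb, label_embs)
print(predicted)
```

The same nearest-neighbour pattern would apply to the retrieval-style tasks if the label set is swapped for a corpus of candidate screenshots or answers, again under the assumption that candidates are ranked by embedding similarity.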