Spaces:

ZiyiXia
/

leaderboard_test

Sleeping

App Files Files Community

ZiyiXia commited on 25 days ago

Commit

dad4aec

verified ·

1 Parent(s): 2cfcdca

Update src/about.py

Browse files

Files changed (1) hide show

src/about.py +1 -9

src/about.py CHANGED Viewed

@@ -39,17 +39,9 @@ LLM_BENCHMARKS_TEXT = f"""
 - **Composed Screenshot Retrieval (CSR)** is made up of sq2s triplets. Given a screenshot *s1* and a query *q* conditioned on *s1*, the retrieval model needs to retrieve the relevant screenshot *s2* from the corpus *S*. We define four tasks for this category, including product discovery, news-to-Wiki, knowledge relation, and Wiki-to-product. All tasks in this category are created by human annotators. For each task, annotators are instructed to identify relevant screenshot pairs and write queries to retrieve *s2* based on *s1*.
-**Screenshot Question Answering (SQA)** comprises sq2a triplets. Given a screenshot s and a question q conditioned on s, the retrieval model needs to retrieve the correct answer a from a candidate corpus A. Each evaluation sample is created in three steps:
-- 1) sample a screenshot *s*.
-- 2) prompt the MLLM to generate a question *q*.
-- 3) prompt the MLLM to generate the answer *a* for *q* based on *s*.
-The following tasks are included in this category: product-QA, news-QA, Wiki-QA, paper-QA, repo-QA.
 - **Open-Vocab Classification (OVC)** is performed using evaluation samples of screenshots and their textual class labels. Given a screenshot s and the label class *C*, the retrieval model needs to discriminate the correct label c from *C* based on the embedding similarity. We include the following tasks in this category: product classification, news-topic classification, academic-field classification, knowledge classification. For each task, we employ human labelers to create the label class and assign each screenshot with its correct label.
-**Screenshot Retrieval (SR)** consists of evaluation samples, each comprising a textual query q and its relevant screenshot *s: (q, s)*. The retrieval model needs to precisely retrieve the relevant screenshot for a testing query from a given corpus *S*. Each evaluation sample is created in two steps:
-- 1) sample a screenshot *s*.
-- 2) prompt the LLM to generate a search query based on the caption of screenshot
-We consider seven tasks under this category, including product retrieval, paper retrieval, repo retrieval, news retrieval, chart retrieval, document retrieval, and slide retrieval.
 """
 EVALUATION_QUEUE_TEXT = """

 - **Composed Screenshot Retrieval (CSR)** is made up of sq2s triplets. Given a screenshot *s1* and a query *q* conditioned on *s1*, the retrieval model needs to retrieve the relevant screenshot *s2* from the corpus *S*. We define four tasks for this category, including product discovery, news-to-Wiki, knowledge relation, and Wiki-to-product. All tasks in this category are created by human annotators. For each task, annotators are instructed to identify relevant screenshot pairs and write queries to retrieve *s2* based on *s1*.
+- **Screenshot Question Answering (SQA)** comprises sq2a triplets. Given a screenshot s and a question q conditioned on s, the retrieval model needs to retrieve the correct answer a from a candidate corpus A. Each evaluation sample is created in three steps: 1) sample a screenshot *s*. 2) prompt the MLLM to generate a question *q*. 3) prompt the MLLM to generate the answer *a* for *q* based on *s*. The following tasks are included in this category: product-QA, news-QA, Wiki-QA, paper-QA, repo-QA.
 - **Open-Vocab Classification (OVC)** is performed using evaluation samples of screenshots and their textual class labels. Given a screenshot s and the label class *C*, the retrieval model needs to discriminate the correct label c from *C* based on the embedding similarity. We include the following tasks in this category: product classification, news-topic classification, academic-field classification, knowledge classification. For each task, we employ human labelers to create the label class and assign each screenshot with its correct label.
 """
 EVALUATION_QUEUE_TEXT = """