Update utils.py
utils.py
CHANGED
@@ -115,7 +115,7 @@ table > tbody td:first-child {
 """
 
 LLM_BENCHMARKS_ABOUT_TEXT = f"""
-# Open Persian LLM Leaderboard (
+# Open Persian LLM Leaderboard (v2.0.0)
 
 > The Open Persian LLM Evaluation Leaderboard, developed by **Part DP AI** in collaboration with **AUT (Amirkabir University of Technology) NLP Lab**, provides a comprehensive benchmarking system specifically designed for Persian LLMs. This leaderboard, based on the open-source [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), offers a unique platform for evaluating the performance of large language models (LLMs) on tasks that demand linguistic proficiency and technical skill in Persian.
 
@@ -127,13 +127,28 @@ LLM_BENCHMARKS_ABOUT_TEXT = f"""
 > The leaderboard allows open participation, meaning that developers and researchers working with open-source models can submit evaluation requests for their models. This accessibility encourages the development and testing of Persian LLMs within the broader AI ecosystem.
 >
 > 2. **Task Diversity**
->
-> - **
-> - **
-> - **
+> Over 20 specialized tasks have been curated for this leaderboard, each tailored to challenge different aspects of a model's capabilities. These tasks include:
+> - **GeneralKnowledge**
+> - **GSM8K**
+> - **DC-Homograph**
+> - **MC-Homograph**
+> - **PiQA**
+> - **Proverb-Quiz**
+> - **VerbEval**
+> - **Winogrande**
+> - **Arc-Challenge**
+> - **Arc-Easy**
+> - **Feqh**
+> - **Hallucination (Truthfulness)**
+> - **P-Hellaswag**
+> - **Law**
+> - **AUT Multiple Choice**
+> - **Parsi Literature**
+> - **BoolQA**
+> - **Reading Comprehension**
+> - **PartExpert**
 > - **MMLU Pro**
-> - **
-> - **AUT Multiple Choice Persian**
+> - **Iranian Social Norms**
 >
 > Each dataset is available in Persian, providing a robust testing ground for models in a non-English setting. The datasets collectively contain over **40k samples** across various categories such as **Common Knowledge**, **Reasoning**, **Summarization**, **Math**, and **Specialized Examinations**, offering comprehensive coverage of diverse linguistic and technical challenges.
 >