mohalisad committed on
Commit
d0c6855
·
verified ·
1 Parent(s): 9638246

Update utils.py

Files changed (1): utils.py +22 -7
utils.py CHANGED
@@ -115,7 +115,7 @@ table > tbody td:first-child {
 """
 
 LLM_BENCHMARKS_ABOUT_TEXT = f"""
-# Open Persian LLM Leaderboard (v1.0.0)
+# Open Persian LLM Leaderboard (v2.0.0)
 
 > The Open Persian LLM Evaluation Leaderboard, developed by **Part DP AI** in collaboration with **AUT (Amirkabir University of Technology) NLP Lab**, provides a comprehensive benchmarking system specifically designed for Persian LLMs. This leaderboard, based on the open-source [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), offers a unique platform for evaluating the performance of large language models (LLMs) on tasks that demand linguistic proficiency and technical skill in Persian.
 
@@ -127,13 +127,28 @@ LLM_BENCHMARKS_ABOUT_TEXT = f"""
 > The leaderboard allows open participation, meaning that developers and researchers working with open-source models can submit evaluation requests for their models. This accessibility encourages the development and testing of Persian LLMs within the broader AI ecosystem.
 >
 > 2. **Task Diversity**
-> Six specialized tasks have been curated for this leaderboard, each tailored to challenge different aspects of a model’s capabilities. These tasks include:
-> - **Part Multiple Choice**
-> - **ARC Easy**
-> - **ARC Challenge**
+> Over 20 specialized tasks have been curated for this leaderboard, each tailored to challenge different aspects of a model’s capabilities. These tasks include:
+> - **GeneralKnowledge**
+> - **GSM8K**
+> - **DC-Homograph**
+> - **MC-Homograph**
+> - **PiQA**
+> - **Proverb-Quiz**
+> - **VerbEval**
+> - **Winogrande**
+> - **Arc-Challenge**
+> - **Arc-Easy**
+> - **Feqh**
+> - **Hallucination (Truthfulness)**
+> - **P-Hellaswag**
+> - **Law**
+> - **AUT Multiple Choice**
+> - **Parsi Literature**
+> - **BoolQA**
+> - **Reading Comprehension**
+> - **PartExpert**
 > - **MMLU Pro**
-> - **GSM Persian**
-> - **AUT Multiple Choice Persian**
+> - **Iranian Social Norms**
 >
 > Each dataset is available in Persian, providing a robust testing ground for models in a non-English setting. The datasets collectively contain over **40k samples** across various categories such as **Common Knowledge**, **Reasoning**, **Summarization**, **Math**, and **Specialized Examinations**, offering comprehensive coverage of diverse linguistic and technical challenges.
 >