tabedini committed · verified
Commit d6708bf · 1 parent: 66c91d7

Update utils.py

Files changed (1): utils.py (+5 −29)
utils.py CHANGED
@@ -12,45 +12,35 @@ custom_css = """
12
  body, .gradio-container, .gr-button, .gr-input, .gr-slider, .gr-dropdown, .gr-markdown {
13
  font-family: 'Vazirmatn', sans-serif !important;
14
  }
15
-
16
  .markdown-text {
17
  font-size: 16px !important;
18
  }
19
-
20
  #models-to-add-text {
21
  font-size: 18px !important;
22
  }
23
-
24
  #citation-button span {
25
  font-size: 16px !important;
26
  }
27
-
28
  #citation-button textarea {
29
  font-size: 16px !important;
30
  }
31
-
32
  #citation-button > label > button {
33
  margin: 6px;
34
  transform: scale(1.3);
35
  }
36
-
37
  #leaderboard-table {
38
  margin-top: 15px
39
  }
40
-
41
  #leaderboard-table-lite {
42
  margin-top: 15px
43
  }
44
-
45
  #search-bar-table-box > div:first-child {
46
  background: none;
47
  border: none;
48
  }
49
-
50
  #search-bar {
51
  padding: 0px;
52
  }
53
-
54
  /* Limit the width of the first AutoEvalColumn so that names don't expand too much */
55
  #leaderboard-table td:nth-child(2),
56
  #leaderboard-table th:nth-child(2) {
@@ -58,11 +48,9 @@ body, .gradio-container, .gr-button, .gr-input, .gr-slider, .gr-dropdown, .gr-ma
58
  overflow: auto;
59
  white-space: nowrap;
60
  }
61
-
62
  .tab-buttons button {
63
  font-size: 20px;
64
  }
65
-
66
  #scale-logo {
67
  border-style: none !important;
68
  box-shadow: none;
@@ -71,7 +59,6 @@ body, .gradio-container, .gr-button, .gr-input, .gr-slider, .gr-dropdown, .gr-ma
71
  margin-right: auto;
72
  max-width: 600px;
73
  }
74
-
75
  #scale-logo .download {
76
  display: none;
77
  }
@@ -111,13 +98,9 @@ body, .gradio-container, .gr-button, .gr-input, .gr-slider, .gr-dropdown, .gr-ma
111
 
112
  LLM_BENCHMARKS_ABOUT_TEXT = f"""
113
  # Persian LLM Leaderboard (v1.0.0)
114
-
115
  > The Persian LLM Evaluation Leaderboard, developed by **Part DP AI** in collaboration with **AUT (Amirkabir University of Technology) NLP Lab**, provides a comprehensive benchmarking system specifically designed for Persian LLMs. This leaderboard, based on the open-source [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), offers a unique platform for evaluating the performance of large language models (LLMs) on tasks that demand linguistic proficiency and technical skill in Persian.
116
-
117
  > **Note:** This leaderboard is continuously updating its data and models, reflecting the latest developments in Persian LLMs. It is currently in version 1.0.0, serving as the initial benchmark for Persian LLM evaluation, with plans for future enhancements.
118
-
119
  ## 1. Key Features
120
-
121
  > 1. **Open Evaluation Access**
122
  > The leaderboard allows open participation, meaning that developers and researchers working with open-source models can submit evaluation requests for their models. This accessibility encourages the development and testing of Persian LLMs within the broader AI ecosystem.
123
  >
@@ -127,8 +110,8 @@ LLM_BENCHMARKS_ABOUT_TEXT = f"""
127
  > - **ARC Easy**
128
  > - **ARC Challenge**
129
  > - **MMLU Pro**
130
- > - **GSM8k Persian**
131
- > - **Multiple Choice Persian**
132
  >
133
  > Each dataset is available in Persian, providing a robust testing ground for models in a non-English setting. The datasets collectively contain over **40k samples** across various categories such as **Common Knowledge**, **Reasoning**, **Summarization**, **Math**, and **Specialized Examinations**, offering comprehensive coverage of diverse linguistic and technical challenges.
134
  >
@@ -136,28 +119,22 @@ LLM_BENCHMARKS_ABOUT_TEXT = f"""
136
  > A sample of the evaluation dataset is hosted on [Hugging Face Datasets](https://huggingface.co/datasets/PartAI/llm-leaderboard-datasets-sample), offering the AI community a glimpse of the benchmark content and format. This sample allows developers to pre-assess their models against representative data before a full leaderboard evaluation.
137
  >
138
  > 4. **Collaborative Development**
139
- > This leaderboard represents a significant collaboration between Part AI and Professor Saeedeh Momtazi of Amirkabir University of Technology, leveraging industrial expertise and academic research to create a high-quality, open benchmarking tool. The partnership underscores a shared commitment to advancing Persian LLMs.
140
  >
141
  > 5. **Comprehensive Evaluation Pipeline**
142
  > By integrating a standardized evaluation pipeline, models are assessed across a variety of data types, including text, mathematical formulas, and numerical data. This multi-faceted approach enhances the evaluation’s reliability and allows for precise, nuanced assessment of model performance across multiple dimensions.
143
-
144
  ## 2. Background and Goals
145
-
146
  > Recent months have seen a notable increase in the development of Persian LLMs by research centers and AI companies in Iran. However, the lack of reliable, standardized benchmarks for Persian LLMs has made it challenging to evaluate model quality comprehensively. Global benchmarks typically do not support Persian, resulting in skewed or unreliable results for Persian LLMs.
147
  >
148
  > This leaderboard addresses this gap by providing a locally-focused, transparent system that enables consistent, fair comparisons of Persian LLMs. It is expected to be a valuable tool for Persian-speaking businesses and developers, allowing them to select models best suited to their needs. Researchers and model developers also benefit from the competitive environment, with opportunities to showcase and improve their models based on benchmark rankings.
149
-
150
  ## 3. Data Privacy and Integrity
151
-
152
  > To maintain evaluation integrity and prevent overfitting or data leakage, only part of the benchmark dataset is openly available. This limited access approach upholds model evaluation reliability, ensuring that results are genuinely representative of each model’s capabilities across unseen data.
153
  >
154
  > The leaderboard represents a significant milestone in Persian LLMs and is positioned to become the leading standard for LLM evaluation in the Persian-speaking world.
155
-
156
  """
157
 
158
 
159
  LLM_BENCHMARKS_SUBMIT_TEXT = """## Submitting a Model for Evaluation
160
-
161
  > To submit your open-source model for evaluation, follow these steps:
162
  >
163
  > 1. **Ensure your model is on Hugging Face**: Your model must be publicly available on [Hugging Face](https://huggingface.co/).
@@ -225,8 +202,8 @@ def apply_markdown_format_for_columns(df, model_column_name):
225
  return df
226
 
227
 
228
- def submit(model_name, model_id, contact_email, license):
229
- if model_name == "" or model_id == "" or license == "" or contact_email == "":
230
  gr.Info("Please fill all the fields")
231
  return
232
 
@@ -241,7 +218,6 @@ def submit(model_name, model_id, contact_email, license):
241
  "model_name": model_name,
242
  "model_id": model_id,
243
  "contact_email": contact_email,
244
- "license": license
245
  }
246
 
247
  # Get the current timestamp to add to the filename
 
12
  body, .gradio-container, .gr-button, .gr-input, .gr-slider, .gr-dropdown, .gr-markdown {
13
  font-family: 'Vazirmatn', sans-serif !important;
14
  }
 
15
  .markdown-text {
16
  font-size: 16px !important;
17
  }
 
18
  #models-to-add-text {
19
  font-size: 18px !important;
20
  }
 
21
  #citation-button span {
22
  font-size: 16px !important;
23
  }
 
24
  #citation-button textarea {
25
  font-size: 16px !important;
26
  }
 
27
  #citation-button > label > button {
28
  margin: 6px;
29
  transform: scale(1.3);
30
  }
 
31
  #leaderboard-table {
32
  margin-top: 15px
33
  }
 
34
  #leaderboard-table-lite {
35
  margin-top: 15px
36
  }
 
37
  #search-bar-table-box > div:first-child {
38
  background: none;
39
  border: none;
40
  }
 
41
  #search-bar {
42
  padding: 0px;
43
  }
 
44
  /* Limit the width of the first AutoEvalColumn so that names don't expand too much */
45
  #leaderboard-table td:nth-child(2),
46
  #leaderboard-table th:nth-child(2) {
 
48
  overflow: auto;
49
  white-space: nowrap;
50
  }
 
51
  .tab-buttons button {
52
  font-size: 20px;
53
  }
 
54
  #scale-logo {
55
  border-style: none !important;
56
  box-shadow: none;
 
59
  margin-right: auto;
60
  max-width: 600px;
61
  }
 
62
  #scale-logo .download {
63
  display: none;
64
  }
 
98
 
99
  LLM_BENCHMARKS_ABOUT_TEXT = f"""
100
  # Persian LLM Leaderboard (v1.0.0)
 
101
  > The Persian LLM Evaluation Leaderboard, developed by **Part DP AI** in collaboration with **AUT (Amirkabir University of Technology) NLP Lab**, provides a comprehensive benchmarking system specifically designed for Persian LLMs. This leaderboard, based on the open-source [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), offers a unique platform for evaluating the performance of large language models (LLMs) on tasks that demand linguistic proficiency and technical skill in Persian.
 
102
  > **Note:** This leaderboard is continuously updating its data and models, reflecting the latest developments in Persian LLMs. It is currently in version 1.0.0, serving as the initial benchmark for Persian LLM evaluation, with plans for future enhancements.
 
103
  ## 1. Key Features
 
104
  > 1. **Open Evaluation Access**
105
  > The leaderboard allows open participation, meaning that developers and researchers working with open-source models can submit evaluation requests for their models. This accessibility encourages the development and testing of Persian LLMs within the broader AI ecosystem.
106
  >
 
110
  > - **ARC Easy**
111
  > - **ARC Challenge**
112
  > - **MMLU Pro**
113
+ > - **GSM Persian**
114
+ > - **AUT Multiple Choice Persian**
115
  >
116
  > Each dataset is available in Persian, providing a robust testing ground for models in a non-English setting. The datasets collectively contain over **40k samples** across various categories such as **Common Knowledge**, **Reasoning**, **Summarization**, **Math**, and **Specialized Examinations**, offering comprehensive coverage of diverse linguistic and technical challenges.
117
  >
 
119
  > A sample of the evaluation dataset is hosted on [Hugging Face Datasets](https://huggingface.co/datasets/PartAI/llm-leaderboard-datasets-sample), offering the AI community a glimpse of the benchmark content and format. This sample allows developers to pre-assess their models against representative data before a full leaderboard evaluation.
120
  >
121
  > 4. **Collaborative Development**
122
+ > This leaderboard represents a significant collaboration between Part AI and Professor Saeedeh Momtazi of Amirkabir University of Technology (with key contributions from [Shahriar Shariati](https://huggingface.co/shahriarshm), [Farhan Farsi](https://huggingface.co/FarhanFarsi) and [Shayan Bali](https://huggingface.co/shayanbali)), leveraging industrial expertise and academic research to create a high-quality, open benchmarking tool. The partnership underscores a shared commitment to advancing Persian LLMs.
123
  >
124
  > 5. **Comprehensive Evaluation Pipeline**
125
  > By integrating a standardized evaluation pipeline, models are assessed across a variety of data types, including text, mathematical formulas, and numerical data. This multi-faceted approach enhances the evaluation’s reliability and allows for precise, nuanced assessment of model performance across multiple dimensions.
 
126
  ## 2. Background and Goals
 
127
  > Recent months have seen a notable increase in the development of Persian LLMs by research centers and AI companies in Iran. However, the lack of reliable, standardized benchmarks for Persian LLMs has made it challenging to evaluate model quality comprehensively. Global benchmarks typically do not support Persian, resulting in skewed or unreliable results for Persian LLMs.
128
  >
129
  > This leaderboard addresses this gap by providing a locally-focused, transparent system that enables consistent, fair comparisons of Persian LLMs. It is expected to be a valuable tool for Persian-speaking businesses and developers, allowing them to select models best suited to their needs. Researchers and model developers also benefit from the competitive environment, with opportunities to showcase and improve their models based on benchmark rankings.
 
130
  ## 3. Data Privacy and Integrity
 
131
  > To maintain evaluation integrity and prevent overfitting or data leakage, only part of the benchmark dataset is openly available. This limited access approach upholds model evaluation reliability, ensuring that results are genuinely representative of each model’s capabilities across unseen data.
132
  >
133
  > The leaderboard represents a significant milestone in Persian LLMs and is positioned to become the leading standard for LLM evaluation in the Persian-speaking world.
 
134
  """
135
 
136
 
137
  LLM_BENCHMARKS_SUBMIT_TEXT = """## Submitting a Model for Evaluation
 
138
  > To submit your open-source model for evaluation, follow these steps:
139
  >
140
  > 1. **Ensure your model is on Hugging Face**: Your model must be publicly available on [Hugging Face](https://huggingface.co/).
 
202
  return df
203
 
204
 
205
+ def submit(model_name, model_id, contact_email):
206
+ if model_name == "" or model_id == "" or contact_email == "":
207
  gr.Info("Please fill all the fields")
208
  return
209
 
 
218
  "model_name": model_name,
219
  "model_id": model_id,
220
  "contact_email": contact_email,
 
221
  }
222
 
223
  # Get the current timestamp to add to the filename
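After this commit, `submit` validates three required fields and builds the request payload without a `license` key. A minimal sketch of that flow, assuming the diff's validation and payload keys; the helper names and the timestamp filename format are illustrative, since the diff only says the current timestamp is added to the filename:

```python
import json
import time


def build_request(model_name, model_id, contact_email):
    # Mirrors the updated submit(): all three remaining fields are required
    # (the `license` field was dropped in this commit).
    if model_name == "" or model_id == "" or contact_email == "":
        return None  # the real submit() instead shows gr.Info("Please fill all the fields")
    return {
        "model_name": model_name,
        "model_id": model_id,
        "contact_email": contact_email,
    }


def request_filename(model_id, ts=None):
    # "Get the current timestamp to add to the filename" -- the exact
    # format and the slash replacement are assumptions, not from the diff.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(ts))
    return f"{model_id.replace('/', '__')}_{stamp}.json"


req = build_request("My Model", "org/my-model", "dev@example.com")
print(json.dumps(req, indent=2) if req else "missing fields")
```

Keeping validation and serialization separate like this also makes the request logic testable without a running Gradio app.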