Commit 91ff46d · committed by mehran
1 Parent(s): 5d73401
update about
about.py
CHANGED
@@ -19,12 +19,12 @@ def render_about():
         PersCoR is the first large-scale Persian benchmark for evaluating models' ability in **commonsense reasoning** through multi-choice sentence completion. It includes over 106,000 samples from diverse domains such as news, religion, and lifestyle, extracted from more than 40 Persian websites. Innovative methods like "segmentation by conjunctions" were used to create coherent and diverse sentences and options, while the DRESS-AF technique helped generate challenging, human-solvable distractors.
         """)
 
-    with gr.Accordion("2. IFEval
+    with gr.Accordion("2. Persian IFEval (Persian Instruction Following Evaluation)", open=False):
         gr.Markdown("""
         This dataset is a Persian-adapted and localized version of **IFEval**, assessing models' proficiency in **accurately executing complex, multi-step instructions (Instruction Following)**. The translation process involved a hybrid machine-human approach, with prompts unsuitable for the Persian language being rewritten or removed.
         """)
 
-    with gr.Accordion("3.
+    with gr.Accordion("3. PerMMLU (Persian Massive Multitask Language Understanding)", open=False):
         gr.Markdown("""
         MMLU-Fa is an expanded and localized version of the renowned **MMLU** benchmark, designed to measure **general and specialized knowledge** of models in Persian. Tailored to cover knowledge at various levels and relevant to the Iranian cultural context, it comprises three main sub-datasets:
         <ul>