Spaces: MCINext/mizan-llm-leaderboard


mehran committed on May 28

Commit 91ff46d

1 Parent(s): 5d73401

update about


Files changed (1)

about.py +2 -2


@@ -19,12 +19,12 @@ def render_about():
                 PersCoR is the first large-scale Persian benchmark for evaluating models' ability in **commonsense reasoning** through multi-choice sentence completion. It includes over 106,000 samples from diverse domains such as news, religion, and lifestyle, extracted from more than 40 Persian websites. Innovative methods like "segmentation by conjunctions" were used to create coherent and diverse sentences and options, while the DRESS-AF technique helped generate challenging, human-solvable distractors.
                 """)
-            with gr.Accordion("2. IFEval-fa (Persian Instruction Following Evaluation)", open=False):
+            with gr.Accordion("2. Persian IFEval (Persian Instruction Following Evaluation)", open=False):
                 gr.Markdown("""
                 This dataset is a Persian-adapted and localized version of **IFEval**, assessing models' proficiency in **accurately executing complex, multi-step instructions (Instruction Following)**. The translation process involved a hybrid machine-human approach, with prompts unsuitable for the Persian language being rewritten or removed.
                 """)
-            with gr.Accordion("3. MMLU-Fa (Persian Massive Multitask Language Understanding)", open=False):
+            with gr.Accordion("3. PerMMLU (Persian Massive Multitask Language Understanding)", open=False):
                 gr.Markdown("""
                 MMLU-Fa is an expanded and localized version of the renowned **MMLU** benchmark, designed to measure **general and specialized knowledge** of models in Persian. Tailored to cover knowledge at various levels and relevant to the Iranian cultural context, it comprises three main sub-datasets:
                 <ul>