Spaces: MCINext/mizan-llm-leaderboard


mehran committed 27 days ago

Commit 145eaf4

1 Parent(s): 0babe14

add links


Files changed (5)

.gradio/certificate.pem +31 -0
__pycache__/about.cpython-310.pyc +0 -0
__pycache__/submission.cpython-310.pyc +0 -0
about.py +19 -7
leaderboard/__pycache__/leaderboard.cpython-310.pyc +0 -0

.gradio/certificate.pem ADDED

@@ -0,0 +1,31 @@

+-----BEGIN CERTIFICATE-----
+MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
+TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
+cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
+WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
+ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
+MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
+h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
+0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
+A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
+T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
+B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
+B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
+KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
+OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
+jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
+qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
+rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
+HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
+hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
+ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
+3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
+NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
+ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
+TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
+jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
+oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
+4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
+mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
+emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
+-----END CERTIFICATE-----

__pycache__/about.cpython-310.pyc CHANGED

Binary files a/__pycache__/about.cpython-310.pyc and b/__pycache__/about.cpython-310.pyc differ

__pycache__/submission.cpython-310.pyc CHANGED

Binary files a/__pycache__/submission.cpython-310.pyc and b/__pycache__/submission.cpython-310.pyc differ

about.py CHANGED

@@ -17,22 +17,28 @@ def render_about():
             with gr.Accordion("1. PerCoR (Persian Commonsense Reasoning)", open=False):
                 gr.Markdown("""
                 PerCoR is the first large-scale Persian benchmark for evaluating models' ability in **commonsense reasoning** through multi-choice sentence completion. It includes over 106,000 samples from diverse domains such as news, religion, and lifestyle, extracted from more than 40 Persian websites. Innovative methods like "segmentation by conjunctions" were used to create coherent and diverse sentences and options, while the DRESS-AF technique helped generate challenging, human-solvable distractors.
-                """)
             with gr.Accordion("2. Persian IFEval (Persian Instruction Following Evaluation)", open=False):
                 gr.Markdown("""
                 This dataset is a Persian-adapted and localized version of **IFEval**, assessing models' proficiency in **accurately executing complex, multi-step instructions (Instruction Following)**. The translation process involved a hybrid machine-human approach, with prompts unsuitable for the Persian language being rewritten or removed.
-                """)
             with gr.Accordion("3. PerMMLU (Persian Massive Multitask Language Understanding)", open=False):
                 gr.Markdown("""
-                MMLU-Fa is an expanded and localized version of the renowned **MMLU** benchmark, designed to measure **general and specialized knowledge** of models in Persian. Tailored to cover knowledge at various levels and relevant to the Iranian cultural context, it comprises three main sub-datasets:
                 <ul>
                     <li><strong>SPK (School Persian Knowledge):</strong> Contains 5,581 multiple-choice questions from the official Iranian school curriculum (grades 4-12) across 78 diverse subjects. Data was collected from the "Paadars" educational website and subsequently cleaned.</li>
                     <li><strong>UPK (University Persian Knowledge):</strong> Includes 7,793 multiple-choice questions from Master's and PhD entrance exams across 25 academic disciplines (e.g., medicine, engineering, humanities, arts). This data was extracted from exam booklets using OCR technology and cleaned by LLMs.</li>
                     <li><strong>GPK (General Persian Knowledge):</strong> Consists of 1,003 multiple-choice questions on 15 topics related to general knowledge specific to Iranian society (e.g., city souvenirs, religious edicts, national laws, famous personalities, cultural idioms). This data was generated using LLMs with specific prompts and reviewed by humans.</li>
                 </ul>
-                """)
             with gr.Accordion("4. Persian MT-Bench (Persian Multi-Turn Benchmark)", open=False):
                 gr.Markdown("""
@@ -41,7 +47,9 @@ def render_about():
                     <li><strong>Native Iranian Knowledge:</strong> Questions about cultural topics such as films, actors, and Iranian figures.</li>
                     <li><strong>Chat-Retrieval:</strong> Involves a multi-turn dialogue where the model must extract a relevant question and answer based on the user's needs.</li>
                 </ul>
-                """)
             with gr.Accordion("5. Persian NLU (Persian Natural Language Understanding)", open=False):
                 gr.Markdown("""
@@ -56,7 +64,9 @@ def render_about():
                     <li><strong>Extractive Question Answering (EQA):</strong> PQuAD</li>
                     <li><strong>Keyword Extraction:</strong> Synthetic Persian Keywords</li>
                 </ul>
-                """)
             with gr.Accordion("6. Persian NLG (Persian Natural Language Generation)", open=False):
                 gr.Markdown("""
@@ -67,7 +77,9 @@ def render_about():
                     <li><strong>Question Generation:</strong> PersianQA</li>
                 </ul>
                 The goal is to assess the generative capabilities of models.
-                """)
             # with gr.Accordion("7. BoolQ-fa (Persian Boolean Question Answering)", open=False):
             #     gr.Markdown("""

             with gr.Accordion("1. PerCoR (Persian Commonsense Reasoning)", open=False):
                 gr.Markdown("""
                 PerCoR is the first large-scale Persian benchmark for evaluating models' ability in **commonsense reasoning** through multi-choice sentence completion. It includes over 106,000 samples from diverse domains such as news, religion, and lifestyle, extracted from more than 40 Persian websites. Innovative methods like "segmentation by conjunctions" were used to create coherent and diverse sentences and options, while the DRESS-AF technique helped generate challenging, human-solvable distractors.
+                [link](https://huggingface.co/datasets/MCINext/percor)
+                            """)
             with gr.Accordion("2. Persian IFEval (Persian Instruction Following Evaluation)", open=False):
                 gr.Markdown("""
                 This dataset is a Persian-adapted and localized version of **IFEval**, assessing models' proficiency in **accurately executing complex, multi-step instructions (Instruction Following)**. The translation process involved a hybrid machine-human approach, with prompts unsuitable for the Persian language being rewritten or removed.
+                [link](https://huggingface.co/datasets/MCINext/persian-ifeval)
+                            """)
             with gr.Accordion("3. PerMMLU (Persian Massive Multitask Language Understanding)", open=False):
                 gr.Markdown("""
+                PerMMLU is an expanded and localized version of the renowned **MMLU** benchmark, designed to measure **general and specialized knowledge** of models in Persian. Tailored to cover knowledge at various levels and relevant to the Iranian cultural context, it comprises three main sub-datasets:
                 <ul>
                     <li><strong>SPK (School Persian Knowledge):</strong> Contains 5,581 multiple-choice questions from the official Iranian school curriculum (grades 4-12) across 78 diverse subjects. Data was collected from the "Paadars" educational website and subsequently cleaned.</li>
                     <li><strong>UPK (University Persian Knowledge):</strong> Includes 7,793 multiple-choice questions from Master's and PhD entrance exams across 25 academic disciplines (e.g., medicine, engineering, humanities, arts). This data was extracted from exam booklets using OCR technology and cleaned by LLMs.</li>
                     <li><strong>GPK (General Persian Knowledge):</strong> Consists of 1,003 multiple-choice questions on 15 topics related to general knowledge specific to Iranian society (e.g., city souvenirs, religious edicts, national laws, famous personalities, cultural idioms). This data was generated using LLMs with specific prompts and reviewed by humans.</li>
                 </ul>
+                [link](https://huggingface.co/datasets/MCINext/permmlu)
+                            """)
             with gr.Accordion("4. Persian MT-Bench (Persian Multi-Turn Benchmark)", open=False):
                 gr.Markdown("""
                     <li><strong>Native Iranian Knowledge:</strong> Questions about cultural topics such as films, actors, and Iranian figures.</li>
                     <li><strong>Chat-Retrieval:</strong> Involves a multi-turn dialogue where the model must extract a relevant question and answer based on the user's needs.</li>
                 </ul>
+                [link](https://huggingface.co/datasets/MCINext/persian-mt-bench)
+                            """)
             with gr.Accordion("5. Persian NLU (Persian Natural Language Understanding)", open=False):
                 gr.Markdown("""
                     <li><strong>Extractive Question Answering (EQA):</strong> PQuAD</li>
                     <li><strong>Keyword Extraction:</strong> Synthetic Persian Keywords</li>
                 </ul>
+                [link](https://huggingface.co/datasets/MCINext/persian-nlu)
+                            """)
             with gr.Accordion("6. Persian NLG (Persian Natural Language Generation)", open=False):
                 gr.Markdown("""
                     <li><strong>Question Generation:</strong> PersianQA</li>
                 </ul>
                 The goal is to assess the generative capabilities of models.
+                [link](https://huggingface.co/datasets/MCINext/persian-nlg)
+                            """)
             # with gr.Accordion("7. BoolQ-fa (Persian Boolean Question Answering)", open=False):
             #     gr.Markdown("""
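
Reviewer sketch (not part of this commit): the six accordions each gain the same kind of `[link](...)` line, so the link text could be generated from one table instead of being repeated by hand. The following is a minimal sketch under that assumption. The names `DATASET_BASE_URL`, `BENCHMARK_SLUGS`, `dataset_link`, and `with_link` are hypothetical and do not exist in `about.py`; only the dataset slugs are taken from the URLs added above.

```python
# Hypothetical refactor sketch; none of these names come from about.py.
DATASET_BASE_URL = "https://huggingface.co/datasets/MCINext"

# Dataset slugs exactly as they appear in the links added by this commit.
BENCHMARK_SLUGS = {
    "PerCoR": "percor",
    "Persian IFEval": "persian-ifeval",
    "PerMMLU": "permmlu",
    "Persian MT-Bench": "persian-mt-bench",
    "Persian NLU": "persian-nlu",
    "Persian NLG": "persian-nlg",
}


def dataset_link(name: str) -> str:
    """Return a Markdown link to the named benchmark's dataset page."""
    slug = BENCHMARK_SLUGS[name]  # raises KeyError for an unknown benchmark
    return f"[{name} dataset]({DATASET_BASE_URL}/{slug})"


def with_link(description: str, name: str) -> str:
    """Append the dataset link, as its own paragraph, to a Markdown body."""
    return f"{description.rstrip()}\n\n{dataset_link(name)}"


if __name__ == "__main__":
    # Print one link line per benchmark.
    for benchmark in BENCHMARK_SLUGS:
        print(dataset_link(benchmark))
```

With something like this, each accordion body could be passed through `with_link(...)` before being handed to `gr.Markdown`, which would also give each link a descriptive label rather than the generic text `link`.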

leaderboard/__pycache__/leaderboard.cpython-310.pyc CHANGED

Binary files a/leaderboard/__pycache__/leaderboard.cpython-310.pyc and b/leaderboard/__pycache__/leaderboard.cpython-310.pyc differ