from dataclasses import dataclass
from enum import Enum


@dataclass
class Task:
    benchmark: str
    metric: str
    col_name: str


class Tasks(Enum):
    task0 = Task("anli_r1", "acc", "ANLI")
    task1 = Task("logiqa", "acc_norm", "LogiQA")
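
# Illustrative usage (an assumption, not part of the original file): the
# leaderboard code is expected to read these Task fields when building its
# results table, e.g.:
#   cols = [task.value.col_name for task in Tasks]  # -> ["ANLI", "LogiQA"]
#   anli_metric = Tasks.task0.value.metric          # -> "acc"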


NUM_FEWSHOT = 0


TITLE = """<h1 align="center" id="space-title">Eval-Anything Leaderboard</h1>"""


INTRODUCTION_TEXT = """
Eval-Anything is a framework designed specifically for evaluating all-modality models; it is part of the [Align-Anything](https://github.com/PKU-Alignment/align-anything) framework. It consists of two main tasks: All-Modality Understanding (AMU) and All-Modality Generation (AMG). AMU assesses a model's ability to simultaneously process and integrate information from all modalities, including text, images, audio, and video. AMG, in turn, evaluates a model's capability to autonomously select output modalities based on user instructions and to use different modalities synergistically when generating output. Eval-Anything aims to comprehensively assess the ability of all-modality models to handle heterogeneous data from multiple sources, providing a reliable evaluation tool for this field.

**Note:** Since most current open-source models lack support for all-modality output, (β) indicates that the model is used as an agent to invoke [AudioLDM2-Large](https://huggingface.co/cvssp/audioldm2-large) and [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) for audio and image generation.
"""


LLM_BENCHMARKS_TEXT = """
"""

EVALUATION_QUEUE_TEXT = """
"""

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = """
@misc{align_anything,
    author = {PKU-Alignment Team},
    title = {Align Anything: training all modality models to follow instructions with unified language feedback},
    year = {2024},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\\url{https://github.com/PKU-Alignment/align-anything}},
}
"""


ABOUT_TEXT = """
"""

SUBMISSION_TEXT = """
<h1 align="center">
How to submit models/results to the leaderboard?
</h1>

We welcome the community to submit evaluation results for new models. These results will be added as non-verified; however, authors are required to upload their generations so that other members can verify the results.

### 1 - Running Evaluation 🏃

We have written a detailed guide for running the evaluation on your model; you can find it in the [align-anything repository](https://github.com/PKU-Alignment/align-anything/tree/main/align_anything/evaluation/benchmarks/leaderboard). This process generates a JSON file and a ZIP file summarizing the results, along with the raw generations and metric files.

### 2 - Submitting Results 📩

To submit your results, create a **Pull Request** in the community tab to add them under the [`community_results` folder](https://huggingface.co/spaces/PKU-Alignment/EvalAnything-LeaderBoard/tree/main/community_results) in this repository:
- Create a folder named `ORG_MODELNAME_USERNAME`, for example `PKU-Alignment_gemini1.5-pro_XuyaoWang`.
- Place your JSON file and ZIP file with grouped scores from the guide, along with the generations folder and metrics folder, inside this newly created folder (see the layout sketch below).
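
For illustration, a submission folder might look like the following sketch (the file names here are illustrative assumptions; use the names produced by the evaluation guide):

```text
community_results/
└── PKU-Alignment_gemini1.5-pro_XuyaoWang/
    ├── results.json        # JSON summary of scores
    ├── results.zip         # ZIP file with grouped scores
    ├── generations/        # raw model outputs
    └── metrics/            # per-task metric files
```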

The title of the PR should be `[Community Submission] Model: org/model, Username: your_username`, replacing `org` and `model` with those of the model you evaluated, for example `[Community Submission] Model: PKU-Alignment/gemini1.5-pro, Username: XuyaoWang`.

### 3 - Getting your model verified ✅

A verified result in Eval-Anything indicates that a core maintainer has decoded the outputs from the model and performed the evaluation. To have your model verified, please follow these steps:

1. Email us and provide a brief rationale for why your model should be verified.
2. Await our response and approval before proceeding.
3. Prepare a script that decodes from your model and runs without requiring a local GPU. Typically, this should be the same script used for your model contribution. We strongly recommend modifying the scripts in [align-anything](https://github.com/PKU-Alignment/align-anything/tree/main/align_anything/evaluation/benchmarks/leaderboard) to adapt them to your model.
4. Generate temporary OpenAI API keys for running the script and share them with us; specifically, we need the keys for evaluation.
5. We will check and execute your script, update the results, and inform you so that you can revoke the temporary keys.

**Please note that we will not re-evaluate the same model. Due to sampling variance, the results might slightly differ from your initial ones. We will replace your previous community results with the verified ones.**
"""