small update
README.md
CHANGED
@@ -10,7 +10,7 @@ pinned: false
 # EvoEval: Evolving Coding Benchmarks via LLM
 
 **EvoEval**<sup>1</sup> is a holistic benchmark suite created by _evolving_ **HumanEval** problems:
-- 🔥
+- 🔥 Contains **828** new problems across **5** semantic-altering and **2** ⭐ semantic-preserving benchmarks
 - 🔮 Allows evaluation/comparison across different **dimensions** and problem **types** (i.e., _Difficult_, _Creative_ or _Tool Use_ problems). See our [**visualization tool**](https://evo-eval.github.io/visualization.html) for ready-to-use comparison
 - 🏆 Complete with [**leaderboard**](https://evo-eval.github.io/leaderboard.html), **groundtruth solutions**, **robust testcases** and **evaluation scripts** to easily fit into your evaluation pipeline
 - 🤗 Generated LLM code samples from **>50** different models to save you time in running experiments
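The last two bullets in the updated README describe dropping EvoEval into an existing evaluation pipeline. Purely as an illustration of that workflow, and not the project's documented interface, here is a minimal sketch that reads HumanEval-style problems from a JSONL file and writes a `samples.jsonl` of model completions; the file names, the `task_id`/`prompt`/`completion` field names, and the `generate_completion` placeholder are all assumptions, so consult the EvoEval repository for the actual loading and evaluation commands.

```python
import json

# Minimal sketch (not EvoEval's documented API): read benchmark problems from a
# HumanEval-style JSONL file and write model completions as samples.jsonl, the
# format most execution-based evaluators consume. The file names and the
# "task_id" / "prompt" / "completion" field names are assumptions.

def generate_completion(prompt: str) -> str:
    # Placeholder for a real model call (OpenAI API, vLLM, transformers, ...).
    return "    pass\n"

def make_samples(problems_path: str, out_path: str) -> None:
    with open(problems_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines defensively
            problem = json.loads(line)
            record = {
                "task_id": problem["task_id"],
                "completion": generate_completion(problem["prompt"]),
            }
            dst.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    # Illustrative paths only; substitute the actual EvoEval problem file.
    make_samples("EvoEval_difficult.jsonl", "samples.jsonl")
```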