Update README.md
README.md CHANGED
@@ -7,4 +7,19 @@ sdk: static
 pinned: false
 ---

-
+# EvoEval: Evolving Coding Benchmarks via LLM
+
+**EvoEval**<sup>1</sup> is a holistic benchmark suite created by _evolving_ **HumanEval** problems:
+- 🔥 Contains **828** new problems across **5** semantic-altering and **2** ⭐ semantic-preserving benchmarks
+- 🔮 Allows evaluation/comparison across different **dimensions** and problem **types** (e.g., _Difficult_, _Creative_ or _Tool Use_ problems). See our [**visualization tool**](https://evo-eval.github.io/visualization.html) for ready-to-use comparisons
+- 🏆 Complete with a [**leaderboard**](https://evo-eval.github.io/leaderboard.html), **groundtruth solutions**, **robust testcases** and **evaluation scripts** to fit easily into your evaluation pipeline
+- 🤖 Generated LLM code samples from **>50** different models to save you time in running experiments
+
+<sup>1</sup> coincidentally pronounced much like 😈 EvilEval
+
+- GitHub: [evo-eval/evoeval](https://github.com/evo-eval/evoeval)
+- Webpage: [evo-eval.github.io](https://evo-eval.github.io/)
+- Leaderboard: [evo-eval.github.io/leaderboard.html](https://evo-eval.github.io/leaderboard.html)
+- Visualization: [evo-eval.github.io/visualization.html](https://evo-eval.github.io/visualization.html)
+- Paper: TODO
+- PyPI: [evoeval](https://pypi.org/project/evoeval/)
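Usage note (not part of the commit): a minimal sketch of how the benchmark problems might be loaded from the PyPI package. The `get_evo_eval` loader and the `"EvoEval_difficult"` benchmark name are assumptions based on the project's GitHub repository, not stated in this README; consult the repo for the actual API.

```python
# Hypothetical sketch: `get_evo_eval` and the benchmark name are assumptions;
# see https://github.com/evo-eval/evoeval for the actual interface.
# Install first:  pip install evoeval
from evoeval.data import get_evo_eval

problems = get_evo_eval("EvoEval_difficult")  # one of the evolved benchmarks

for task_id, problem in problems.items():
    # Each problem is expected to carry at least a prompt to feed to the
    # model under evaluation; generated code is then scored against the
    # benchmark's ground-truth tests.
    print(task_id)
    print(problem["prompt"])
    break
```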