---
sdk: static
pinned: false
---
# EvoEval: Evolving Coding Benchmarks via LLM

**EvoEval**<sup>1</sup> is a holistic benchmark suite created by _evolving_ **HumanEval** problems:
- 🔥 Contains **828** new problems across **5** 🌠 semantic-altering and **2** ⭐ semantic-preserving benchmarks
- 🔮 Allows evaluation/comparison across different **dimensions** and problem **types** (e.g., _Difficult_, _Creative_, or _Tool Use_ problems). See our [**visualization tool**](https://evo-eval.github.io/visualization.html) for ready-to-use comparisons
- 🏆 Complete with a [**leaderboard**](https://evo-eval.github.io/leaderboard.html), **groundtruth solutions**, **robust testcases**, and **evaluation scripts** to fit easily into your evaluation pipeline
- 🤖 Provides generated LLM code samples from **>50** different models to save you time running experiments

<sup>1</sup> coincidentally pronounced similarly to 😈 EvilEval
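As background for plugging the benchmark into an evaluation pipeline: results on HumanEval-style suites are typically reported as pass@k. This is a generic sketch of the standard unbiased pass@k estimator, not code taken from EvoEval's own scripts:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    Probability that at least one of k samples drawn without
    replacement from n generations is correct, given that c of
    the n generations passed the tests:
        pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:  # every size-k draw must include a correct sample
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# 10 samples per problem, 3 of which pass: pass@1 is ~0.3
print(pass_at_k(10, 3, 1))
```

Averaging this quantity over all problems in a benchmark gives the headline pass@k score.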

- GitHub: [evo-eval/evoeval](https://github.com/evo-eval/evoeval)
- Webpage: [evo-eval.github.io](https://evo-eval.github.io/)
- Leaderboard: [evo-eval.github.io/leaderboard.html](https://evo-eval.github.io/leaderboard.html)
- Visualization: [evo-eval.github.io/visualization.html](https://evo-eval.github.io/visualization.html)
- Paper: TODO
- PyPI: [evoeval](https://pypi.org/project/evoeval/)