---
title: README
emoji: 💻
colorFrom: indigo
colorTo: indigo
sdk: static
pinned: false
---
# EvoEval: Evolving Coding Benchmarks via LLM
EvoEval¹ is a holistic benchmark suite created by evolving HumanEval problems:
- 🔥 Containing 828 new problems across 5 semantic-altering and 2 ⭐ semantic-preserving benchmarks
- 🔮 Allows evaluation/comparison across different dimensions and problem types (e.g., Difficult, Creative, or Tool Use problems). See our visualization tool for a ready-to-use comparison
- 🏆 Complete with leaderboard, ground-truth solutions, robust test cases, and evaluation scripts to easily fit into your evaluation pipeline (see the usage sketch below)
- 🤖 LLM-generated code samples from >50 different models to save you time in running experiments
¹ coincidentally similar in pronunciation to 😈 EvilEval
- GitHub: [evo-eval/evoeval](https://github.com/evo-eval/evoeval)
- Webpage: [evo-eval.github.io](https://evo-eval.github.io)
- Leaderboard: [evo-eval.github.io/leaderboard.html](https://evo-eval.github.io/leaderboard.html)
- Visualization: [evo-eval.github.io/visualization.html](https://evo-eval.github.io/visualization.html)
- Paper: arXiv
- PyPI: [evoeval](https://pypi.org/project/evoeval/)