---
title: README
emoji: 😻
colorFrom: indigo
colorTo: indigo
sdk: static
pinned: false
---

# EvoEval: Evolving Coding Benchmarks via LLM

EvoEval¹ is a holistic benchmark suite created by evolving HumanEval problems:

- 🔥 Contains 828 new problems across 5 🌠 semantic-altering and 2 ⭐ semantic-preserving benchmarks
- 🔮 Enables evaluation and comparison across different dimensions and problem types (e.g., Difficult, Creative, or Tool Use problems); see our visualization tool for ready-to-use comparisons
- 🏆 Complete with a leaderboard, ground-truth solutions, robust test cases, and evaluation scripts that fit easily into your evaluation pipeline
- 🤖 Includes pre-generated LLM code samples from >50 different models, to save you time running experiments
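To illustrate how a benchmark like this plugs into an evaluation pipeline, here is a minimal sketch of checking one model completion against a problem's tests. The `prompt`/`test` field names and the `evaluate_sample` helper are hypothetical, HumanEval-style assumptions, not EvoEval's actual schema or API:

```python
# Hypothetical pipeline step: run a model completion against a problem's
# test code and report pass/fail. Field names ("prompt", "test") follow
# the HumanEval convention and are assumptions, not EvoEval's real schema.

def evaluate_sample(problem: dict, completion: str) -> bool:
    """Return True if the completion passes all of the problem's tests."""
    # Assemble the full program: function signature + model code + tests.
    program = problem["prompt"] + completion + "\n" + problem["test"]
    env: dict = {}
    try:
        # A real pipeline would sandbox this exec and enforce a timeout.
        exec(program, env)
    except Exception:
        return False
    return True

# Toy problem in the assumed format.
problem = {
    "prompt": "def add(a, b):\n",
    "test": "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n",
}

print(evaluate_sample(problem, "    return a + b"))  # correct completion
print(evaluate_sample(problem, "    return a - b"))  # buggy completion
```

In practice you would loop this over every problem/sample pair and aggregate pass@k; the provided evaluation scripts handle that bookkeeping for you.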

¹ Coincidentally, the pronunciation is similar to 😈 EvilEval.