S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models Paper • 2310.15147 • Published Oct 23, 2023 • 2
Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent Paper • 2402.13717 • Published Feb 21, 2024 • 3
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning Paper • 2403.02333 • Published Mar 4, 2024 • 1
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models Paper • 2410.07331 • Published Oct 9, 2024 • 5
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23, 2025 • 48
GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks Paper • 2502.14848 • Published Feb 20, 2025 • 1
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning Paper • 2506.01710 • Published Jun 2, 2025 • 2
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16, 2025 • 42
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025 • 126