Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation Paper • 2505.09027 • Published May 13, 2025
A Case Study of Web App Coding with OpenAI Reasoning Models Paper • 2409.13773 • Published Sep 19, 2024
WebApp1K: A Practical Code-Generation Benchmark for Web App Development Paper • 2408.00019 • Published Jul 30, 2024
Insights from Benchmarking Frontier Language Models on Web App Code Generation Paper • 2409.05177 • Published Sep 8, 2024