---
title: DeepResearch Bench
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
---

# DeepResearch Bench

**DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents**

This application showcases comprehensive evaluation results for Deep Research Agents. The app includes:

- **Leaderboard** - View overall performance metrics across all evaluated models
- **Data Viewer** - Explore detailed results for individual research tasks
- **Side-by-Side Comparison** - Compare different models' responses to the same research questions

Visit our [project website](https://deepresearch-bench.github.io) for more information.

## Citation

```bibtex
@article{du2025deepresearch,
  author  = {Mingxuan Du and Benfeng Xu and Chiwei Zhu and Xiaorui Wang and Zhendong Mao},
  title   = {DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents},
  journal = {arXiv preprint},
  year    = {2025},
}
```

## Hugging Face Space Details

- SDK: Gradio
- SDK Version: 5.31.0