Apply for community grant: Academic project (gpu and storage)
Hi HF team and community,
We are developing JudgeLRM (https://arxiv.org/abs/2504.00050), a new family of judgment-oriented Large Reasoning Models designed to serve as automatic judges in complex AI evaluation tasks. While Large Language Models (LLMs) have recently been adopted as scalable evaluators, conventional Supervised Fine-Tuning (SFT) falls short when deep reasoning is required. JudgeLRM addresses this by leveraging reinforcement learning with outcome-driven, judge-wise rewards, enabling more robust and reliable judgment in reasoning-intensive scenarios.
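To make the reward idea concrete, here is a minimal, hypothetical sketch of an outcome-driven reward for a judge model. This is an illustration under our own simplified assumptions, not the actual JudgeLRM implementation: the judge scores two candidate answers, and the reward checks whether the implied preference matches a gold label, with a small margin-based shaping term. All names and constants are placeholders.

```python
# Hypothetical sketch of an outcome-driven, judge-wise reward
# (illustrative only; not the actual JudgeLRM reward function).

def judge_reward(score_a: float, score_b: float, gold_preference: str) -> float:
    """Reward the judge when its predicted preference matches the gold label.

    score_a, score_b: scalar scores the judge assigned to answers A and B.
    gold_preference: "A" or "B", the human-labelled better answer.
    """
    predicted = "A" if score_a > score_b else "B"
    outcome = 1.0 if predicted == gold_preference else 0.0
    # Shaping term: confidence margin between the two scores,
    # clipped to [0, 1] and lightly weighted.
    margin = min(abs(score_a - score_b) / 10.0, 1.0)
    # Confident correct judgments gain a bonus; confident wrong ones
    # are penalized.
    return outcome + 0.1 * margin * (1.0 if outcome else -1.0)

# Example: the judge scores answer A at 8 and B at 3, and the gold
# label says A is better, so the reward is positive.
print(judge_reward(8.0, 3.0, "A"))
```

In an RL loop, a reward of this shape is computed per judgment episode and fed to a policy-gradient update, so the model is optimized for correct final verdicts rather than for imitating reference rationales token by token.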
Our experiments show JudgeLRM consistently outperforms both SFT-tuned baselines and state-of-the-art reasoning models, with JudgeLRM-3B even surpassing GPT-4, and JudgeLRM-7B exceeding DeepSeek-R1 by 2.79% in F1 score—especially in tasks that demand sophisticated reasoning.
We are committed to making JudgeLRM fully open-source, releasing not only the models and datasets but also an interactive demo so the community can evaluate and use next-generation LLM judges first-hand.
To make JudgeLRM openly accessible and to further push the boundaries of AI evaluation research, we are seeking GPU resources for pretraining, fine-tuning, and demo hosting. We believe that with your support, JudgeLRM will inspire more advanced and transparent approaches to AI benchmarking.
Thanks for considering our request!