ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models Paper • 2310.10505 • Published Oct 16, 2023 • 3
Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning Paper • 2512.06533 • Published 19 days ago • 6
How Far Are We from Genuinely Useful Deep Research Agents? Paper • 2512.01948 • Published 24 days ago • 53
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published Nov 23 • 274
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published Nov 9 • 51
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning Paper • 2510.20286 • Published Oct 23 • 23
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems Paper • 2510.11652 • Published Oct 13 • 28
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training Paper • 2510.04996 • Published Oct 6 • 15
BroRL: Scaling Reinforcement Learning via Broadened Exploration Paper • 2510.01180 • Published Oct 1 • 18
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published Sep 30 • 47
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 24 • 80
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published Aug 14 • 19
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Paper • 2507.23726 • Published Jul 31 • 114