SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories Paper • 2512.17419 • Published 6 days ago • 9
view article Article Nemotron 3 Nano \- A new Standard for Efficient, Open, and Intelligent Agentic Models 9 days ago • 97
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 23 days ago • 229
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 24 days ago • 93
Olmo 3 Post-training Collection All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated 1 day ago • 46
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published about 1 month ago • 60
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning Paper • 2508.03501 • Published Aug 5 • 59
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning Paper • 2510.12693 • Published Oct 14 • 26
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16 • 51
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 190
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 180
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents Paper • 2509.06501 • Published Sep 8 • 79
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published Aug 15 • 8
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published Aug 20 • 39