MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published 2 days ago • 137
Rethinking OPD Collection This collection includes the models used in the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe" • 4 items • Updated 2 days ago • 1
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 7 days ago • 63
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL Paper • 2604.28123 • Published 14 days ago • 47
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction Paper • 2604.27393 • Published 15 days ago • 68
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 8 days ago • 104
MiA-Signature: Approximating Global Activation for Long-Context Understanding Paper • 2605.06416 • Published 8 days ago • 54
MAIC-UI: Making Interactive Courseware with Generative UI Paper • 2604.25806 • Published 17 days ago • 8
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published about 1 month ago • 94
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published Jan 9 • 48
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published Feb 12 • 66
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 111
Article Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on "Approximating KL Divergence" • NormalUhr • Aug 11, 2025 • 11
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 231
Pre-training Distillation for Large Language Models: A Design Space Exploration Paper • 2410.16215 • Published Oct 21, 2024 • 17