Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning Paper • 2509.25534 • Published Sep 19 • 2
MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework Paper • 2508.14880 • Published Aug 20 • 15
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment Paper • 2508.07750 • Published Aug 11 • 19