Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper โข 2512.01374 โข Published Dec 1, 2025 โข 93