Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 182
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs Paper • 2410.18451 • Published Oct 24, 2024 • 20