𝗔𝗯𝘀𝗼𝗹𝘂𝘁𝗲 𝗭𝗲𝗿𝗼: 𝗟𝗟𝗠𝘀 𝗰𝗮𝗻 𝘁𝗿𝗮𝗶𝗻 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝗮𝗻𝘆 𝗲𝘅𝘁𝗲𝗿𝗻𝗮𝗹 𝗱𝗮𝘁𝗮 🤯
Has the "data wall" just been breached?
Recent RL paradigms often relied on a set of questions and answers that needed to be manually curated. Researchers from Tsinghua University went like "why though".
🤔 Indeed, why learn from questions designed by a human teacher, when the model can start from its base knowledge and learn by experimenting in a code environment, proposing coding tasks itself and trying to solve them?
Thus they created “Absolute Zero Reasoning” (AZR), an approach that removes any need for human-curated data.
🎭 𝗗𝘂𝗮𝗹 𝗿𝗼𝗹𝗲𝘀:
‣ Proposer: Generates challenging but solvable coding tasks
‣ Solver: Attempts to solve those self-proposed tasks
🧪 𝗧𝗵𝗿𝗲𝗲 𝘁𝗮𝘀𝗸 𝘁𝘆𝗽𝗲𝘀: all types are defined as triplets of program, input and output
‣ Deduction: Give the model an input and a program, it must deduce the output
‣ Abduction: Give the model a program and an output, it must find an input that produces said output
‣ Induction: Synthesize a program from input/output pairs
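To make the three task types concrete, here is a minimal sketch of how one (program, input, output) triplet could be checked in each mode. This is not the paper's code: `Triplet`, `run_program`, the `check_*` helpers and the convention that every program defines a function named `f` are all assumptions made for illustration. It also uses a bare `exec`, which a real setup would replace with a sandboxed executor.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Triplet:
    """One task instance: a program, an input, and the resulting output."""
    program: str        # source code defining a function `f` (assumed convention)
    input_value: Any
    output_value: Any


def run_program(program: str, value: Any) -> Any:
    """Execute the program source and call its `f` on the given value.

    Illustration only: exec on untrusted code is unsafe outside a sandbox.
    """
    namespace: dict = {}
    exec(program, namespace)
    return namespace["f"](value)


def check_deduction(task: Triplet, predicted_output: Any) -> bool:
    """Deduction: given program + input, the solver predicts the output."""
    return predicted_output == task.output_value


def check_abduction(task: Triplet, predicted_input: Any) -> bool:
    """Abduction: given program + output, any input that reproduces the
    output is accepted, since several inputs may map to the same output."""
    return run_program(task.program, predicted_input) == task.output_value


def check_induction(pairs: List[Tuple[Any, Any]], predicted_program: str) -> bool:
    """Induction: given input/output pairs, the synthesized program must
    reproduce every pair."""
    return all(run_program(predicted_program, x) == y for x, y in pairs)


square = "def f(x):\n    return x * x\n"
task = Triplet(program=square, input_value=3, output_value=9)

print(check_deduction(task, 9))    # True
print(check_abduction(task, -3))   # True: -3 also squares to 9
print(check_induction([(2, 4), (3, 9)], "def f(x):\n    return x ** 2\n"))  # True
```

Note that all three checks only need the code executor, no human-written answer key, which is what lets the environment itself act as the verifier.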
Btw this reminded me of my long-forgotten philosophy classes: Aristotle was more on the induction side, learning from real-world analogies, while Plato was more on the deduction side, trying to progress quite far with just one input and his reasoning.
📊 𝗥𝗲𝘀𝘂𝗹𝘁𝘀:
‣ AZR post-training improves the performance of known models like Qwen2.5-7B
‣ Shows strong cross-domain transfer: coding ↔️ math reasoning
🧐 𝗢𝘁𝗵𝗲𝗿 𝗳𝗶𝗻𝗱𝗶𝗻𝗴𝘀:
‣ Having a better base performance (general or code-specific) amplifies the gains from Absolute Zero Reasoning
‣ Researchers warn about "Uh-oh moments" (a nod to DeepSeek's "aha moments") where the model generates concerning goals like "make an extremely convoluted code to outsmart all these humans": so supervision is still needed!
Paper here: Absolute Zero: Reinforced Self-play Reasoning with Zero Data (2505.03335)