PFPO - a chitanda Collection

PFPO

Updated Feb 6

Resources for the paper "Preference Optimization for Reasoning with Pseudo Feedback" (ICLR 2025).

Preference Optimization for Reasoning with Pseudo Feedback

Paper • 2411.16345 • Published Nov 25, 2024 • 1

chitanda/mathscale4o-800k

Viewer • Updated Feb 6 • 492k • 34 • 1

Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

Paper • 2402.00658 • Published Feb 1, 2024

chitanda/code-synthetic-test-cases

Preview • Updated Feb 6 • 7