MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published 2 days ago • 137
Rethinking OPD Collection This collection includes the models used in the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe" • 4 items • Updated 2 days ago • 1
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 7 days ago • 63
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL Paper • 2604.28123 • Published 14 days ago • 47
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction Paper • 2604.27393 • Published 15 days ago • 68
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 8 days ago • 104
MiA-Signature: Approximating Global Activation for Long-Context Understanding Paper • 2605.06416 • Published 8 days ago • 54
MAIC-UI: Making Interactive Courseware with Generative UI Paper • 2604.25806 • Published 17 days ago • 8
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published about 1 month ago • 94
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published Jan 9 • 48
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published Feb 12 • 66
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 111
Article Re-understanding KL Approximation from an RL-for-LLM Lens: Notes on "Approximating KL Divergence" • NormalUhr • Aug 11, 2025 • 11
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 231
Pre-training Distillation for Large Language Models: A Design Space Exploration Paper • 2410.16215 • Published Oct 21, 2024 • 17