SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published 5 days ago • 170
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 15 days ago • 191
RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published Jun 23, 2025 • 35
Reinforcement-aware Knowledge Distillation for LLM Reasoning Paper • 2602.22495 • Published Feb 26 • 5
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 45
LLM Embeddings Explained: A Visual and Intuitive Guide • 344 • How Language Models Turn Text into Meaning, From Traditional