Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL • Paper • 2505.02391 • Published 9 days ago • 22
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models • Paper • 2505.02735 • Published 8 days ago • 27
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning • Paper • 2505.01441 • Published 16 days ago • 35
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers • Paper • 2504.20752 • Published 14 days ago • 86
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems • Paper • 2505.00212 • Published 13 days ago • 5
Multi-Agent System for Comprehensive Soccer Understanding • Paper • 2505.03735 • Published 7 days ago • 20
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference • Paper • 2505.02922 • Published 8 days ago • 23
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning • Paper • 2505.03318 • Published 8 days ago • 87
Absolute Zero: Reinforced Self-play Reasoning with Zero Data • Paper • 2505.03335 • Published 8 days ago • 127
COSMOS: Predictable and Cost-Effective Adaptation of LLMs • Paper • 2505.01449 • Published 14 days ago • 2
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey • Paper • 2505.03418 • Published 8 days ago • 8
LLM-Independent Adaptive RAG: Let the Question Speak for Itself • Paper • 2505.04253 • Published 7 days ago • 11
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models • Paper • 2505.03821 • Published 11 days ago • 22
ZeroSearch: Incentivize the Search Capability of LLMs without Searching • Paper • 2505.04588 • Published 6 days ago • 55
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities • Paper • 2505.02567 • Published 9 days ago • 67
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains • Paper • 2505.03981 • Published 7 days ago • 14
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models • Paper • 2505.02847 • Published 12 days ago • 24