Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models Paper • 2508.02120 • Published 10 days ago • 16
Attention Basin: Why Contextual Position Matters in Large Language Models Paper • 2508.05128 • Published 7 days ago • 2
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published 19 days ago • 29
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1 • 73
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published Jun 17 • 40
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13 • 69
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Paper • 2506.11928 • Published Jun 13 • 24
Hidden in plain sight: VLMs overlook their visual representations Paper • 2506.08008 • Published Jun 9 • 8
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Paper • 2506.09250 • Published Jun 10 • 28
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead Paper • 2504.00294 • Published Mar 31 • 11
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published Mar 24 • 121
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking Paper • 2503.19855 • Published Mar 25 • 29
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27 • 63
Where do Large Vision-Language Models Look at when Answering Questions? Paper • 2503.13891 • Published Mar 18 • 8
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models Paper • 2503.03962 • Published Mar 5 • 4
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation Paper • 2503.02972 • Published Mar 4 • 25
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users Paper • 2503.02268 • Published Mar 4 • 11