- Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs — arXiv:2510.16062, published Oct 17, 2025
- BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization — arXiv:2505.16640, published May 22, 2025
- MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks — arXiv:2505.16459, published May 22, 2025
- Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities — arXiv:2503.11074, published Mar 14, 2025
- Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios — arXiv:2505.17735, published May 23, 2025