GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 212
AdaptThink: Reasoning Models Can Learn When to Think Paper • 2505.13417 • Published May 19 • 82
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published Dec 16, 2024 • 18
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 51
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models Paper • 2406.16714 • Published Jun 24, 2024 • 10
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training Paper • 2311.04155 • Published Nov 7, 2023 • 1