Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces Paper • 2604.08362 • Published 8 days ago • 15
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning Paper • 2511.02805 • Published Nov 4, 2025
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 193
CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection Paper • 2509.04460 • Published Aug 28, 2025 • 3
ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch Paper • 2506.03558 • Published Jun 4, 2025 • 5 • 1
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5, 2024 • 144
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? Paper • 2508.01780 • Published Aug 3, 2025 • 21
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides Paper • 2501.03936 • Published Jan 7, 2025 • 23