Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 20 days ago • 124
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models Paper • 2605.30263 • Published 26 days ago • 59
timm/mobilenetv3_small_100.lamb_in1k Image Classification • 2.55M • Updated Oct 19, 2025 • 17.4M • 79
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published May 21 • 171
SEIF: Self-Evolving Reinforcement Learning for Instruction Following Paper • 2605.07465 • Published May 8 • 30
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 123
ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers Paper • 2603.24414 • Published Mar 25 • 183
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling Paper • 2603.25746 • Published Mar 26 • 155