MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills • Paper • 2505.06176 • Published 4 days ago • 7
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning • Paper • 2505.07263 • Published 2 days ago • 15
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets • Paper • 2505.07747 • Published 1 day ago • 46
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining • Paper • 2505.07608 • Published 1 day ago • 53
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection • Paper • 2505.07293 • Published 1 day ago • 17
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch • Paper • 2505.03733 • Published 7 days ago • 15
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions • Paper • 2505.06111 • Published 4 days ago • 19
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant • Paper • 2505.05467 • Published 5 days ago • 13
On Path to Multimodal Generalist: General-Level and General-Bench • Paper • 2505.04620 • Published 6 days ago • 71
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges • Paper • 2505.04769 • Published 6 days ago • 7
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models • Paper • 2505.04921 • Published 6 days ago • 127
OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents • Paper • 2505.03570 • Published 7 days ago • 7
OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution • Paper • 2505.04606 • Published 6 days ago • 6
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation • Paper • 2505.03912 • Published 7 days ago • 8
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation • Paper • 2505.04512 • Published 6 days ago • 32
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer • Paper • 2505.04622 • Published 6 days ago • 25