Submitted by akhaliq · 137 · DAPO: An Open-Source LLM Reinforcement Learning System at Scale · 35 authors · 5
Submitted by nebulae09 · 49 · Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM · 12 authors · 22 · 2
Submitted by carboncoo · 32 · DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding · 8 authors · 65 · 2
Submitted by ZhaoyangLyu · 30 · Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation · 12 authors · 168 · 2
Submitted by yifanzhang114 · 26 · Aligning Multimodal LLM with Human Preference: A Survey · 17 authors · 16k · 3
Submitted by cckevinn · 26 · CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era · 10 authors · 53 · 2
Submitted by akhaliq · 20 · Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control · 39 authors · 2
Submitted by zhangysk · 15 · FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis · 9 authors · 111 · 2
Submitted by kumarkrishna · 12 · Atlas: Multi-Scale Attention Improves Long Context Image Modeling · 9 authors · 13 · 2
Submitted by Lingaaaaaaa · 11 · Temporal Consistency for LLM Reasoning Process Error Identification · 7 authors · 2
Submitted by kpzhang996 · 11 · MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification · 9 authors · 2
Submitted by BestWishYsh · 10 · Concat-ID: Towards Universal Identity-Preserving Video Synthesis · 5 authors · 56 · 2
Submitted by jacklishufan · 9 · Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection · 7 authors · 2
Submitted by edaxberger · 7 · MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs · 11 authors · 4
Submitted by PengDa02 · 7 · Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs · 9 authors · 31 · 3
Submitted by Spravil · 7 · Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models · 3 authors · 2
Submitted by kpzhang996 · 5 · PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models · 11 authors · 2
Submitted by ZhiyuanZeng · 5 · EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees · 4 authors · 23 · 2
Submitted by zhuoyanxu · 4 · Learning to Inference Adaptively for Multimodal Large Language Models · 7 authors · 8 · 2
Submitted by DamianBoborzi · 3 · MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling · 6 authors · 7 · 2
Submitted by Mingtongz · 3 · KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation · 3 authors · 19 · 2
Submitted by yuwendu · 3 · RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation · 9 authors · 2
Submitted by cxliu0314 · 2 · CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving · 5 authors · 2