Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation Paper • 2507.08441 • Published Jul 11 • 61
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 70
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published Mar 31 • 63
Slow Perception: Let's Perceive Geometric Figures Step-by-step Paper • 2412.20631 • Published Dec 30, 2024 • 15
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 84
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 58
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23, 2024 • 33
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Paper • 2312.06109 • Published Dec 11, 2023 • 21
Merlin: Empowering Multimodal LLMs with Foresight Minds Paper • 2312.00589 • Published Nov 30, 2023 • 27
DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023 • 59