Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published 7 days ago • 124
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 29 days ago • 255
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 15 days ago • 464
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published Apr 3 • 30
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 125
Llama 3.2 Collection Meta's new Llama 3.2 vision and text models including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 27 items • Updated 13 days ago • 63
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published Mar 17 • 30
Rewards Are Enough for Fast Photo-Realistic Text-to-Image Generation Paper • 2503.13070 • Published Mar 17 • 9
Gemma 3 Collection All versions of Google's new multimodal models including QAT in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 50 items • Updated 11 days ago • 60
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6 • 30
Article LeRobot goes to driving school: World's largest open-source self-driving dataset • Mar 11 • 79
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • Mar 12 • 412