Article: NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • 3 days ago • 35
Article: Introducing Command A Vision: Multimodal AI built for Business By CohereLabs and 3 others • 14 days ago • 61
SmolDocling datasets Collection Datasets used to train SmolDocling • 6 items • Updated 14 days ago • 28
Article: TimeScope: How Long Can Your Video Large Multimodal Model Go? By orrzohar and 3 others • 22 days ago • 36
Article: Fast LoRA inference for Flux with Diffusers and PEFT By sayakpaul and 1 other • 22 days ago • 43
Article: Arc Virtual Cell Challenge: A Primer By FL33TW00D-HF and 1 other • 27 days ago • 51
Article: SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 624
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20 • 27
SmolVLA Collection Small, efficient and light-weight VLAs pretrained on community datasets • 1 item • Updated Jun 1 • 27
Article: Weekly Robotics June #1 - SmolVLA discovery and thoughts By Beegbrain • Jun 3 • 9
Article: Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H By Hcompany and 1 other • Jun 3 • 70
Article: SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data By danaaubakirova and 8 others • Jun 3 • 224
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 127
Article: CodeAgents + Structure: A Better Way to Execute Actions By akseljoonas and 1 other • May 28 • 71
Article: Exploring Quantization Backends in Diffusers By derekl35 and 2 others • May 21 • 39