Article Vision Language Model Alignment in TRL ⚡️ By sergiopaniego and 4 others • 8 days ago • 47
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published 3 days ago • 33
Enhanced Arabic Text Retrieval with Attentive Relevance Scoring Paper • 2507.23404 • Published 14 days ago • 2
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers Paper • 2507.10787 • Published about 1 month ago • 11
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models Paper • 2506.19851 • Published Jun 24 • 58
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14 • 97
Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • May 12 • 503
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges Paper • 2505.04769 • Published May 7 • 8
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 184
Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities Paper • 2505.01043 • Published May 2 • 10
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published May 3 • 39
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published May 5 • 86
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 300
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21 • 24
One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation Paper • 2503.13358 • Published Mar 17 • 96
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 6 items • Updated 22 days ago • 127