MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published 4 days ago • 33
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 5 days ago • 9
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 18 days ago • 96
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 18 days ago • 96
Learning Video Context as Interleaved Multimodal Sequences Paper • 2407.21757 • Published Jul 31, 2024
OpenScholar_V1 Collection The set of models, index, data associated with the paper "OpenScholar: Synthesizing Scientific Literature with Retrieval-Augmented LMs". • 7 items • Updated Mar 2 • 53
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 202
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 43
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 202
Olaf-World: Orienting Latent Actions for Video World Modeling Paper • 2602.10104 • Published Feb 10 • 27
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published Feb 2 • 96
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection Paper • 2601.03928 • Published Jan 7 • 17
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection Paper • 2601.03928 • Published Jan 7 • 17