Mohammed Mohammed Ali

MohammedEltoum

AI & ML interests

None yet

Recent Activity

upvoted an article 1 day ago

Vision Language Model Alignment in TRL ⚡️

upvoted a paper 3 days ago

MolmoAct: Action Reasoning Models that can Reason in Space

upvoted a paper 6 days ago

Enhanced Arabic Text Retrieval with Attentive Relevance Scoring

Organizations

upvoted an article 1 day ago

Vision Language Model Alignment in TRL ⚡️

Article • and 4 others • 8 days ago • 47

upvoted a paper 3 days ago

MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published 3 days ago • 33

upvoted a paper 6 days ago

Enhanced Arabic Text Retrieval with Attentive Relevance Scoring

Paper • 2507.23404 • Published 14 days ago • 2

upvoted a paper 29 days ago

Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers

Paper • 2507.10787 • Published about 1 month ago • 11

upvoted a paper about 2 months ago

AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models

Paper • 2506.19851 • Published Jun 24 • 58

upvoted an article 2 months ago

How to Build an MCP Server with Gradio

Article • and 1 other • Apr 30 • 189

upvoted 2 papers 3 months ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 274

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97

upvoted an article 3 months ago

Vision Language Models (Better, Faster, Stronger)

Article • and 4 others • May 12 • 503

upvoted 6 papers 3 months ago

Vision-Language-Action Models: Concepts, Progress, Applications and Challenges

Paper • 2505.04769 • Published May 7 • 8

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 184

FG-CLIP: Fine-Grained Visual and Textual Alignment

Paper • 2505.05071 • Published May 8 • 18

Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities

Paper • 2505.01043 • Published May 2 • 10

A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency

Paper • 2505.01658 • Published May 3 • 39

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5 • 86

upvoted 2 papers 4 months ago

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 300

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 197

upvoted 2 papers 5 months ago

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

Paper • 2503.17352 • Published Mar 21 • 24

One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

Paper • 2503.13358 • Published Mar 17 • 96

upvoted a collection 6 months ago

olmOCR

Collection

olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 6 items • Updated 22 days ago • 127
