-
microsoft/OmniParser
Image-Text-to-Text • Updated • 410 • 1.7k -
Maya: An Instruction Finetuned Multilingual Multimodal Model
Paper • 2412.07112 • Published • 28 -
Visual Representation Alignment for Multimodal Large Language Models
Paper • 2509.07979 • Published • 83 -
ModernVBERT: Towards Smaller Visual Document Retrievers
Paper • 2510.01149 • Published • 30
Frank Sommers PRO
fsommers
AI & ML interests
None yet
Recent Activity
upvoted
a
paper
23 days ago
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
upvoted
an
article
24 days ago
Transformers v5: Simple model definitions powering the AI ecosystem
upvoted
a
paper
about 1 month ago
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models