Manan Shah

cs-mshah

https://cs-mshah.github.io/

AI & ML interests

Computer Vision

Recent Activity

upvoted an article 28 days ago

Efficient MultiModal Data Pipeline

Article • and 4 others • about 1 month ago • 53

upvoted an article about 2 months ago

GRPO for GUI Grounding Done Right

Article • Jun 11 • 30

liked a model 2 months ago

TheDenk/Qwen2.5-VL-3B-TrackAnyObject-LoRa-v1

Image-Text-to-Text • Updated Apr 26 • 6

upvoted a paper 3 months ago

LightLab: Controlling Light Sources in Images with Diffusion Models

Paper • 2505.09608 • Published May 14 • 34

upvoted a collection 3 months ago

ViRFT Datasets

Collection • 8 items • Updated Feb 24 • 9

liked 4 datasets 3 months ago

liked a model 3 months ago

remyxai/SpaceThinker-Qwen2.5VL-3B

Image-Text-to-Text • 4B • Updated 5 days ago • 8.58k • 26

upvoted a paper 3 months ago

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

Paper • 2504.06958 • Published Apr 9 • 11

liked a model 3 months ago

facebook/Perception-LM-8B

Image-Text-to-Text • 10B • Updated 23 days ago • 748 • 52

liked a dataset 3 months ago

nvidia/dynpose-100k

Updated May 12 • 675 • 40

upvoted 2 papers 3 months ago

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published Apr 22 • 38

Vidi: Large Multimodal Models for Video Understanding and Editing

Paper • 2504.15681 • Published Apr 22 • 15

upvoted 2 papers 4 months ago

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Paper • 2504.07615 • Published Apr 10 • 32

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 197

liked 2 models 4 months ago

reducto/RolmOCR

Image-to-Text • 8B • Updated Apr 2 • 198k • 468

ds4sd/SmolDocling-256M-preview

Image-Text-to-Text • 0.3B • Updated May 16 • 56.6k • 1.52k

updated a dataset 4 months ago

cs-mshah/SynMirror

Preview • Updated Mar 29 • 43 • 1
