Miquel Farré

mfarre

AI & ML interests

I like everything video

Recent Activity

upvoted an article 9 days ago

Welcome GPT OSS, the new open-source model family from OpenAI!

new activity 15 days ago

tencent/HunyuanWorld-1: demo to test the model

upvoted an article 16 days ago

SmolLM3: smol, multilingual, long-context reasoner

upvoted an article 9 days ago

Welcome GPT OSS, the new open-source model family from OpenAI!

Article • and 11 others • 10 days ago • 453

upvoted an article 16 days ago

SmolLM3: smol, multilingual, long-context reasoner

Article • and 22 others • Jul 8 • 625

upvoted an article 22 days ago

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Article • and 3 others • 23 days ago • 36

upvoted a paper about 1 month ago

Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens

Paper • 2506.17218 • Published Jun 20 • 27

upvoted a paper about 2 months ago

Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance

Paper • 2502.06145 • Published Feb 10 • 17

upvoted 2 articles 3 months ago

nanoVLM: The simplest repository to train your VLM in pure PyTorch

Article • and 6 others • May 21 • 204

Vision Language Models (Better, Faster, Stronger)

Article • and 4 others • May 12 • 503

upvoted an article 4 months ago

Cohere on Hugging Face Inference Providers 🔥

Article • and 6 others • Apr 16 • 131

upvoted a paper 4 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 197

upvoted an article 6 months ago

SmolVLM2: Bringing Video Understanding to Every Device

Article • and 6 others • Feb 20 • 293

upvoted a collection 6 months ago

SmolVLM2 📺 Smallest video LM ever 🤏🏻

Collection • 11 items • Updated May 5 • 95

upvoted 2 articles 7 months ago

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Article • and 2 others • Jan 23 • 182

Announcing NVIDIA Cosmos World Foundation Models

Article • and 1 other • Jan 7 • 26

upvoted a paper 8 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

upvoted a paper 10 months ago

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published Oct 22, 2024 • 30

upvoted 2 articles 11 months ago

FineVideo: behind the scenes

Article • and 5 others • Sep 23, 2024 • 34

Docmatix - a huge dataset for Document Visual Question Answering

Article • and 1 other • Jul 18, 2024 • 76

upvoted an article 12 months ago

Scaling robotics datasets with video encoding

Article • and 2 others • Aug 27, 2024 • 40

upvoted a paper 12 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 132
