
Mao Song

MaoSong2022

https://maosong2022.github.io/

AI & ML interests

None yet

Recent Activity

liked a Space 29 days ago

HuggingFaceTB/smol-training-playbook

Organizations

upvoted 2 articles 3 days ago

Article

Tool Use, Unified

Rocketknight1 • Aug 12, 2024 • 121

Article

Open-R1: a fully open reproduction of DeepSeek-R1

eliebak, lvwerra, lewtun • Jan 28, 2025 • 889

liked a Space 29 days ago

The Smol Training Playbook

📚

3.17k

The secrets to building world-class LLMs

liked a Space about 1 month ago

Bringing paper to life: A modern template for scientific writing

📝

Explore a scientific article with interactive visualizations

upvoted an article about 2 months ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 152

upvoted a collection 3 months ago

Finetuned Eagle Models

Collection

[ICLR 2026] Official Implementation of paper 'Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders' • 3 items • Updated Feb 13 • 1

upvoted 2 articles 3 months ago

Article

Aligning to What? Rethinking Agent Generalization in MiniMax M2

MiniMax-AI • Oct 30, 2025 • 43

Article

Why Did MiniMax M2 End Up as a Full Attention Model?

MiniMax-AI • Oct 30, 2025 • 80

upvoted an article 4 months ago

Article

SeeMoE: Implementing a MoE Vision Language Model from Scratch

AviSoori1x • Jun 23, 2024 • 39

upvoted a paper 4 months ago

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Paper • 2601.02204 • Published Jan 5 • 63

liked a Space 5 months ago

Evaluation Guidebook

📝

317

Explore LLM benchmark trends over time

upvoted a collection 6 months ago

Olmo 3

Collection

Artifacts for the Olmo 3 release. • 7 items • Updated Mar 2 • 169

upvoted an article 10 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 775

liked a Space 11 months ago

The Ultra-Scale Playbook

🌌

3.84k

The ultimate guide to training LLM on large GPU Clusters

upvoted an article 11 months ago

Article

You could have designed state of the art positional encoding

FL33TW00D-HF • Nov 25, 2024 • 478

upvoted an article 12 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 258

updated a dataset about 1 year ago

MaoSong2022/CharXiv

Updated Apr 30, 2025 • 32

published a dataset about 1 year ago

MaoSong2022/CharXiv

Updated Apr 30, 2025 • 32

updated a dataset about 1 year ago

MaoSong2022/CV-Bench

Preview • Updated Apr 14, 2025 • 18

published a dataset about 1 year ago

MaoSong2022/CV-Bench

Preview • Updated Apr 14, 2025 • 18
