Richard Lian

richardlian

dachenlian

AI & ML interests

None yet

Recent Activity

upvoted a paper 20 days ago

Inverse Scaling in Test-Time Compute

liked a Space 2 months ago

mteb/leaderboard

liked a Space 2 months ago

cfahlgren1/model-release-heatmap

Organizations

upvoted a paper 20 days ago

Inverse Scaling in Test-Time Compute

Paper • 2507.14417 • Published 26 days ago • 27

upvoted an article 2 months ago

KV Cache from scratch in nanoVLM

Article • and 4 others • Jun 4 • 89

upvoted a paper 2 months ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 177

upvoted a paper 3 months ago

Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83

upvoted 2 articles 3 months ago

The Transformers Library: standardizing model definitions

Article • and 3 others • May 15 • 116

Vision Language Models (Better, Faster, Stronger)

Article • and 4 others • May 12 • 503

upvoted a collection 4 months ago

Unsloth Dynamic 2.0 Quants

Collection

New 2.0 version of our Dynamic GGUF + Quants. Dynamic 2.0 achieves superior accuracy & SOTA quantization performance. • 43 items • Updated 1 day ago • 183

upvoted an article 4 months ago

Introducing HELMET

Article • and 6 others • Apr 16 • 35

upvoted 4 articles 5 months ago

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?

Article • Mar 17 • 326

Rearchitecting Hugging Face Uploads and Downloads

Article • and 2 others • Nov 26, 2024 • 48

From Files to Chunks: Improving Hugging Face Storage Efficiency

Article • and 1 other • Nov 20, 2024 • 63

Xet is on the Hub

Article • and 5 others • Mar 18 • 66

upvoted an article 6 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7 • 204

upvoted 2 papers 7 months ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published Jan 17 • 116

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Paper • 2501.09686 • Published Jan 16 • 41

upvoted an article 7 months ago

Train 400x faster Static Embedding Models with Sentence Transformers

Article • Jan 15 • 204

upvoted an article 8 months ago

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Article • Oct 7, 2024 • 46

upvoted a collection 8 months ago

ModernBERT

Collection

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 149

upvoted 2 articles 10 months ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • and 5 others • Sep 18, 2024 • 264

Fixing Gradient Accumulation

Article • and 5 others • Oct 16, 2024 • 60
