ldwang

ftgreat

AI & ML interests

LLM, MLLM, Infra

Recent Activity

liked a dataset 1 day ago

ByteDance-Seed/mga-fineweb-edu

Viewer • Updated May 19 • 846M • 2.28k • 29

liked a model 2 days ago

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • 2B • Updated Apr 8 • 97.8k • 531

liked a Space 2 days ago

137

smolagents LLM leaderboard

🏆

A leaderboard for LLMs powering smolagents

updated a collection 3 days ago

MiscAgentic

Collection

3 items • Updated 3 days ago • 1

upvoted a collection 3 days ago

MiscAgentic

Collection

3 items • Updated 3 days ago • 1

liked a dataset 3 days ago

smolagents/benchmark-v1

Viewer • Updated Mar 4 • 132 • 399 • 15

upvoted a paper 4 days ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 163

upvoted an article 4 days ago

Article

Introducing smolagents: simple agents that write actions in code.

and 2 others • Dec 31, 2024 • 1.1k

authored a paper 6 days ago

Trainable Dynamic Mask Sparse Attention

Paper • 2508.02124 • Published 8 days ago • 14

liked a dataset 6 days ago

princeton-nlp/SWE-bench

Viewer • Updated Mar 3 • 21.5k • 16.8k • 117

liked a Space 6 days ago

123

GPT-OSS-120B on AMD MI300X

💻

gpt-oss-120b model running on AMD MI300 infrastructure.

liked a dataset 7 days ago

HuggingFaceH4/Multilingual-Thinking

Viewer • Updated 5 days ago • 1k • 7.01k • 47

reacted to JingzeShi's post with 🤗 7 days ago

Post

3957

Trainable selective sampling and sparse attention kernels are indispensable in the era of context engineering. We hope our work will be helpful to everyone! 🤗

Trainable Dynamic Mask Sparse Attention (2508.02124)

liked a model 7 days ago

openai/gpt-oss-20b

Text Generation • 22B • Updated 4 days ago • 2.37M • 2.82k

upvoted a paper 8 days ago

Trainable Dynamic Mask Sparse Attention

Paper • 2508.02124 • Published 8 days ago • 14

liked a dataset 12 days ago

zai-org/CC-Bench-trajectories

Viewer • Updated 15 days ago • 208 • 4.48k • 38

liked 2 models 12 days ago

Qwen/Qwen3-Coder-480B-A35B-Instruct

Text Generation • 480B • Updated 5 days ago • 44.7k • 1.07k

internlm/Intern-S1

Image-Text-to-Text • 241B • Updated 9 days ago • 24.5k • 177

liked a dataset 14 days ago

allenai/CoSyn-400K

Viewer • Updated Feb 28 • 408k • 2.38k • 34

upvoted a paper 18 days ago

Group Sequence Policy Optimization

Paper • 2507.18071 • Published 20 days ago • 281