Zhizhou Sha

JameSand

AI & ML interests

None yet

Recent Activity

updated a model 7 days ago

JameSand/llama-adamw-lr1e-6-20260110_015014-global_step_200

4B • Updated 7 days ago • 8

published a model 7 days ago

JameSand/llama-adamw-lr1e-6-20260110_015014-global_step_200

4B • Updated 7 days ago • 8

updated a model 7 days ago

JameSand/llama-sgd-lr1e-2-20260110_020449-global_step_200

4B • Updated 7 days ago • 8

published a model 7 days ago

JameSand/llama-sgd-lr1e-2-20260110_020449-global_step_200

4B • Updated 7 days ago • 8

updated a model 7 days ago

JameSand/llama-muon-muonlr1e-4-spectral_norm-muonadamlr1e-6-20260110_005142-global_step_200

4B • Updated 7 days ago • 7

published a model 7 days ago

JameSand/llama-muon-muonlr1e-4-spectral_norm-muonadamlr1e-6-20260110_005142-global_step_200

4B • Updated 7 days ago • 7

updated a model 17 days ago

JameSand/Llama-3.2-3B-Instruct-muon-2e-2-muonadamlr1e-6-muonadjustlrNone-iter_0000200

Text Generation • 3B • Updated 17 days ago • 10

published a model 17 days ago

JameSand/Llama-3.2-3B-Instruct-muon-2e-2-muonadamlr1e-6-muonadjustlrNone-iter_0000200

Text Generation • 3B • Updated 17 days ago • 10

updated a model 17 days ago

JameSand/Llama-3.2-3B-Instruct-muon-2e-2-muonadamlr1e-6-muonadjustlrrms_norm-iter_0000200

Text Generation • 3B • Updated 17 days ago • 10

published a model 17 days ago

JameSand/Llama-3.2-3B-Instruct-muon-2e-2-muonadamlr1e-6-muonadjustlrrms_norm-iter_0000200

Text Generation • 3B • Updated 17 days ago • 10

commented on 🚀 Journey to Reproduce **Search-R1** 17 days ago

Hi Seungyoun! Thank you for the nice blog.

I am also looking forward to your training scripts.

I am also having problems reproducing the results of Search-R1.

Best,
James

updated a dataset about 1 month ago

JameSand/star-graph-deg-128-path-3-nodes-300

Viewer • Updated Dec 9, 2025 • 6k • 2

published a dataset about 1 month ago

JameSand/star-graph-deg-128-path-3-nodes-300

Viewer • Updated Dec 9, 2025 • 6k • 2

updated a dataset about 1 month ago

JameSand/star-graph-deg-64-path-3-nodes-200

Viewer • Updated Dec 9, 2025 • 6k • 1

published a dataset about 1 month ago

JameSand/star-graph-deg-64-path-3-nodes-200

Viewer • Updated Dec 9, 2025 • 6k • 1

updated a dataset about 1 month ago

JameSand/star-graph-deg-32-path-2-nodes-100

Viewer • Updated Dec 9, 2025 • 6k • 1

published a dataset about 1 month ago

JameSand/star-graph-deg-32-path-2-nodes-100

Viewer • Updated Dec 9, 2025 • 6k • 1

upvoted a paper about 2 months ago

Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following

Paper • 2511.21662 • Published Nov 26, 2025 • 11

commented on a paper 2 months ago

T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models

Paper • 2504.04718 • Published Apr 7, 2025 • 42

updated a model 2 months ago

JameSand/Llama-BF16-math-step200

4B • Updated Nov 16, 2025
