arXiv:2407.15762
Kaiwen Wang
kaiwenw
AI & ML interests
Reinforcement Learning
models: 36
kaiwenw/single_node_run2-step-12170 • 2B • 3
kaiwenw/single_node_run2-step-12150 • 2B • 3
kaiwenw/single_node_run2-step-11664 • 2B • 4
kaiwenw/single_node_run2-step-11178 • 2B • 4
kaiwenw/single_node_run2-step-10692 • 2B • 3
kaiwenw/single_node_run2-step-10206 • 2B • 3
kaiwenw/single_node_run2-step-9720 • 2B • 4
kaiwenw/single_node_run2-step-9234 • 2B • 2
kaiwenw/single_node_run2-step-8748 • 2B • 4
kaiwenw/single_node_run2-step-8262 • 2B • 8
datasets: 220
kaiwenw/distill-r1-qwen-1.5b-hmmt-feb-25-4096-with-bt-model-with-sigmoid • 123k • 33
kaiwenw/distill-r1-qwen-1.5b-hmmt-feb-24-4096-with-bt-model-with-sigmoid • 123k • 286
kaiwenw/distill-r1-qwen-1.5b-aime-25-4096-with-bt-model-with-sigmoid • 123k • 5
kaiwenw/distill-r1-qwen-1.5b-aime-24-4096-with-bt-model-with-sigmoid • 123k • 7
kaiwenw/distill-r1-qwen-1.5b-hmmt-feb-25-4096-with-bt-model-wout-sigmoid • 123k • 45
kaiwenw/distill-r1-qwen-1.5b-hmmt-feb-24-4096-with-bt-model-wout-sigmoid • 123k • 53
kaiwenw/distill-r1-qwen-1.5b-aime-25-4096-with-bt-model-wout-sigmoid • 123k • 233
kaiwenw/distill-r1-qwen-1.5b-aime-24-4096-with-bt-model-wout-sigmoid • 123k • 165
kaiwenw/distill-r1-qwen-1.5b-hmmt-feb-25-4096-with-old-prm-indices_61440_69120 • 7.68k • 5
kaiwenw/distill-r1-qwen-1.5b-hmmt-feb-25-4096-with-old-prm-indices_76800_84480 • 7.68k • 6