- spiral-rl/Spiral-Qwen3-4B
  Text Generation • 4B • Updated • 45 • 4
- spiral-rl/Spiral-DeepSeek-R1-Distill-Qwen-7B
  Text Generation • 8B • Updated • 36 • 2
- spiral-rl/Spiral-Kuhn-Poker-Qwen3-32B-SFT
  Viewer • Updated • 25.5k • 95
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
  Paper • 2506.24119 • Published • 47