ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated Jul 6 • 14
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated Jul 6 • 19
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6 Reinforcement Learning • 1B • Updated Jul 6 • 17
mradermacher/ReForm-14B-RL-entropy-GGUF Reinforcement Learning • 15B • Updated about 1 month ago • 63
tensorblock/Nellyw888_VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb-GGUF Reinforcement Learning • 7B • Updated 27 days ago • 144
mradermacher/Qwen3-14B-ARPO-DeepSearch-GGUF Reinforcement Learning • 15B • Updated 14 days ago • 3.07k • 1
mradermacher/Qwen3-14B-ARPO-DeepSearch-i1-GGUF Reinforcement Learning • 15B • Updated 14 days ago • 2.94k • 1
mradermacher/CscSQL-Merge-Qwen2.5-Coder-0.5B-Instruct-GGUF Reinforcement Learning • 0.6B • Updated 26 days ago • 188
mradermacher/CscSQL-Merge-Qwen2.5-Coder-1.5B-Instruct-GGUF Reinforcement Learning • 2B • Updated 26 days ago • 445