Training data and finetuned checkpoints for Reinforce-Ada
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
models 29

RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp • Text Generation • 8B • 2 • 1
RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej • Text Generation • 8B • 1
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data • Text Generation • 8B • 3.25k • 36
RLHFlow/Qwen2.5-7B-SFT • 8B • 1
RLHFlow/Qwen2.5-7B-RAFT-Zero • 8B • 1
RLHFlow/Qwen2.5-7B-DPO-NLL-Zero • 8B • 1
RLHFlow/Qwen2.5-7B-DPO-Zero • 8B
RLHFlow/Qwen2.5-7B-DPO • 8B • 3
RLHFlow/Qwen2.5-7B-PPO-Zero • 8B • 11 • 2
RLHFlow/Decision-Tree-Reward-Gemma-2-27B • Text Classification • 27B • 31 • 7

datasets 85

RLHFlow/reinforce_ada_easy_prompt • Viewer • 24.3k
RLHFlow/reinforce_ada_hard_prompt • Viewer • 15.7k • 46
RLHFlow/self_rewarding_turn2_example • 5
RLHFlow/self_rewarding_turn1_with_rewards_example • 10
RLHFlow/self_rewarding_rl_prompt • 10
RLHFlow/self_rewarding_sft_prompt • Viewer • 40k • 8
RLHFlow/self_rewarding_ift_example_raw_data1 • Viewer • 16.3k • 7
RLHFlow/self_rewarding_ift_example • Viewer • 32k • 18
RLHFlow/qwq_gen_sft_15k • Viewer • 15k • 9
RLHFlow/numia_prompt_ppo • Viewer • 404k • 18 • 1