Models in Adaptive Length Penalty Paper
models (41)
RLAIF/dpo_answer_reddit_offtheshelf_extra_1e-6_0.02_4B_4B
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_8B_4B
RLAIF/dpo_thinking_reddit_judge_position_bias_1e-6_0.02_8B_4B
RLAIF/dpo_thinking_reddit_judge_position_bias_1e-6_0.02_4B_0.6B
RLAIF/dpo_thinking_reddit_judge_position_bias_cot_1e-6_0.02_4B_1.7B
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_0.6B
RLAIF/grpo_reddit_judge_position_bias_16_64_8_3e-5_1e-6_0.6B
RLAIF/grpo_reddit_judge_position_bias_16_64_8_3e-5_1e-6_1.7B
RLAIF/dpo_thinking_reddit_judge_position_bias_1e-6_0.02_1.7B_4B
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_1.7B_4B
datasets (130)
RLAIF/dpo_thinking_reddit_judge4_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 40
RLAIF/dpo_thinking_reddit_judge3_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 8k • 40
RLAIF/dpo_thinking_reddit_judge2_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 41
RLAIF/dpo_thinking_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 47
RLAIF/dpo_thinking_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 49
RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 55
RLAIF/dpo_answer_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation • 27k • 59
RLAIF/WritingPrompts-Filtered • 199k • 70
RLAIF/WritingPrompts_preferences_chris_filtered • 199k • 60
RLAIF/dpo_thinking_n_a_o_h_u_p_corrected_2048_v2_1e-6_0.02_1.7B_4B_with_gold_labels_kl_estimation • 47.7k • 72