RLAIF

Team

community

AI & ML interests

None defined yet.

Recent Activity

AngelRaychev updated a dataset 18 days ago

RLAIF/webgpt

AngelRaychev published a dataset 18 days ago

RLAIF/webgpt

AngelRaychev updated a dataset 18 days ago

RLAIF/tldr

Collections 2

models 80

datasets 134

RLAIF/webgpt

Viewer • Updated 18 days ago • 13.3k • 55

RLAIF/tldr

Viewer • Updated 18 days ago • 92.9k • 19

RLAIF/ultrafeedback-binarized

Viewer • Updated 19 days ago • 63.5k • 26

RLAIF/gm_toy_example

Viewer • Updated Nov 1 • 1.1k • 10

RLAIF/dpo_thinking_reddit_judge4_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 15 • 27k • 15

RLAIF/dpo_thinking_reddit_judge3_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 15 • 8k • 15

RLAIF/dpo_thinking_reddit_judge2_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 14 • 27k • 11

RLAIF/dpo_thinking_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 14 • 27k • 19

RLAIF/dpo_thinking_reddit_offtheshelf_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 14 • 27k • 12

RLAIF/dpo_answer_reddit_judge_1e-6_0.02_4B_4B_with_gold_labels_kl_estimation

Viewer • Updated Sep 14 • 27k • 16

Collections 2

SynthLabsAI/ALP_DeepScaleR_1.5B_C16K

SynthLabsAI/ALP_R1_Qwen1.5B

RLAIF/CODE-BEHAVIOR-NUMINA-V1-Blocks

models 80

RLAIF/twitter_8EUB__5e-06_0.1_20_0.9_20_0.95

RLAIF/dpo_thinking_reddit_judge_last_minute_50_1e-6_0.02_4B_4B

RLAIF/dpo_thinking_reddit_judge_last_minute_150_1e-6_0.02_4B_4B

RLAIF/dpo_thinking_reddit_judge_last_minute_100_1e-6_0.02_4B_4B

RLAIF/dpo_thinking_reddit_judge_last_minute_200_1e-6_0.02_4B_4B

RLAIF/dpo_thinking_reddit_judge_last_minute_250_1e-6_0.02_4B_4B

RLAIF/grpo_reddit_judge_last_minute_16_64_8_3e-5_1e-6_4B

RLAIF/dpo_thinking_reddit_judge_full_1e-6_0.02_8B_4B

RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_4B_1.7B

RLAIF/dpo_answer_reddit_judge_full_1e-6_0.02_8B_4B

Team members 9
