SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e7 Text Classification • 7B • Updated Jan 19, 2025 • 1
SiliangZ/RM_Zephyr_dpo_init_ultrafeedbck_lr_5e6 Text Classification • 7B • Updated Jan 19, 2025 • 5
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e7 Text Classification • 7B • Updated Jan 19, 2025 • 3
SiliangZ/RM_Mistral_sft_init_ultrafeedbck_lr_5e6 Text Classification • 7B • Updated Jan 19, 2025 • 9
SiliangZ/RM_mistral_7b_sft_beta_ultrachat_200k_mistral_sft_temp07_lr_5e7 7B • Updated Dec 1, 2024 • 2
SiliangZ/mistral-7b-sft-beta-rm-mistral-sft-temp07-lr-5e7-iter1 Text Generation • 7B • Updated Dec 1, 2024 • 4
SiliangZ/IRL_Iter0_RM_ultrachat_200k_vs_sft_with_spin_iter0_checkpoint_232 Text Generation • 7B • Updated Sep 11, 2024
SiliangZ/IRL_Iter0_Policy_Epoch5_RM_Data_SPIN_Iter0 Text Generation • 7B • Updated Sep 8, 2024 • 1