Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
yifanzhang114
AI & ML interests
Yi-Fan Zhang is currently a fourth-year PhD student at the State Key Laboratory of Pattern Recognition, University of Chinese Academy of Sciences, advised by Prof. Tieniu Tan. His research focuses on robust and reliable deep learning systems and large pretrained models.
Recent Activity
updated a model 7 days ago
yifanzhang114/R1-Reward
updated a dataset 7 days ago
yifanzhang114/R1-Reward-RL
liked a model 10 days ago
yifanzhang114/R1-Reward
Organizations
None yet
Collections
4
The Next Step Forward in Multimodal LLM Alignment
yifanzhang114/MM-RLHF
Viewer • Updated • 16.3k • 302 • 10
yifanzhang114/MM-RLHF-RewardBench
Viewer • Updated • 170 • 113 • 2
yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen
Image-Text-to-Text • Updated • 71 • 1
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Paper • 2502.10391 • Published • 35
Models
6
yifanzhang114/R1-Reward
Updated • 92 • 3
yifanzhang114/MM-RLHF-Reward-7B-llava-ov-qwen
Image-Text-to-Text • Updated • 71 • 1
yifanzhang114/SliME-Llama3-8B
Image-Text-to-Text • Updated • 35 • 3
yifanzhang114/SliME-vicuna-7B
Image-Text-to-Text • Updated • 46 • 2
yifanzhang114/SliME-Llama3-8B-lora
Image-Text-to-Text • Updated • 7
yifanzhang114/SliME-vicuna-13B
Image-Text-to-Text • Updated • 46 • 2
Datasets
11
yifanzhang114/R1-Reward-RL
Viewer • Updated • 17.3k • 189 • 1
yifanzhang114/MM-RLHF
Viewer • Updated • 16.3k • 302 • 10
yifanzhang114/MM-RLHF-RewardBench
Viewer • Updated • 170 • 113 • 2
yifanzhang114/MME-RealWorld-Base64
Viewer • Updated • 11.5k • 489 • 1
yifanzhang114/MME-RealWorld-Lite
Preview • Updated • 46 • 3
yifanzhang114/MME-RealWorld-lite-lmms-eval
Viewer • Updated • 1.92k • 759 • 1
yifanzhang114/MME-RealWorld
Preview • Updated • 661 • 15
yifanzhang114/AMBER_base64
Viewer • Updated • 14.2k • 45
yifanzhang114/MME-RealWorld-Lmms-eval
Viewer • Updated • 23.1k • 564 • 1
yifanzhang114/MME-RealWorld-CN-Lmms-eval
Viewer • Updated • 5.89k • 48 • 1