yichaodu committed
Commit 915304e · verified · 1 Parent(s): 17a3da3

Update README.md

Files changed (1)
  1. README.md +9 -1
README.md CHANGED
@@ -7,4 +7,12 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ Multimodal reward models play a pivotal role in Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF). They serve as judges, providing feedback that aligns foundation models (FMs) with desired behaviors. However, these multimodal judges are often not evaluated thoroughly, which can lead to misalignment and unsafe fine-tuning outcomes.
+
+ To address this, we introduce **MJ-Bench**, a novel benchmark designed to evaluate multimodal judges using a comprehensive preference dataset. MJ-Bench assesses feedback for image generation models across four key perspectives: alignment, safety, image quality, and bias.
+
+ We evaluate a wide range of multimodal judges, including:
+
+ - Scoring models
+ - Open-source Vision-Language Models (VLMs) such as the LLaVA family
+ - Closed-source VLMs such as GPT-4o and Claude 3