CodeGoat24 and nielsr (HF Staff) committed
Commit 940c93a Β· verified Β· 1 Parent(s): 40b5f3c

Add pipeline tag, abstract, code link, and relevant images (#1)

Co-authored-by: Niels Rogge <[email protected]>

Files changed (1):
  1. README.md +22 -12
README.md CHANGED
@@ -1,32 +1,43 @@
---
- library_name: diffusers
- license: mit
base_model:
- black-forest-labs/FLUX.1-dev
+ library_name: diffusers
+ license: mit
+ pipeline_tag: text-to-image
---

- ## FLUX.1-dev-PrefGRPO
-
- This model is trained using [Pref-GRPO](https://codegoat24.github.io/UnifiedReward/Pref-GRPO) on the training dataset of [UniGenBench](https://github.com/CodeGoat24/UniGenBench).
+ # FLUX.1-dev-PrefGRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
+
+ This model is trained using [Pref-GRPO](https://codegoat24.github.io/UnifiedReward/Pref-GRPO) on the training dataset of [UniGenBench](https://github.com/CodeGoat24/UniGenBench).
+
+ ## Abstract
+ Recent advancements highlight the importance of GRPO-based reinforcement learning methods and benchmarking in enhancing text-to-image (T2I) generation. However, current methods that use pointwise reward models (RM) to score generated images are susceptible to reward hacking. We reveal that this happens when minimal score differences between images are amplified after normalization, creating illusory advantages that drive the model to over-optimize for trivial gains, ultimately destabilizing the image generation process. To address this, we propose Pref-GRPO, a pairwise preference reward-based GRPO method that shifts the optimization objective from score maximization to preference fitting, ensuring more stable training. In Pref-GRPO, images are pairwise compared within each group using a preference RM, and the win rate is used as the reward signal. Extensive experiments demonstrate that Pref-GRPO differentiates subtle image quality differences, providing more stable advantages and mitigating reward hacking. Additionally, existing T2I benchmarks are limited by coarse evaluation criteria, hindering comprehensive model assessment. To solve this, we introduce UniGenBench, a unified T2I benchmark comprising 600 prompts across 5 main themes and 20 subthemes. It evaluates semantic consistency through 10 primary and 27 sub-criteria, leveraging MLLMs for benchmark construction and evaluation. Our benchmark uncovers the strengths and weaknesses of both open- and closed-source T2I models and validates the effectiveness of Pref-GRPO.
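To make the failure mode in the abstract concrete: a minimal, self-contained sketch (editorial illustration only, not part of this commit; the scores are invented) of how near-identical pointwise scores turn into full-sized advantages after group normalization.

```python
import torch

# Invented pointwise RM scores for a group of 4 images of near-identical quality.
scores = torch.tensor([0.90, 0.91, 0.92, 0.93])

# Standard GRPO group normalization of rewards into advantages.
advantages = (scores - scores.mean()) / (scores.std() + 1e-8)
print(advantages)  # tensor([-1.1619, -0.3873,  0.3873,  1.1619])

# A 0.03 score spread becomes advantages of magnitude ~1.16: the "illusory
# advantage" that drives over-optimization of trivial differences.
```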
+
+ ## Model Overview
+ Here are the main components of the Pref-GRPO pipeline:
+ <div align="center">
+ <img src="https://github.com/CodeGoat24/Pref-GRPO/raw/main/assets/pref_grpo_pipeline.png" alt="Pref-GRPO Pipeline" width="100%"/>
+ </div>
+
+ Pref-GRPO addresses the issue of reward hacking in text-to-image generation:
+ <div align="center">
+ <img src="https://github.com/CodeGoat24/Pref-GRPO/raw/main/assets/pref_grpo_reward_hacking.png" alt="Pref-GRPO Reward Hacking" width="100%"/>
+ </div>
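Alongside the figure above, a minimal sketch of the pairwise win-rate reward the abstract describes. This is an illustrative reconstruction, not the released implementation; `preference_rm(prompt, img_a, img_b)` is a hypothetical callable assumed to return True when the first image is preferred.

```python
import itertools
import torch

def pref_grpo_rewards(images, prompt, preference_rm):
    # Compare every pair of images in the group with the preference RM.
    n = len(images)
    wins = [0] * n
    for i, j in itertools.combinations(range(n), 2):
        if preference_rm(prompt, images[i], images[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # Reward = win rate against the other n - 1 images in the group.
    rewards = torch.tensor([w / (n - 1) for w in wins], dtype=torch.float32)
    # The usual GRPO normalization, now over win rates rather than raw scores.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return rewards, advantages
```

The intent, per the abstract, is that fitting preferences rather than maximizing raw scores yields more stable advantages: genuinely similar images end up with win rates near 0.5 instead of being pushed apart by score noise.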
+

For further details, please refer to the following resources:
- πŸ“° Paper: https://arxiv.org/pdf/2508.20751
- πŸͺ Project Page: https://codegoat24.github.io/UnifiedReward/Pref-GRPO
+ - πŸ’» Code: https://github.com/CodeGoat24/Pref-GRPO
- πŸ€— UniGenBench: https://github.com/CodeGoat24/UniGenBench
- πŸ€— Leaderboard: https://huggingface.co/spaces/CodeGoat24/UniGenBench_Leaderboard
- πŸ‘‹ Point of Contact: [Yibin Wang](https://codegoat24.github.io)

-
### Quick Start
- ~~~python
+ ```bash
pip install -U diffusers
- ~~~
+ ```

- ~~~python
+ ```python
import torch
from diffusers import FluxPipeline

@@ -45,12 +56,11 @@ image = pipe(
).images[0]
image.save("flux-dev.png")

- ~~~
-
+ ```
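The hunk above elides the unchanged middle of the snippet, where the pipeline is loaded. For orientation only, here is a hedged reconstruction of a complete run; the repo id, prompt, and generation arguments are assumptions, not the elided lines.

```python
import torch
from diffusers import FluxPipeline

# Assumed repo id (inferred from the model card name, not from the elided lines).
pipe = FluxPipeline.from_pretrained(
    "CodeGoat24/FLUX.1-dev-PrefGRPO",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # trade speed for lower GPU memory use

image = pipe(
    "A cat holding a sign that says hello world",  # example prompt
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("flux-dev.png")
```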
 
## Citation

- ```
+ ```bibtex
@article{Pref-GRPO&UniGenBench,
title={Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning},
author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Zhou, Yujie and Bu, Jiazi and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},