---
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- DDPO
inference: true
---

# Aligned Diffusion Model via DDPO

A diffusion model aligned using the Denoising Diffusion Policy Optimization (DDPO) algorithm, with feedback from the following reward models:
```
closed-source VLMs: claude3-opus, gpt-4o, gpt-4v
```
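
For readers unfamiliar with DDPO, the following is a minimal pure-Python sketch of the PPO-style clipped objective that DDPO implementations commonly apply to each denoising step. It is an illustration only, not the training code used for this checkpoint: the function names, the `clip_range` value of 0.2, and the idea of normalizing rewards across a batch are assumptions made for the example.

```python
import math


def normalize_rewards(rewards):
    """Convert raw reward-model scores into zero-mean, unit-variance advantages."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # Small epsilon avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + 1e-8) for r in rewards]


def clipped_step_loss(log_prob_new, log_prob_old, advantage, clip_range=0.2):
    """Clipped policy-gradient loss for a single denoising step.

    log_prob_new: log-probability of the sampled step under the current model.
    log_prob_old: log-probability of the same step under the sampling-time model.
    advantage:    normalized reward of the final image (shared by all steps).
    clip_range:   hypothetical value; limits how far the ratio can move.
    """
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    unclipped = -advantage * ratio
    clipped = -advantage * clipped_ratio
    # Taking the larger loss gives the pessimistic (clipped) objective.
    return max(unclipped, clipped)
```

In this sketch the reward for a generated image would come from one of the models listed above, and the same advantage would be reused for every denoising step of that image's trajectory.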

## How to Use

Load the base Stable Diffusion v1.5 pipeline, apply the LoRA weights from this checkpoint, and run inference as follows:
```python
import torch
from diffusers import StableDiffusionPipeline

# Base model that the LoRA weights are applied to
pretrained_model_name = "runwayml/stable-diffusion-v1-5"
pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name, torch_dtype=torch.float16)

# Replace with the local path to the downloaded LoRA checkpoint
lora_path = "path/to/checkpoint"
pipeline.load_lora_weights(lora_path)
pipeline.to("cuda")

# Fix the seed so the result is reproducible
generator = torch.Generator(device="cuda").manual_seed(1)

prompt = "a pink flower"

image = pipeline(prompt=prompt, generator=generator, guidance_scale=5).images[0]
```

## Citation
```
@misc{chen2024mjbenchmultimodalrewardmodel,
    title={MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?}, 
    author={Zhaorun Chen and Yichao Du and Zichen Wen and Yiyang Zhou and Chenhang Cui and Zhenzhen Weng and Haoqin Tu and Chaoqi Wang and Zhengwei Tong and Qinglan Huang and Canyu Chen and Qinghao Ye and Zhihong Zhu and Yuqing Zhang and Jiawei Zhou and Zhuokai Zhao and Rafael Rafailov and Chelsea Finn and Huaxiu Yao},
    year={2024},
    eprint={2407.04842},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2407.04842}, 
}
```