metadata

license: apache-2.0
language:
  - en
library_name: transformers
pipeline_tag: text-to-image

DiffBlender Model Card

This repo contains the models from our paper DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models.

Model details

Model type: DiffBlender is a multimodal text-to-image diffusion model that synthesizes images from complex combinations of input modalities. It enables flexible manipulation of conditions, providing customized generation aligned with user preferences. Its structure is designed to extend intuitively to additional modalities while keeping the training cost low through a partial update of hypernetworks.

We provide the model checkpoint, trained with six modalities: sketch, depth map, grounding box, keypoints, color palette, and style embedding. The checkpoint file is ./checkpoint_latest.pth.
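As a quick sanity check before wiring the checkpoint into the DiffBlender code, it can help to look at how its entries are grouped. The snippet below is a minimal sketch, not the official loading procedure: the helper names `summarize_checkpoint` and `load_and_summarize` are hypothetical, and it assumes that the file is a standard PyTorch checkpoint holding a dictionary keyed by strings (a state dict, or a dict wrapping one). The actual layout of the file is not described in this card.

```python
from collections import Counter
from typing import Any, Mapping


def summarize_checkpoint(obj: Mapping[str, Any], depth: int = 1) -> Counter:
    """Count entries by the first `depth` dot-separated parts of each key."""
    counts: Counter = Counter()
    for key in obj:
        prefix = ".".join(str(key).split(".")[:depth])
        counts[prefix] += 1
    return counts


def load_and_summarize(path: str = "./checkpoint_latest.pth") -> Counter:
    """Load the checkpoint on CPU and summarize its top-level key prefixes.

    Assumption: the file can be read with torch.load and contains a dict.
    """
    import torch  # imported lazily so the helper above works without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    return summarize_checkpoint(checkpoint)
```

Calling `load_and_summarize()` from the directory that contains the checkpoint returns a `Counter` of key prefixes. If the result shows only a few wrapper keys, the weights are probably nested one level down, and `summarize_checkpoint` can be applied to that inner dictionary instead.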

License: Apache License 2.0

Where to send questions or comments about the model: https://github.com/sungnyun/diffblender/issues

Training dataset

Microsoft COCO 2017 dataset

More details are available on our project page: https://sungnyun.github.io/diffblender/