---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-to-image
---
|
|
|
<br>
|
|
|
# DiffBlender Model Card
|
|
|
This repo contains the models from our paper [**DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models**](https://arxiv.org/abs/2305.15194).
|
|
|
|
|
## Model details
|
|
|
**Model type:**
DiffBlender synthesizes images from complex combinations of input modalities.
It enables flexible manipulation of conditions, providing customized generation aligned with user preferences.
We designed its structure to extend intuitively to additional modalities, while keeping the training cost low through partial updates of hypernetworks.
|
|
|
We provide the model checkpoint, trained with six modalities (sketch, depth map, grounding box, keypoints, color palette, and style embedding), at `./checkpoint_latest.pth`.
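
A minimal sketch of inspecting this checkpoint, assuming it is a standard PyTorch `.pth` file; the actual loading and inference code lives in the GitHub repository:

```python
import torch

# Load the released DiffBlender checkpoint on CPU; a GPU is not
# needed just to inspect the stored weights.
state = torch.load("./checkpoint_latest.pth", map_location="cpu")

# Checkpoints are typically dictionaries; list the top-level keys to
# see how the weights (e.g., per-modality hypernetworks) are organized.
if isinstance(state, dict):
    print(list(state.keys())[:10])
```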
|
|
|
**License:**
Apache 2.0 License
|
|
|
**Where to send questions or comments about the model:**
https://github.com/sungnyun/diffblender/issues
|
|
|
|
|
## Training dataset
[Microsoft COCO 2017 dataset](https://cocodataset.org/#home)
|
|
|
|
|
<br>
|
|
|
More details are on our project page: https://sungnyun.github.io/diffblender/.