---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-to-image
---

<br>

# DiffBlender Model Card

This repo contains the models from our paper [**DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models**](https://arxiv.org/abs/2305.15194).


## Model details

**Model type:**
DiffBlender is a multimodal text-to-image diffusion model that synthesizes images from complex combinations of input modalities.
It enables flexible manipulation of individual conditions, allowing customized generation aligned with user preferences.
Its architecture is designed to extend intuitively to additional modalities, while keeping training cost low through a partial update of hypernetworks.

We provide the model checkpoint, trained with six modalities (sketch, depth map, grounding box, keypoints, color palette, and style embedding), as `./checkpoint_latest.pth`.

**License:**
Apache 2.0 License

**Where to send questions or comments about the model:**
https://github.com/sungnyun/diffblender/issues


## Training dataset
[Microsoft COCO 2017 dataset](https://cocodataset.org/#home)


<br>

More details are available on our project page: https://sungnyun.github.io/diffblender/.