SahilCarterr committed
Commit 6ec6b6f · verified · 1 Parent(s): 6db8328

Update README.md

Files changed (1)
  1. README.md +94 -3
README.md CHANGED
@@ -1,3 +1,94 @@
- ---
- license: apache-2.0
- ---
+ ---
+ frameworks:
+ - Pytorch
+ tasks:
+ - text-to-image-synthesis
+
+ #model-type:
+ ## e.g. gpt, phi, llama, chatglm, baichuan
+ #- gpt
+
+ #domain:
+ ## e.g. nlp, cv, audio, multi-modal
+ #- nlp
+
+ #language:
+ ## List of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
+ #- cn
+
+ #metrics:
+ ## e.g. CIDEr, BLEU, ROUGE
+ #- CIDEr
+
+ #tags:
+ ## Custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
+ #- pretrained
+
+ #tools:
+ ## e.g. vllm, fastchat, llamacpp, AdaSeq
+ #- vllm
+ base_model:
+ - Qwen/Qwen-Image
+ base_model_relation: adapter
+ ---
+ # Qwen-Image Image Structure Control Model
+
+ ![](./assets/title.png)
+
+ ## Model Introduction
+
+ This model is an image structure control model trained on top of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). It uses a ControlNet architecture and controls the structure of the generated image according to an edge detection (Canny) map. It was trained with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework on the [BLIP3o](https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k) dataset.
+
+
+ ## Effect Demonstration
+
+ |Structure Map|Generated Image 1|Generated Image 2|
+ |-|-|-|
+ |![](./assets/canny_3.png)|![](./assets/image_3_1.png)|![](./assets/image_3_2.png)|
+ |![](./assets/canny_2.png)|![](./assets/image_2_1.png)|![](./assets/image_2_2.png)|
+ |![](./assets/canny_1.png)|![](./assets/image_1_1.png)|![](./assets/image_1_2.png)|
+
+ ## Inference Code
+ ```
+ git clone https://github.com/modelscope/DiffSynth-Studio.git
+ cd DiffSynth-Studio
+ pip install -e .
+ ```
+
+ ```python
+ from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
+ from PIL import Image
+ import torch
+ from modelscope import dataset_snapshot_download
+
+
+ # Load the Qwen-Image base components plus the Canny ControlNet weights.
+ pipe = QwenImagePipeline.from_pretrained(
+     torch_dtype=torch.bfloat16,
+     device="cuda",
+     model_configs=[
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
+         ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
+         ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Canny", origin_file_pattern="model.safetensors"),
+     ],
+     tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
+ )
+
+ # Download one example Canny map and resize it to the generation resolution.
+ dataset_snapshot_download(
+     dataset_id="DiffSynth-Studio/example_image_dataset",
+     local_dir="./data/example_image_dataset",
+     allow_file_pattern="canny/image_1.jpg"
+ )
+ controlnet_image = Image.open("data/example_image_dataset/canny/image_1.jpg").resize((1328, 1328))
+
+ # Generate an image whose structure follows the Canny map.
+ prompt = "A puppy with shiny, smooth fur and lively eyes, with a spring garden full of cherry blossoms as the background, beautiful and warm."
+ image = pipe(
+     prompt, seed=0,
+     blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)]
+ )
+ image.save("image.jpg")
+ ```
+
91
+
92
+ ---
93
+ license: apache-2.0
94
+ ---
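
Note on the structure map input (not part of the commit): the Model Introduction in this README says the ControlNet is conditioned on Canny edge maps. As a rough illustration of what such a map encodes, here is a minimal pure-Python sketch. It is a simplified stand-in, not real Canny: it uses only Sobel gradient magnitude and a single threshold, with no smoothing, non-maximum suppression, or hysteresis. The function name `sobel_edge_map` and the default threshold are hypothetical choices made for this example. In practice one would more likely prepare the map with a full Canny implementation such as OpenCV's `cv2.Canny`.

```python
def sobel_edge_map(gray, threshold=128):
    """Return a binary edge map (0 or 255) for a 2D list of grayscale values.

    Simplified stand-in for Canny: Sobel gradient magnitude plus one
    threshold. The name and the default threshold are illustrative only.
    """
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels stay 0 because the 3x3 kernel needs a full neighbourhood.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            magnitude = (gx * gx + gy * gy) ** 0.5
            if magnitude >= threshold:
                edges[y][x] = 255
    return edges


# Toy 6x6 image: dark left half, bright right half.
toy = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
edge_map = sobel_edge_map(toy)
for row in edge_map:
    print(row)
```

On the toy image, only the two columns that straddle the dark-to-bright boundary are marked as edges. A white-on-black map of this kind is what the `ControlNetInput(image=...)` argument in the inference code receives.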