---
frameworks:
- Pytorch
tasks:
- text-to-image-synthesis
base_model:
- Qwen/Qwen-Image
base_model_relation: adapter
---
# Qwen-Image Image Structure Control Model - Depth ControlNet
![](./assets/cover.png)
## Model Introduction
This model is a structure control model for image generation, trained on top of [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image). Its architecture is ControlNet, which constrains the structure of the generated image according to a depth map. Training was done with [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) on the [BLIP3o](https://modelscope.cn/datasets/BLIP3o/BLIP3o-60k) dataset.
## Effect Demonstration
|Structure Map|Generated Image 1|Generated Image 2|
|-|-|-|
|![](./assets/depth2.jpg)|![](./assets/image2_0.jpg)|![](./assets/image2_1.jpg)|
|![](./assets/depth3.jpg)|![](./assets/image3_0.jpg)|![](./assets/image3_1.jpg)|
|![](./assets/depth1.jpg)|![](./assets/image1_0.jpg)|![](./assets/image1_1.jpg)|
## Inference Code
```shell
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
```
```python
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig, ControlNetInput
from PIL import Image
import torch
from modelscope import dataset_snapshot_download

# Load the base Qwen-Image model together with the Depth ControlNet adapter.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
        ModelConfig(model_id="DiffSynth-Studio/Qwen-Image-Blockwise-ControlNet-Depth", origin_file_pattern="model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)

# Download an example depth map from the sample dataset.
dataset_snapshot_download(
    dataset_id="DiffSynth-Studio/example_image_dataset",
    local_dir="./data/example_image_dataset",
    allow_file_pattern="depth/image_1.jpg",
)
controlnet_image = Image.open("data/example_image_dataset/depth/image_1.jpg").resize((1328, 1328))

prompt = "Exquisite portrait of an underwater girl with flowing blue dress and fluttering hair. Transparent light and shadow, surrounded by bubbles. Her face is serene, with exquisite details and dreamy beauty."

# Generate an image whose structure follows the depth map.
image = pipe(
    prompt, seed=0,
    blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)],
)
image.save("image.jpg")
```
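The example above uses a ready-made depth map from the sample dataset. To condition on your own photos, you first need to produce a depth map yourself. The sketch below uses a simple grayscale-plus-blur conversion as a self-contained stand-in; a real workflow would replace `to_depth_map` (a hypothetical helper, not part of DiffSynth-Studio) with a monocular depth estimator such as Depth Anything before passing the result to `ControlNetInput`.

```python
# Stand-in depth-map preparation for custom input images.
# NOTE: grayscale conversion is only a placeholder for a real depth estimator.
from PIL import Image, ImageFilter


def to_depth_map(image, size=(1328, 1328)):
    """Produce a 3-channel pseudo depth map at the pipeline's working resolution."""
    # Grayscale + slight blur as a placeholder "depth" signal.
    depth = image.convert("L").filter(ImageFilter.GaussianBlur(radius=2))
    # Resize to the resolution used in the inference example; ControlNet
    # conditioning images are typically passed as 3-channel RGB.
    return depth.resize(size).convert("RGB")


# Hypothetical usage with the pipeline from the example above:
# controlnet_image = to_depth_map(Image.open("my_photo.jpg"))
# image = pipe(prompt, seed=0,
#              blockwise_controlnet_inputs=[ControlNetInput(image=controlnet_image)])
```

The fixed 1328x1328 size matches the resize used in the inference example; adjust it if you generate at a different resolution.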
## License
This model is released under the Apache-2.0 license.