<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
[[open-in-colab]]
# Quicktour
Diffusion models are trained to denoise random Gaussian noise step by step to generate a sample of interest, such as an image or audio. This has sparked a tremendous amount of interest in generative AI, and you have probably seen examples of diffusion-generated images on the internet. 🧨 Diffusers is a library aimed at making diffusion models widely accessible to everyone.
Whether you're a developer or an everyday user, this quicktour introduces you to 🧨 Diffusers and helps you get started generating quickly! There are three main components of the library to know about:
* The [`DiffusionPipeline`] is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.
* Popular pretrained [model](./api/models) architectures and modules that can be used as building blocks for creating diffusion systems.
* Many different [schedulers](./api/schedulers/overview): algorithms that control how noise is added for training and how denoised images are generated during inference.
The quicktour shows you how to use the [`DiffusionPipeline`] for inference, and then walks you through combining a model and a scheduler to replicate what happens inside the [`DiffusionPipeline`].
<Tip>
The quicktour is a condensed version of the introductory 🧨 Diffusers [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) to help you get started quickly. To learn more about the goals, design philosophy, and core API of 🧨 Diffusers, check out the notebook!
</Tip>
Before you begin, make sure you have all the necessary libraries installed:
```py
# Uncomment to install the necessary libraries in Colab.
#!pip install --upgrade diffusers accelerate transformers
```
- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
- [🤗 Transformers](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).
## DiffusionPipeline
The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the [`DiffusionPipeline`] out of the box for many tasks. The table below lists some of the supported tasks; for the complete list, see the [🧨 Diffusers Summary](./api/pipelines/overview#diffusers-summary) table.
| **Task** | **Description** | **Pipeline** |
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
| Unconditional Image Generation | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) |
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2img](./using-diffusers/depth2img) |
Start by creating an instance of a [`DiffusionPipeline`] and specifying which pipeline checkpoint to download.
You can use the [`DiffusionPipeline`] with any [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) stored on the Hugging Face Hub.
In this quicktour, you'll load the [`stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint for text-to-image generation.
<Tip warning={true}>
For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) models, please read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) carefully before running the model. 🧨 Diffusers implements a [`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content.
</Tip>
Load the model with the [`~DiffusionPipeline.from_pretrained`] method:
```python
>>> from diffusers import DiffusionPipeline
>>> pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
```
The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. You can see that the Stable Diffusion pipeline is composed of, among other things, the [`UNet2DConditionModel`] and the [`PNDMScheduler`]:
```py
>>> pipeline
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.13.1",
  ...,
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  ...,
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
This model has roughly 1.4 billion parameters, so running the pipeline on a GPU is strongly recommended.
Just as you would in PyTorch, you can move the pipeline object to a GPU:
```python
>>> pipeline.to("cuda")
```
Now you can pass a text prompt to the `pipeline` to generate an image and then access the denoised image. By default, the image output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.
```python
>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
>>> image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png"/>
</div>
Save the image by calling `save`:
```python
>>> image.save("image_of_squirrel_painting.png")
```
### Local pipeline
You can also use the pipeline locally. The only difference is that you need to download the weights first:
```bash
!git lfs install
!git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
```
Then load the saved weights into the pipeline:
```python
>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
```
Now you can run the pipeline as you would in the section above.
### Swapping schedulers
Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is that it lets you easily switch between schedulers. For example, to replace the default [`PNDMScheduler`] with the [`EulerDiscreteScheduler`], load it with the [`~diffusers.ConfigMixin.from_config`] method:
```py
>>> from diffusers import EulerDiscreteScheduler
>>> pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```
Try generating an image with the new scheduler and see if you notice a difference!
In the next section, you'll take a closer look at the components that make up the [`DiffusionPipeline`], the model and the scheduler, and learn how to use them to generate an image of a cat.
## Models
Most models take a noisy sample and, at each timestep, predict the *noise residual*, the difference between a less noisy image and the input image. Other models learn to predict the previous sample directly, or the velocity (also called [`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)). You can mix and match models to create other diffusion systems.
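To make the idea of a noise residual more concrete, here is a minimal sketch in plain Python that needs no model or GPU. It blends a toy "image" of four pixel values with Gaussian noise and then shows that knowing the exact noise is enough to recover the clean values. The blending formula is the forward process commonly used in the DDPM formulation, and `alpha_bar = 0.5` is an arbitrary value assumed for illustration; neither comes from this quicktour's checkpoints.

```python
import math
import random

random.seed(0)

# Toy "clean image": four pixel values in [-1, 1] (hypothetical data).
clean = [0.8, -0.3, 0.1, -0.9]

# Assumed cumulative signal level at some timestep (illustrative value only).
alpha_bar = 0.5

# The Gaussian noise that a noise-prediction model is trained to estimate.
noise = [random.gauss(0.0, 1.0) for _ in clean]

# Forward process: blend the clean signal with the noise.
noisy = [
    math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps
    for x0, eps in zip(clean, noise)
]

# With a perfect noise prediction, the clean signal can be recovered.
recovered = [
    (xt - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)
    for xt, eps in zip(noisy, noise)
]

print([round(x, 6) for x in recovered])
```

A real model only approximates the noise, which is one reason denoising is spread over many small steps instead of a single jump.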
Models are instantiated with the [`~ModelMixin.from_pretrained`] method, which also caches the model weights locally so that loading is faster the next time. For the quicktour, you'll load the [`UNet2DModel`], a basic unconditional image generation model with a checkpoint trained on cat images:
```py
>>> from diffusers import UNet2DModel
>>> repo_id = "google/ddpm-cat-256"
>>> model = UNet2DModel.from_pretrained(repo_id)
```
To access the model parameters, call `model.config`:
```py
>>> model.config
```
The model configuration is a 🧊 frozen 🧊 dictionary, which means its parameters can't be changed after the model is created. This is intentional: it ensures that the parameters used to define the model architecture at the start stay the same, while other parameters can still be adjusted during inference.
Some of the most important parameters are:
* `sample_size`: the height and width dimension of the input sample.
* `in_channels`: the number of input channels of the input sample.
* `down_block_types` and `up_block_types`: the types of down- and upsampling blocks used to create the UNet architecture.
* `block_out_channels`: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.
* `layers_per_block`: the number of ResNet blocks present in each UNet block.
To use the model for inference, create an image-shaped tensor of random Gaussian noise. It should have a `batch` axis because the model can receive multiple random noises, a `channel` axis corresponding to the number of input channels, and a `sample_size` axis for the height and width of the image:
```py
>>> import torch
>>> torch.manual_seed(0)
>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
>>> noisy_sample.shape
torch.Size([1, 3, 256, 256])
```
For inference, pass the noisy image and a `timestep` to the model. The `timestep` indicates how noisy the input image is, with more noise at the beginning and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. Read the `sample` attribute of the returned output to get the model output:
```py
>>> with torch.no_grad():
...     noisy_residual = model(sample=noisy_sample, timestep=2).sample
```
To generate actual examples, though, you need a scheduler to guide the denoising process. The next section shows how to couple the model with a scheduler.
## Schedulers
Given the model output (in this case, the `noisy_residual`), schedulers manage the transition from a noisy sample to a less noisy sample.
<Tip>
🧨 Diffusers is a toolbox for building diffusion systems. While the [`DiffusionPipeline`] is a convenient way to get started with a pre-built diffusion system, you can also choose your own model and scheduler components separately to build a custom diffusion system.
</Tip>
For the quicktour, you'll instantiate the [`DDPMScheduler`] with its [`~diffusers.SchedulerMixin.from_pretrained`] method, which loads the scheduler configuration stored in the repository:
```py
>>> from diffusers import DDPMScheduler
>>> scheduler = DDPMScheduler.from_pretrained(repo_id)
>>> scheduler
DDPMScheduler {
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.13.1",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "clip_sample": true,
  "clip_sample_range": 1.0,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "trained_betas": null,
  "variance_type": "fixed_small"
}
```
<Tip>
💡 Notice how the scheduler is instantiated from a configuration. Unlike a model, a scheduler has no trainable weights and is parameter-free!
</Tip>
Some of the most important parameters are:
* `num_train_timesteps`: the length of the denoising process, in other words the number of timesteps required to process random Gaussian noise into a data sample.
* `beta_schedule`: the type of noise schedule to use for inference and training.
* `beta_start` and `beta_end`: the start and end noise values for the noise schedule.
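To see how these parameters fit together, the sketch below rebuilds a linear noise schedule in plain Python from the values in the scheduler configuration shown above. The helper names `linear_betas` and `cumulative_alphas` are hypothetical and written only for this illustration; the library computes its schedule internally with tensors.

```python
def linear_betas(beta_start, beta_end, num_train_timesteps):
    """Evenly spaced noise variances from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the share of signal variance left at each timestep."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


# Values copied from the scheduler configuration shown above.
betas = linear_betas(beta_start=0.0001, beta_end=0.02, num_train_timesteps=1000)
alphas_cumprod = cumulative_alphas(betas)

print(f"signal kept after the first step: {alphas_cumprod[0]:.4f}")
print(f"signal kept after the last step:  {alphas_cumprod[-1]:.2e}")
```

The cumulative product shrinks at every step, so by the last training timestep almost none of the original signal is left. That is why inference can start from pure Gaussian noise.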
To predict a slightly less noisy image, pass the following to the scheduler's [`~diffusers.DDPMScheduler.step`] method: the model output, the `timestep`, and the current `sample`.
```py
>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
>>> less_noisy_sample.shape
```
The `less_noisy_sample` can be passed to the next `timestep`, where it gets even less noisy! Now let's bring it all together and visualize the entire denoising process.
First, create a function that postprocesses the denoised image and displays it as a `PIL.Image`:
```py
>>> import PIL.Image
>>> import numpy as np
>>> def display_sample(sample, i):
...     image_processed = sample.cpu().permute(0, 2, 3, 1)
...     image_processed = (image_processed + 1.0) * 127.5
...     image_processed = image_processed.numpy().astype(np.uint8)
...     image_pil = PIL.Image.fromarray(image_processed[0])
...     display(f"Image at step {i}")
...     display(image_pil)
```
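The arithmetic in `display_sample` maps model outputs from roughly `[-1, 1]` to 8-bit pixel values in `[0, 255]`. The scalar sketch below shows that mapping on a few values. The clamp is an addition that the function above does not perform, included here on the assumption that intermediate samples can land slightly outside `[-1, 1]`, where a direct cast to `uint8` is not guaranteed to saturate. `to_uint8` is a hypothetical helper name.

```python
def to_uint8(value):
    """Map a model output in [-1, 1] to an 8-bit pixel value in [0, 255]."""
    scaled = (value + 1.0) * 127.5
    # Clamp first: intermediate samples can fall slightly outside [-1, 1].
    clamped = max(0.0, min(255.0, scaled))
    return int(clamped)


for value in (-1.0, 0.0, 1.0, 1.2):
    print(value, "->", to_uint8(value))
```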
To speed up the denoising process, move the input and the model to a GPU:
```py
>>> model.to("cuda")
>>> noisy_sample = noisy_sample.to("cuda")
```
Now create a denoising loop that predicts the residual of the less noisy sample and computes the less noisy sample with the scheduler:
```py
>>> import tqdm
>>> sample = noisy_sample
>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
...     # 1. predict noise residual
...     with torch.no_grad():
...         residual = model(sample, t).sample
...     # 2. compute less noisy image and set x_t -> x_t-1
...     sample = scheduler.step(residual, t, sample).prev_sample
...     # 3. optionally look at image
...     if (i + 1) % 50 == 0:
...         display_sample(sample, i + 1)
```
Sit back and watch as a cat is generated from nothing but noise! 😻
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/diffusion-quicktour.png"/>
</div>
## Next steps
Hopefully you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:
* Train or fine-tune a model to generate your own images in the [training](./tutorials/basic_training) tutorial.
* See the official and community [training or fine-tuning scripts](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples) for a variety of use cases.
* Learn more about loading, accessing, changing, and comparing schedulers in the [Using different schedulers](./using-diffusers/schedulers) guide.
* Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher-quality images in the [Stable Diffusion](./stable_diffusion) guide.
* Dive deeper into speeding up 🧨 Diffusers with the [optimized PyTorch on a GPU](./optimization/fp16) guide and the inference guides for running [Stable Diffusion on Apple Silicon (M1/M2)](./optimization/mps) and [ONNX Runtime](./optimization/onnx).