Duplicate from afmck/stable-diffusion-inpainting-segmentation
Co-authored-by: Alex McKinney <[email protected]>
- .gitattributes +34 -0
- README.md +18 -0
- app.css +114 -0
- app.py +239 -0
- app_header.html +58 -0
- app_license.html +27 -0
- example.png +0 -0
- requirements.txt +12 -0
.gitattributes
ADDED
@@ -0,0 +1,34 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,18 @@
+---
+title: Stable Diffusion Inpainting Segmentation
+emoji: 😷
+colorFrom: purple
+colorTo: black
+sdk: gradio
+sdk_version: 3.9
+app_file: app.py
+pinned: true
+license: creativeml-openrail-m
+duplicated_from: afmck/stable-diffusion-inpainting-segmentation
+---
+
+### ToDos:
+- [ ] setting a random seed
+- [ ] click support for segmentation
+- [ ] draw on mask
+- [ ] batching support
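The first ToDo ("setting a random seed") maps onto an existing `diffusers` hook: pipelines accept a `torch.Generator` through the `generator` keyword of their `__call__`. A minimal sketch, not part of this commit, of how `fn_diffusion` in `app.py` could wire it up, assuming the existing `device` and `pipe` globals (the `seed` input itself is hypothetical):

```python
import torch

# Hypothetical helper: a seeded generator makes a pipeline call reproducible.
def make_generator(seed: int) -> torch.Generator:
    return torch.Generator(device=device).manual_seed(seed)

# fn_diffusion would take an extra `seed` input (e.g. from a gr.Number) and pass:
# inpainted_image = pipe(..., generator=make_generator(seed)).images[0]
```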
app.css
ADDED
@@ -0,0 +1,114 @@
+.gradio-container {
+    font-family: 'IBM Plex Sans', sans-serif;
+}
+.gr-button {
+    color: white;
+    border-color: black;
+    background: black;
+}
+input[type='range'] {
+    accent-color: black;
+}
+.dark input[type='range'] {
+    accent-color: #dfdfdf;
+}
+.container {
+    max-width: 730px;
+    margin: auto;
+    padding-top: 1.5rem;
+}
+#gallery {
+    min-height: 22rem;
+    margin-bottom: 15px;
+    margin-left: auto;
+    margin-right: auto;
+    border-bottom-right-radius: .5rem !important;
+    border-bottom-left-radius: .5rem !important;
+}
+#gallery>div>.h-full {
+    min-height: 20rem;
+}
+.details:hover {
+    text-decoration: underline;
+}
+.gr-button {
+    white-space: nowrap;
+}
+.gr-button:focus {
+    border-color: rgb(147 197 253 / var(--tw-border-opacity));
+    outline: none;
+    box-shadow: var(--tw-ring-offset-shadow), var(--tw-ring-shadow), var(--tw-shadow, 0 0 #0000);
+    --tw-border-opacity: 1;
+    --tw-ring-offset-shadow: var(--tw-ring-inset) 0 0 0 var(--tw-ring-offset-width) var(--tw-ring-offset-color);
+    --tw-ring-shadow: var(--tw-ring-inset) 0 0 0 calc(3px + var(--tw-ring-offset-width)) var(--tw-ring-color);
+    --tw-ring-color: rgb(191 219 254 / var(--tw-ring-opacity));
+    --tw-ring-opacity: .5;
+}
+#advanced-btn {
+    font-size: .7rem !important;
+    line-height: 19px;
+    margin-top: 12px;
+    margin-bottom: 12px;
+    padding: 2px 8px;
+    border-radius: 14px !important;
+}
+#advanced-options {
+    display: none;
+    margin-bottom: 20px;
+}
+.footer {
+    margin-bottom: 45px;
+    margin-top: 35px;
+    text-align: center;
+    border-bottom: 1px solid #e5e5e5;
+}
+.footer>p {
+    font-size: .8rem;
+    display: inline-block;
+    padding: 0 10px;
+    transform: translateY(10px);
+    background: white;
+}
+.dark .footer {
+    border-color: #303030;
+}
+.dark .footer>p {
+    background: #0b0f19;
+}
+.acknowledgments h4 {
+    margin: 1.25em 0 .25em 0;
+    font-weight: bold;
+    font-size: 115%;
+}
+#container-advanced-btns {
+    display: flex;
+    flex-wrap: wrap;
+    justify-content: space-between;
+    align-items: center;
+}
+.animate-spin {
+    animation: spin 1s linear infinite;
+}
+@keyframes spin {
+    from {
+        transform: rotate(0deg);
+    }
+    to {
+        transform: rotate(360deg);
+    }
+}
+#share-btn-container {
+    display: flex; padding-left: 0.5rem !important; padding-right: 0.5rem !important; background-color: #000000; justify-content: center; align-items: center; border-radius: 9999px !important; width: 13rem;
+}
+#share-btn {
+    all: initial; color: #ffffff; font-weight: 600; cursor: pointer; font-family: 'IBM Plex Sans', sans-serif; margin-left: 0.5rem !important; padding-top: 0.25rem !important; padding-bottom: 0.25rem !important;
+}
+#share-btn * {
+    all: unset;
+}
+.gr-form {
+    flex: 1 1 50%; border-top-right-radius: 0; border-bottom-right-radius: 0;
+}
+#prompt-container {
+    gap: 0;
+}
app.py
ADDED
@@ -0,0 +1,239 @@
+import io
+import os
+import numpy as np
+import torch
+from PIL import Image
+from typing import List, Optional
+from functools import reduce
+from argparse import ArgumentParser
+
+import gradio as gr
+
+from transformers import DetrFeatureExtractor, DetrForSegmentation, DetrConfig
+from transformers.models.detr.feature_extraction_detr import rgb_to_id
+
+from diffusers import StableDiffusionInpaintPipeline, DPMSolverMultistepScheduler
+
+parser = ArgumentParser()
+parser.add_argument('--disable-cuda', action='store_true')
+parser.add_argument('--attention-slicing', action='store_true')
+args = parser.parse_args()
+
+auth_token = os.environ.get("READ_TOKEN")
+try_cuda = not args.disable_cuda
+
+# Inference only: no gradients are ever needed
+torch.set_grad_enabled(False)
+
+# Device helper
+def get_device(try_cuda=True):
+    return torch.device('cuda' if try_cuda and torch.cuda.is_available() else 'cpu')
+
+device = get_device(try_cuda=try_cuda)
+
+# Load the DETR panoptic segmentation model, its feature extractor and config
+def load_segmentation_models(model_name: str = 'facebook/detr-resnet-50-panoptic'):
+    feature_extractor = DetrFeatureExtractor.from_pretrained(model_name)
+    model = DetrForSegmentation.from_pretrained(model_name)
+    cfg = DetrConfig.from_pretrained(model_name)
+
+    return feature_extractor, model, cfg
+
+# Load the Stable Diffusion inpainting pipeline
+def load_diffusion_pipeline(model_name: str = 'stabilityai/stable-diffusion-2-inpainting'):
+    return StableDiffusionInpaintPipeline.from_pretrained(
+        model_name,
+        revision='fp16',
+        torch_dtype=torch.float16 if try_cuda and torch.cuda.is_available() else torch.float32,
+        use_auth_token=auth_token
+    )
+
+# Morphological erosion: min-pooling expressed as max-pooling of the negated input
+def min_pool(x: torch.Tensor, kernel_size: int):
+    pad_size = (kernel_size - 1) // 2
+    return -torch.nn.functional.max_pool2d(-x, kernel_size, (1, 1), padding=pad_size)
+
+# Morphological dilation via max-pooling
+def max_pool(x: torch.Tensor, kernel_size: int):
+    pad_size = (kernel_size - 1) // 2
+    return torch.nn.functional.max_pool2d(x, kernel_size, (1, 1), padding=pad_size)
+
+# Clean up a binary mask: erode to drop small speckles, then dilate so the
+# final mask slightly overflows the segment boundary
+def clean_mask(mask, max_kernel: int = 23, min_kernel: int = 5):
+    mask = torch.Tensor(mask[None, None]).float().to(device)
+    mask = min_pool(mask, min_kernel)
+    mask = max_pool(mask, max_kernel)
+    mask = mask.bool().squeeze().cpu().numpy()
+    return mask
+
+
+feature_extractor, segmentation_model, segmentation_cfg = load_segmentation_models()
+pipe = load_diffusion_pipeline()
+pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
+
+segmentation_model = segmentation_model.to(device)
+pipe = pipe.to(device)
+if args.attention_slicing:
+    pipe.enable_attention_slicing()
+
+# Callback that runs panoptic segmentation and populates the CheckboxGroup
+def fn_segmentation(image, max_kernel, min_kernel):
+    inputs = feature_extractor(images=image, return_tensors="pt").to(device)
+    with torch.inference_mode():
+        outputs = segmentation_model(**inputs)
+
+    processed_sizes = torch.as_tensor(inputs["pixel_values"].shape[-2:]).unsqueeze(0)
+    result = feature_extractor.post_process_panoptic(outputs, processed_sizes)[0]
+
+    # The panoptic map comes back as a PNG; decode it and map RGB values to segment ids
+    panoptic_seg = Image.open(io.BytesIO(result["png_string"])).resize((image.width, image.height))
+    panoptic_seg = np.array(panoptic_seg, dtype=np.uint8)
+    panoptic_seg_id = rgb_to_id(panoptic_seg)
+
+    # One binary mask per detected segment
+    raw_masks = []
+    for s in result['segments_info']:
+        m = panoptic_seg_id == s['id']
+        raw_masks.append(m.astype(np.uint8) * 255)
+
+    checkbox_choices = [f"{s['id']}:{segmentation_cfg.id2label[s['category_id']]}" for s in result['segments_info']]
+    checkbox_group = gr.CheckboxGroup.update(choices=checkbox_choices)
+
+    return raw_masks, checkbox_group, gr.Image.update(value=np.zeros((image.height, image.width))), gr.Image.update(value=image)
+
+# Callback that rebuilds the displayed mask from the selected checkboxes
+def fn_update_mask(
+    image: Image.Image,
+    masks: List[np.ndarray],
+    masks_enabled: List[str],
+    max_kernel: int,
+    min_kernel: int,
+    invert_mask: bool
+):
+    # Checkbox labels have the form "<segment id>:<class name>"
+    mask_indices = [int(m.split(':')[0]) for m in masks_enabled]
+    combined_mask = reduce(lambda x, y: x | y, [masks[i].astype(bool) for i in mask_indices], np.zeros_like(masks[0], dtype=bool))
+
+    if invert_mask:
+        combined_mask = ~combined_mask
+
+    combined_mask = clean_mask(combined_mask, max_kernel, min_kernel)
+
+    # Black out the masked region in the preview image
+    masked_image = np.array(image).copy()
+    masked_image[combined_mask] = 0
+
+    return combined_mask.astype(np.uint8) * 255, Image.fromarray(masked_image)
+
+# Callback that runs diffusion given the current image, mask and prompt
+def fn_diffusion(
+    prompt: str,
+    masked_image: Image.Image,
+    mask: np.ndarray,
+    num_diffusion_steps: int,
+    guidance_scale: float,
+    negative_prompt: Optional[str] = None,
+):
+    # Gradio passes an empty string when the textbox is left blank
+    if not negative_prompt:
+        negative_prompt = None
+
+    # Resize so the short edge matches what Stable Diffusion expects.
+    # TODO: remove magic number
+    STABLE_DIFFUSION_SMALL_EDGE = 512
+
+    w, h = masked_image.size
+    is_width_larger = w > h
+    resize_ratio = STABLE_DIFFUSION_SMALL_EDGE / (h if is_width_larger else w)
+
+    new_width = int(w * resize_ratio) if is_width_larger else STABLE_DIFFUSION_SMALL_EDGE
+    new_height = STABLE_DIFFUSION_SMALL_EDGE if is_width_larger else int(h * resize_ratio)
+
+    # Round the long edge up to a multiple of 8, as the UNet requires;
+    # (-x) % 8 is already 0 when x is a multiple of 8.
+    new_width += (-new_width) % 8 if is_width_larger else 0
+    new_height += 0 if is_width_larger else (-new_height) % 8
+
+    mask = Image.fromarray(mask).convert("RGB").resize((new_width, new_height))
+    masked_image = masked_image.convert("RGB").resize((new_width, new_height))
+
+    # Run diffusion
+    inpainted_image = pipe(
+        height=new_height,
+        width=new_width,
+        prompt=prompt,
+        image=masked_image,
+        mask_image=mask,
+        num_inference_steps=num_diffusion_steps,
+        guidance_scale=guidance_scale,
+        negative_prompt=negative_prompt
+    ).images[0]
+
+    # Resize back to the original size
+    inpainted_image = inpainted_image.resize((w, h))
+
+    return inpainted_image
+
+demo = gr.Blocks(css=open('app.css').read())
+
+with demo:
+    gr.HTML(open('app_header.html').read())
+
+    if not try_cuda or not torch.cuda.is_available():
+        gr.HTML('<div class="alert alert-warning" role="alert" style="color:red"><b>Warning: GPU not available! Diffusion will be slow.</b></div>')
+
+    # Input image control
+    input_image = gr.Image(value="example.png", type='pil', label="Input Image")
+    # Combined mask controls
+    bt_masks = gr.Button("Compute Masks")
+    with gr.Row():
+        mask_image = gr.Image(type='numpy', label="Diffusion Mask")
+        masked_image = gr.Image(type='pil', label="Masked Image")
+    mask_storage = gr.State()
+
+    # Mask editing controls
+    with gr.Row():
+        max_slider = gr.Slider(minimum=1, maximum=99, value=23, step=2, label="Mask Overflow")
+        min_slider = gr.Slider(minimum=1, maximum=99, value=5, step=2, label="Mask Denoising")
+
+    with gr.Row():
+        invert_mask = gr.Checkbox(label="Invert Mask")
+        mask_checkboxes = gr.CheckboxGroup(interactive=True, label="Mask Selection")
+
+    # Diffusion controls and output
+    with gr.Row():
+        with gr.Column():
+            prompt = gr.Textbox("An angry dog floating in outer deep space. Twinkling stars in the background. High definition.", label="Prompt")
+            negative_prompt = gr.Textbox(label="Negative Prompt")
+        with gr.Column():
+            steps_slider = gr.Slider(minimum=1, maximum=100, value=50, label="Inference Steps")
+            guidance_slider = gr.Slider(minimum=0.0, maximum=50.0, value=7.5, step=0.1, label="Guidance Scale")
+            bt_diffusion = gr.Button("Run Diffusion")
+
+    inpainted_image = gr.Image(type='pil', label="Inpainted Image")
+
+    # TODO: saw a better way of handling many inputs online; forgot where though
+    update_mask_inputs = [input_image, mask_storage, mask_checkboxes, max_slider, min_slider, invert_mask]
+    update_mask_outputs = [mask_image, masked_image]
+
+    # Reset the checkbox group and invert flag whenever the input image changes
+    input_image.change(lambda: gr.CheckboxGroup.update(choices=[], value=[]), outputs=mask_checkboxes)
+    input_image.change(lambda: gr.Checkbox.update(value=False), outputs=invert_mask)
+
+    # Segmentation button callback
+    bt_masks.click(fn_segmentation, inputs=[input_image, max_slider, min_slider], outputs=[mask_storage, mask_checkboxes, mask_image, masked_image])
+
+    # Update mask callbacks
+    max_slider.change(fn_update_mask, inputs=update_mask_inputs, outputs=update_mask_outputs, show_progress=False)
+    min_slider.change(fn_update_mask, inputs=update_mask_inputs, outputs=update_mask_outputs, show_progress=False)
+    mask_checkboxes.change(fn_update_mask, inputs=update_mask_inputs, outputs=update_mask_outputs, show_progress=False)
+    invert_mask.change(fn_update_mask, inputs=update_mask_inputs, outputs=update_mask_outputs, show_progress=False)
+
+    # Diffusion button callback
+    bt_diffusion.click(fn_diffusion, inputs=[
+        prompt,
+        masked_image,
+        mask_image,
+        steps_slider,
+        guidance_slider,
+        negative_prompt
+    ], outputs=inpainted_image)
+    gr.HTML(open('app_license.html').read())
+
+demo.launch()
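The erode-then-dilate trick in `clean_mask` is worth seeing in isolation: a small erosion (`min_pool`, the "Mask Denoising" slider) deletes speckles narrower than its kernel, and a larger dilation (`max_pool`, "Mask Overflow") grows what survives past the original segment boundary. A self-contained sketch using the same pooling functions as `app.py`:

```python
import torch
import torch.nn.functional as F

def min_pool(x: torch.Tensor, kernel_size: int) -> torch.Tensor:
    # Erosion: min-pooling expressed as max-pooling of the negated input
    pad = (kernel_size - 1) // 2
    return -F.max_pool2d(-x, kernel_size, (1, 1), padding=pad)

def max_pool(x: torch.Tensor, kernel_size: int) -> torch.Tensor:
    # Dilation
    pad = (kernel_size - 1) // 2
    return F.max_pool2d(x, kernel_size, (1, 1), padding=pad)

# 7x7 mask: a 3x3 blob plus a single-pixel speckle in the corner.
m = torch.zeros(1, 1, 7, 7)
m[0, 0, 2:5, 2:5] = 1.0  # blob: a 3x3 erosion reduces it to its centre pixel
m[0, 0, 0, 6] = 1.0      # speckle: erased entirely by the erosion
cleaned = max_pool(min_pool(m, 3), 5)  # erode with 3, dilate back out with 5
print(cleaned[0, 0].int())  # a 5x5 block of ones; the speckle is gone
```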
app_header.html
ADDED
@@ -0,0 +1,58 @@
+<div style="text-align: center; max-width: 650px; margin: 0 auto;">
+  <div
+    style="
+      display: inline-flex;
+      align-items: center;
+      gap: 0.8rem;
+      font-size: 1.75rem;
+    "
+  >
+    <svg
+      width="0.65em"
+      height="0.65em"
+      viewBox="0 0 115 115"
+      fill="none"
+      xmlns="http://www.w3.org/2000/svg"
+    >
+      <rect width="23" height="23" fill="white"></rect>
+      <rect y="69" width="23" height="23" fill="white"></rect>
+      <rect x="23" width="23" height="23" fill="#AEAEAE"></rect>
+      <rect x="23" y="69" width="23" height="23" fill="#AEAEAE"></rect>
+      <rect x="46" width="23" height="23" fill="white"></rect>
+      <rect x="46" y="69" width="23" height="23" fill="white"></rect>
+      <rect x="69" width="23" height="23" fill="black"></rect>
+      <rect x="69" y="69" width="23" height="23" fill="black"></rect>
+      <rect x="92" width="23" height="23" fill="#D9D9D9"></rect>
+      <rect x="92" y="69" width="23" height="23" fill="#AEAEAE"></rect>
+      <rect x="115" y="46" width="23" height="23" fill="white"></rect>
+      <rect x="115" y="115" width="23" height="23" fill="white"></rect>
+      <rect x="115" y="69" width="23" height="23" fill="#D9D9D9"></rect>
+      <rect x="92" y="46" width="23" height="23" fill="#AEAEAE"></rect>
+      <rect x="92" y="115" width="23" height="23" fill="#AEAEAE"></rect>
+      <rect x="92" y="69" width="23" height="23" fill="white"></rect>
+      <rect x="69" y="46" width="23" height="23" fill="white"></rect>
+      <rect x="69" y="115" width="23" height="23" fill="white"></rect>
+      <rect x="69" y="69" width="23" height="23" fill="#D9D9D9"></rect>
+      <rect x="46" y="46" width="23" height="23" fill="black"></rect>
+      <rect x="46" y="115" width="23" height="23" fill="black"></rect>
+      <rect x="46" y="69" width="23" height="23" fill="black"></rect>
+      <rect x="23" y="46" width="23" height="23" fill="#D9D9D9"></rect>
+      <rect x="23" y="115" width="23" height="23" fill="#AEAEAE"></rect>
+      <rect x="23" y="69" width="23" height="23" fill="black"></rect>
+    </svg>
+    <h1 style="font-weight: 900; margin-bottom: 7px; margin-top: 7px">
+      Stable Diffusion x Segmentation Masking 😷
+    </h1>
+  </div>
+  <p style="margin-bottom: 10px; font-size: 94%">
+    Stable Diffusion is a state-of-the-art model that generates images from
+    text. Finetuned for inpainting, it can regenerate a masked region of a
+    starting image to match a text prompt.
+  </p>
+  <p style="margin-bottom: 10px; font-size: 94%">
+    However, depending on how complex the region you want to replace is,
+    drawing the mask by hand can be tedious. This demo uses a segmentation
+    model to generate per-segment masks for you, which can be combined into
+    the final diffusion mask.
+  </p>
+</div>
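The combination step the header describes is a boolean OR over the selected per-segment masks, mirroring `fn_update_mask` in `app.py`. A toy sketch with two 4x4 masks (values 0/255, as produced by `fn_segmentation`):

```python
from functools import reduce
import numpy as np

mask_a = np.array([[255, 255, 0, 0]] * 4, dtype=np.uint8)  # left strip
mask_b = np.array([[0, 0, 0, 255]] * 4, dtype=np.uint8)    # right strip

# OR the selected masks together, starting from an all-false mask.
selected = [mask_a, mask_b]
combined = reduce(lambda x, y: x | y.astype(bool),
                  selected, np.zeros((4, 4), dtype=bool))
print(combined.astype(np.uint8) * 255)  # both strips set, middle columns zero
```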
app_license.html
ADDED
@@ -0,0 +1,27 @@
+<div class="acknowledgments">
+  <h4>LICENSE</h4>
+  <p>The model is licensed under the <a
+  href="https://huggingface.co/spaces/CompVis/stable-diffusion-license"
+  style="text-decoration: underline;" target="_blank">CreativeML Open RAIL-M</a>
+  license. The authors claim no rights over the outputs you generate; you are
+  free to use them, but you are accountable for their use, which must not go
+  against the provisions set in the license. Among other things, the license
+  forbids sharing content that violates laws, harms a person, disseminates
+  personal information with the intent to harm, spreads misinformation, or
+  targets vulnerable groups. For the full list of restrictions please <a
+  href="https://huggingface.co/spaces/CompVis/stable-diffusion-license"
+  style="text-decoration: underline;" target="_blank">read the
+  license</a>.</p>
+  <h4>Biases and content acknowledgment</h4>
+  <p>However impressive text-to-image generation is, be aware that this model
+  may output content that reinforces or exacerbates societal biases, as well
+  as realistic faces, pornography and violence. The model was trained on the
+  <a href="https://laion.ai/blog/laion-5b/" style="text-decoration:
+  underline;" target="_blank">LAION-5B dataset</a>, which scraped non-curated
+  image-text pairs from the internet (the exception being the removal of
+  illegal content) and is meant for research purposes. You can read more in
+  the <a href="https://huggingface.co/CompVis/stable-diffusion-v1-4"
+  style="text-decoration: underline;" target="_blank">model card</a>. Additionally,
+  you can read more about the inpainting finetuning process in this
+  <a href="https://huggingface.co/runwayml/stable-diffusion-inpainting" style="text-decoration: underline;" target="_blank">model card</a>.</p>
+</div>
example.png
ADDED
requirements.txt
ADDED
@@ -0,0 +1,12 @@
+--extra-index-url https://download.pytorch.org/whl/cu113
+torch
+
+build==0.6.0
+diffusers==0.9.0
+ftfy==6.1.1
+gradio==3.9.1
+timm==0.6.11
+transformers==4.22.1
+accelerate
+
+https://github.com/apolinario/xformers/releases/download/0.0.3/xformers-0.0.14.dev0-cp38-cp38-linux_x86_64.whl