---
license: apache-2.0
datasets:
- huggan/smithsonian_butterflies_subset
tags:
- unconditional-image-generation
- diffusion
- ddpm
- pytorch
- diffusers
- pixel-art
---
# DDPM for 8-bit Pixel Art Wings (ddpm-pixelwing)
This repository contains a Denoising Diffusion Probabilistic Model (DDPM) trained from scratch to generate 8-bit style pixel art images of wings. This model was built using the Hugging Face [`diffusers`](https://github.com/huggingface/diffusers) library.
The model is "unconditional," meaning it generates random wing designs without any specific text or image prompt. It's a fun tool for artists, game developers, or anyone needing inspiration for pixel art sprites.
## Model Description
Denoising Diffusion Probabilistic Models (DDPMs) are a class of generative models that learn to create data by reversing a gradual noising process. The model learns to denoise an image from pure Gaussian noise, step by step, until a clean, coherent image emerges.
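As a toy sketch of the idea (not this model's actual training code), the forward process can be written in closed form: `x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps`, where `alpha_bar_t` is the cumulative product of `1 - beta_t` over a noise schedule. The linear schedule below follows the DDPM paper; the image is a random placeholder.

```python
import numpy as np

# Toy illustration of DDPM's forward (noising) process, q(x_t | x_0).
# Not this model's code; the schedule values follow the DDPM paper.
rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)     # cumulative product of (1 - beta_t)

x0 = rng.uniform(-1.0, 1.0, size=(3, 32, 32))  # stand-in 32x32 RGB image in [-1, 1]

def add_noise(x0, t):
    """Sample x_t from q(x_t | x_0) in closed form."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x_early = add_noise(x0, 10)    # still mostly the original image
x_late = add_noise(x0, T - 1)  # nearly pure Gaussian noise
```

Training teaches the U-Net to predict `eps` from `x_t` and `t`; sampling then runs this process in reverse, starting from pure noise.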
This specific model is based on the architecture proposed in the paper [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) and is implemented as a `UNet2DModel` in the `diffusers` library. It was trained on a custom dataset of 8-bit style wing images.
**Model Architecture:**
- **Class:** `UNet2DModel`
- **`sample_size`**: 32
- **`in_channels`**: 3
- **`out_channels`**: 3
- **`layers_per_block`**: 2
- **`block_out_channels`**: (64, 128, 128, 256)
- **`down_block_types`**: (`DownBlock2D`, `DownBlock2D`, `AttnDownBlock2D`, `DownBlock2D`)
- **`up_block_types`**: (`UpBlock2D`, `AttnUpBlock2D`, `UpBlock2D`, `UpBlock2D`)
## Intended Use & Limitations
### Intended Use
This model is primarily intended for creative applications, such as:
- Generating sprites for 2D games.
- Creating assets for digital art and design projects.
- Providing inspiration for pixel artists.
The model can be used as-is for unconditional generation or as a base model for further fine-tuning on a more specific dataset of pixel art.
### Limitations
- **Resolution:** The model generates images at a low resolution of **32x32 pixels**, consistent with its pixel art training data. Upscaling may be required for certain applications, which could introduce artifacts.
- **Lack of Control:** This is an unconditional model, so you cannot direct the output with text prompts (e.g., "a fiery wing"). Generation is random.
- **Artifacts:** Like many generative models, some outputs may contain minor visual artifacts or be less coherent than others. Running generation several times and keeping the best results is encouraged.
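Regarding the resolution limitation: pixel art upscales cleanly with nearest-neighbor interpolation, which keeps pixel edges crisp instead of blurring them. A minimal PIL sketch (the sprite here is a solid-color placeholder; in practice you would use a generated image):

```python
from PIL import Image

# Upscale a 32x32 sprite 8x with nearest-neighbor resampling,
# which preserves hard pixel edges (no interpolation blur).
sprite = Image.new("RGB", (32, 32), (120, 60, 200))  # placeholder for a generated wing
big = sprite.resize((32 * 8, 32 * 8), resample=Image.NEAREST)
big.save("pixel_wing_8x.png")
```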
## How to Use
You can easily use this model for inference with just a few lines of code using the `diffusers` library.
### 1. Installation
First, make sure you have the necessary libraries installed.
```bash
pip install --upgrade diffusers transformers accelerate torch
```
### 2. Inference Pipeline
The following Python script demonstrates how to load the model from the Hugging Face Hub and generate an image.
```python
import torch
from diffusers import DDPMPipeline

# For reproducibility
generator = torch.manual_seed(42)

# Load the pretrained pipeline from the Hub
pipeline = DDPMPipeline.from_pretrained("louijiec/ddpm-pixelwing")

# If you have a GPU, move the pipeline to it for faster generation
if torch.cuda.is_available():
    pipeline = pipeline.to("cuda")

print("Pipeline loaded. Starting image generation...")

# Run the generation process.
# The pipeline returns an output object whose `images` attribute
# is a list of PIL images; take the first one.
result = pipeline(generator=generator, num_inference_steps=1000)
image = result.images[0]

# The output is a PIL Image, which you can display or save
image.save("pixel_wing.png")
print("Image generated successfully.")

# To generate a batch of images, specify `batch_size`:
# images = pipeline(batch_size=4, generator=generator).images
# for i, img in enumerate(images):
#     img.save(f"pixel_wing_{i+1}.png")
```
This script will generate a 32x32 pixel art wing and save it as `pixel_wing.png` in your current directory.
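Batched outputs can be tiled into a single sprite sheet for use in a game engine. A sketch using only PIL (the sprites below are solid-color placeholders; in practice, substitute `pipeline(batch_size=4, generator=generator).images`):

```python
from PIL import Image

# Tile a batch of 32x32 sprites into one horizontal sprite sheet.
sprites = [Image.new("RGB", (32, 32), (40 * i, 80, 160)) for i in range(4)]

sheet = Image.new("RGB", (32 * len(sprites), 32))
for i, img in enumerate(sprites):
    sheet.paste(img, (32 * i, 0))

sheet.save("pixel_wing_sheet.png")
```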
## Training Details
This model was trained from scratch. The following provides an overview for those interested in the training process or looking to reproduce it.
- **Library:** The model was trained using the official `diffusers` [unconditional image generation training script](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation).
- **Dataset:** The model was trained on a custom dataset named **"PixelWing"**, consisting of approximately 300 unique 32x32 pixel art images of wings. The images were created and curated specifically for this project.
- **Training Procedure:**
- **Image Resolution:** 32x32
- **Epochs:** 200
- **Learning Rate:** 1e-4
- **Batch Size:** 16
- **Gradient Accumulation Steps:** 1
- **Optimizer:** AdamW
- **Hardware:** Trained on a single NVIDIA T4 GPU (commonly available on Google Colab).
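The hyperparameters above roughly correspond to an invocation of the `train_unconditional.py` example script like the following (a sketch; the dataset path and output directory are placeholders, and exact flags may vary between `diffusers` versions):

```bash
# Sketch of a training run matching the hyperparameters above.
accelerate launch train_unconditional.py \
  --train_data_dir="path/to/pixelwing" \
  --resolution=32 \
  --train_batch_size=16 \
  --num_epochs=200 \
  --learning_rate=1e-4 \
  --gradient_accumulation_steps=1 \
  --output_dir="ddpm-pixelwing"
```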