---
license: apache-2.0
datasets:
- huggan/smithsonian_butterflies_subset
tags:
- unconditional-image-generation
- diffusion
- ddpm
- pytorch
- diffusers
- pixel-art
---

# DDPM for 8-bit Pixel Art Wings (ddpm-pixelwing)

This repository contains a Denoising Diffusion Probabilistic Model (DDPM) trained from scratch to generate 8-bit-style pixel art images of wings. The model was built with the Hugging Face [`diffusers`](https://github.com/huggingface/diffusers) library.

The model is unconditional: it generates random wing designs without any text or image prompt. It is aimed at artists, game developers, and anyone looking for inspiration for pixel art sprites.

## Model Description

Denoising Diffusion Probabilistic Models (DDPMs) are a class of generative models that learn to create data by reversing a gradual noising process. During training, Gaussian noise is added to images over many steps and the model learns to undo it. At generation time, it starts from pure Gaussian noise and removes noise step by step until a clean, coherent image emerges.

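
To make the noising process concrete, the sketch below computes the linear variance schedule described in the DDPM paper (1000 steps, with beta rising from 1e-4 to 0.02) and prints how much of the original image survives at a few timesteps. These values are the paper's defaults and are only assumed here; the schedule actually used by this model is stored in its scheduler config. The helper names `linear_betas` and `cumulative_alphas` are illustrative and not part of `diffusers`.

```python
import math


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed DDPM paper defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_s) for s = 1..t, often written alpha_bar_t."""
    alpha_bars = []
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


betas = linear_betas()
alpha_bars = cumulative_alphas(betas)

# A noisy sample at step t is
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
# so these two weights show how much image and how much noise x_t contains.
for t in (1, 250, 500, 1000):
    signal_weight = math.sqrt(alpha_bars[t - 1])
    noise_weight = math.sqrt(1.0 - alpha_bars[t - 1])
    print(f"t={t:4d}  signal={signal_weight:.3f}  noise={noise_weight:.3f}")
```
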
This specific model is based on the architecture proposed in the paper [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) and is implemented as a `UNet2DModel` in the `diffusers` library. It was trained on a custom dataset of 8-bit style wing images.

**Model Architecture:**

- **Class:** `UNet2DModel`
- **`sample_size`**: 32
- **`in_channels`**: 3
- **`out_channels`**: 3
- **`layers_per_block`**: 2
- **`block_out_channels`**: (64, 128, 128, 256)
- **`down_block_types`**: (`DownBlock2D`, `DownBlock2D`, `AttnDownBlock2D`, `DownBlock2D`)
- **`up_block_types`**: (`UpBlock2D`, `AttnUpBlock2D`, `UpBlock2D`, `UpBlock2D`)

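
The architecture listed above can be written out as keyword arguments for `UNet2DModel`. The following is a minimal sketch that assumes every parameter not listed keeps its library default; the published checkpoint's own config remains the authoritative source. `UNET_CONFIG` and `build_unet` are illustrative names, and `diffusers` is imported inside the function so the configuration can be inspected without it.

```python
# Hyperparameters copied from the architecture list in this model card.
UNET_CONFIG = dict(
    sample_size=32,          # images are 32x32 pixels
    in_channels=3,           # RGB input
    out_channels=3,          # RGB output
    layers_per_block=2,
    block_out_channels=(64, 128, 128, 256),
    down_block_types=(
        "DownBlock2D",
        "DownBlock2D",
        "AttnDownBlock2D",   # self-attention at the third resolution level
        "DownBlock2D",
    ),
    up_block_types=(
        "UpBlock2D",
        "AttnUpBlock2D",     # mirrors the attention block on the way down
        "UpBlock2D",
        "UpBlock2D",
    ),
)


def build_unet():
    """Create an untrained UNet2DModel with the configuration above."""
    from diffusers import UNet2DModel  # imported lazily; requires diffusers and torch

    return UNet2DModel(**UNET_CONFIG)
```

Calling `build_unet()` returns a randomly initialized network, which is useful for counting parameters with `sum(p.numel() for p in build_unet().parameters())` or as a starting point for a new training run.
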
## Intended Use & Limitations

### Intended Use

This model is primarily intended for creative applications, such as:

- Generating sprites for 2D games.
- Creating assets for digital art and design projects.
- Providing inspiration for pixel artists.

The model can be used as-is for unconditional generation or as a base model for fine-tuning on a more specific pixel art dataset.

### Limitations

- **Resolution:** The model generates images at a low resolution of **32x32 pixels**, consistent with its pixel art training data. Some applications will require upscaling, which can introduce artifacts.
- **Lack of Control:** This is an unconditional model, so you cannot direct the output with text prompts (e.g., "a fiery wing"). Generation is random.
- **Artifacts:** As with many generative models, some outputs contain minor visual artifacts or are less coherent than others. Generate several samples and keep the best ones.

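
For the resolution limitation, pixel art is commonly enlarged with nearest-neighbor interpolation, which repeats each pixel instead of blending neighbors and so keeps edges sharp. The sketch below assumes Pillow is installed and that an integer scale factor is acceptable; `upscaled_size` and `upscale_pixel_art` are hypothetical helper names, not part of `diffusers`.

```python
def upscaled_size(size, factor):
    """Return (width, height) after enlarging by an integer factor."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    width, height = size
    return (width * factor, height * factor)


def upscale_pixel_art(image, factor=8):
    """Enlarge a PIL image with nearest-neighbor resampling."""
    from PIL import Image  # imported lazily; requires Pillow

    return image.resize(upscaled_size(image.size, factor), resample=Image.NEAREST)
```

With the default factor of 8, a 32x32 sample becomes a 256x256 image, for example `upscale_pixel_art(image).save("pixel_wing_large.png")`.
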
## How to Use

Inference takes only a few lines of code with the `diffusers` library.

### 1. Installation

First, make sure the necessary libraries are installed.

```bash
pip install --upgrade diffusers transformers accelerate torch
```

### 2. Inference Pipeline

The following Python script loads the model from the Hugging Face Hub and generates an image.

```python
import torch
from diffusers import DDPMPipeline

# Seed the random number generator for reproducibility
generator = torch.manual_seed(42)

# Load the pretrained pipeline from the Hub
pipeline = DDPMPipeline.from_pretrained("louijiec/ddpm-pixelwing")

# If a GPU is available, move the pipeline to it for faster generation
if torch.cuda.is_available():
    pipeline = pipeline.to("cuda")

print("Pipeline loaded. Starting image generation...")

# Run the generation process.
# The pipeline returns an output object whose `images` field is a list of PIL images.
result = pipeline(generator=generator, num_inference_steps=1000)
image = result.images[0]  # batch_size defaults to 1, so take the only image

# Save the generated PIL image
image.save("pixel_wing.png")
print("Image generated successfully.")

# To generate a batch of images, specify `batch_size`:
# images = pipeline(batch_size=4, generator=generator).images
# for i, img in enumerate(images):
#     img.save(f"pixel_wing_{i+1}.png")
```

This script generates a 32x32 pixel art wing and saves it as `pixel_wing.png` in the current directory.

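
When generating a batch, it can help to view the samples side by side before picking the ones to keep. The sketch below tiles a list of equally sized PIL images into a single sheet. It assumes Pillow is installed; `grid_shape` and `make_contact_sheet` are hypothetical helper names, not part of `diffusers`.

```python
import math


def grid_shape(num_images, cols):
    """Return (rows, cols) needed to lay out num_images in a grid."""
    rows = math.ceil(num_images / cols)
    return rows, cols


def make_contact_sheet(images, cols=4):
    """Paste equally sized PIL images into one grid image, row by row."""
    from PIL import Image  # imported lazily; requires Pillow

    rows, cols = grid_shape(len(images), cols)
    width, height = images[0].size
    sheet = Image.new("RGB", (cols * width, rows * height))
    for index, image in enumerate(images):
        x = (index % cols) * width    # column position
        y = (index // cols) * height  # row position
        sheet.paste(image, (x, y))
    return sheet
```

For example, `make_contact_sheet(pipeline(batch_size=8, generator=generator).images)` would produce a sheet with two rows of four wings.
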
## Training Details

This model was trained from scratch. The overview below is for readers interested in the training process or in reproducing it.

- **Library:** The model was trained using the official `diffusers` [unconditional image generation training script](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation).
- **Dataset:** The model was trained on a custom dataset named **"PixelWing"**, consisting of approximately 300 unique 32x32 pixel art images of wings. The images were created and curated specifically for this project.
- **Training Procedure:**
  - **Image Resolution:** 32x32
  - **Epochs:** 200
  - **Learning Rate:** 1e-4
  - **Batch Size:** 16
  - **Gradient Accumulation Steps:** 1
  - **Optimizer:** AdamW
- **Hardware:** Trained on a single NVIDIA T4 GPU (commonly available on Google Colab).
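
The hyperparameters above imply a fairly short training run. As a rough worked example, the sketch below estimates the number of optimizer updates, assuming exactly 300 images (the card says "approximately 300") and that the final partial batch of each epoch is kept rather than dropped. `training_steps` is an illustrative helper, not part of the training script.

```python
import math


def training_steps(num_images, batch_size, epochs, grad_accum_steps=1):
    """Estimate optimizer updates per epoch and over the whole run."""
    batches_per_epoch = math.ceil(num_images / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return updates_per_epoch, updates_per_epoch * epochs


per_epoch, total = training_steps(
    num_images=300, batch_size=16, epochs=200, grad_accum_steps=1
)
print(f"{per_epoch} updates per epoch, {total} updates in total")
```

Under these assumptions the run comes to 19 updates per epoch and 3,800 updates overall.
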