DDPM for 8-bit Pixel Art Wings (ddpm-pixelwing)

This repository contains a Denoising Diffusion Probabilistic Model (DDPM) trained from scratch to generate 8-bit style pixel art images of wings. This model was built using the Hugging Face diffusers library.

The model is "unconditional," meaning it generates random wing designs without any specific text or image prompt. It's a fun tool for artists, game developers, or anyone needing inspiration for pixel art sprites.

Model Description

Denoising Diffusion Probabilistic Models (DDPMs) are a class of generative models that learn to create data by reversing a gradual noising process. The model learns to denoise an image from pure Gaussian noise, step by step, until a clean, coherent image emerges.
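
As a rough illustration of the forward (noising) process that the model learns to reverse, the minimal sketch below uses the DDPMScheduler from diffusers to mix an image tensor with Gaussian noise at a given timestep. The tensors here are random placeholders, not actual training data.

import torch
from diffusers import DDPMScheduler

# Forward process: progressively mix a clean image with Gaussian noise.
scheduler = DDPMScheduler(num_train_timesteps=1000)

clean_image = torch.randn(1, 3, 32, 32)  # placeholder for a training image scaled to [-1, 1]
noise = torch.randn_like(clean_image)

# At a late timestep the sample is almost pure noise; the model is trained to
# predict the added noise so that this process can be run in reverse at inference.
noisy_image = scheduler.add_noise(clean_image, noise, torch.tensor([999]))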

This specific model is based on the architecture proposed in the paper Denoising Diffusion Probabilistic Models and is implemented as a UNet2DModel in the diffusers library. It was trained on a custom dataset of 8-bit style wing images.

Model Architecture:

  • Class: UNet2DModel
  • sample_size: 32
  • in_channels: 3
  • out_channels: 3
  • layers_per_block: 2
  • block_out_channels: (64, 128, 128, 256)
  • down_block_types: (DownBlock2D, DownBlock2D, AttnDownBlock2D, DownBlock2D)
  • up_block_types: (UpBlock2D, AttnUpBlock2D, UpBlock2D, UpBlock2D)
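
For reference, the configuration above corresponds roughly to instantiating the UNet2DModel as follows. This is a sketch; the checkpoint's own config.json is the authoritative source.

from diffusers import UNet2DModel

model = UNet2DModel(
    sample_size=32,
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(64, 128, 128, 256),
    down_block_types=("DownBlock2D", "DownBlock2D", "AttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "AttnUpBlock2D", "UpBlock2D", "UpBlock2D"),
)

If the checkpoint follows the standard pipeline layout, you would normally load the trained weights with UNet2DModel.from_pretrained rather than constructing the model by hand.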

Intended Use & Limitations

Intended Use

This model is primarily intended for creative applications, such as:

  • Generating sprites for 2D games.
  • Creating assets for digital art and design projects.
  • Providing inspiration for pixel artists.

The model can be used as-is for unconditional generation or as a base model for further fine-tuning on a more specific dataset of pixel art.

Limitations

  • Resolution: The model generates images at a low resolution of 32x32 pixels, consistent with its pixel art training data. Upscaling may be needed for some applications and can introduce artifacts if a smoothing filter is used; see the upscaling sketch after this list.
  • Lack of Control: This is an unconditional model, so you cannot direct the output with text prompts (e.g., "a fiery wing"). Generation is random.
  • Artifacts: Like many generative models, some outputs may contain minor visual artifacts or be less coherent than others. Running the generation process multiple times (for example with different seeds) and keeping the best results is encouraged.
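
If you need larger images, nearest-neighbor interpolation preserves the hard pixel edges better than bilinear or bicubic filtering. A minimal sketch using Pillow (the file names are just examples):

from PIL import Image

img = Image.open("pixel_wing.png")

# Nearest-neighbor keeps the blocky pixel-art look; smoother filters would blur it.
upscaled = img.resize((256, 256), resample=Image.NEAREST)
upscaled.save("pixel_wing_8x.png")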

How to Use

You can run inference with just a few lines of code using the diffusers library.

1. Installation

First, make sure you have the necessary libraries installed.

pip install --upgrade diffusers transformers accelerate torch

2. Inference Pipeline

The following Python script demonstrates how to load the model from the Hugging Face Hub and generate an image.

import torch
from diffusers import DDPMPipeline
from PIL import Image

# For reproducibility
generator = torch.manual_seed(42)

# Load the pretrained pipeline from the Hub
pipeline = DDPMPipeline.from_pretrained("louijiec/ddpm-pixelwing")

# If you have a GPU, move the pipeline to the GPU for faster generation
if torch.cuda.is_available():
    pipeline = pipeline.to("cuda")

print("Pipeline loaded. Starting image generation...")

# Run the generation process
# The pipeline returns an ImagePipelineOutput whose `images` field is a list of PIL images
result = pipeline(generator=generator, num_inference_steps=1000)
image = result.images[0]

# The output is a PIL Image, which you can display or save
print("Image generated successfully.")
image.save("pixel_wing.png")

# To generate a batch of images, you can specify `batch_size`
# images = pipeline(batch_size=4, generator=generator).images
# for i, img in enumerate(images):
#     img.save(f"pixel_wing_{i+1}.png")

This script will generate a 32x32 pixel art wing and save it as pixel_wing.png in your current directory.

Training Details

This model was trained from scratch. The following provides an overview for those interested in the training process or looking to reproduce it.

  • Library: The model was trained using the official diffusers unconditional image generation training script; an example invocation is sketched after this list.

  • Dataset: The model was trained on a custom dataset named "PixelWing", consisting of approximately 300 unique 32x32 pixel art images of wings. The images were created and curated specifically for this project.

  • Training Procedure:

    • Image Resolution: 32x32
    • Epochs: 200
    • Learning Rate: 1e-4
    • Batch Size: 16
    • Gradient Accumulation Steps: 1
    • Optimizer: AdamW
    • Hardware: Trained on a single NVIDIA T4 GPU (commonly available on Google Colab).
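
With those hyperparameters, the invocation looks roughly like the following. This is a sketch only: the script lives in the diffusers repository under examples/unconditional_image_generation/, the dataset path is a placeholder, and flag names can vary between diffusers versions.

accelerate launch train_unconditional.py \
  --train_data_dir="path/to/pixelwing" \
  --resolution=32 \
  --train_batch_size=16 \
  --num_epochs=200 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-4 \
  --output_dir="ddpm-pixelwing"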