	---
	license: apache-2.0
	datasets:
	- huggan/smithsonian_butterflies_subset
	tags:
	- unconditional-image-generation
	- diffusion
	- ddpm
	- pytorch
	- diffusers
	- pixel-art
	---

	# DDPM for 8-bit Pixel Art Wings (ddpm-pixelwing)

	This repository contains a Denoising Diffusion Probabilistic Model (DDPM) trained from scratch to generate 8-bit style pixel art images of wings. This model was built using the Hugging Face [`diffusers`](https://github.com/huggingface/diffusers) library.

	The model is "unconditional," meaning it generates random wing designs without any specific text or image prompt. It's a fun tool for artists, game developers, or anyone needing inspiration for pixel art sprites.

	## Model Description

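Denoising Diffusion Probabilistic Models (DDPMs) are a class of generative models that learn to create data by reversing a gradual noising process. During training, Gaussian noise is added to clean images over many small steps, and the network learns to predict the noise that was added. At generation time, the model starts from pure Gaussian noise and removes it step by step until a clean, coherent image emerges.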
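
As a rough illustration of the forward (noising) process, the sketch below uses the linear variance schedule from the DDPM paper (β from 1e-4 to 0.02 over T = 1000 steps). It samples a noisy image in closed form as x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε. The schedule values are an assumption taken from the paper, since the scheduler config saved with this checkpoint may differ. The random array only stands in for a real 32x32 image scaled to [-1, 1].

```python
import numpy as np

# Linear beta schedule from the DDPM paper (assumption: this checkpoint's
# saved scheduler config may use different values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # alpha_bar_t = alpha_0 * alpha_1 * ... * alpha_t


def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot using the closed-form marginal."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise


rng = np.random.default_rng(0)
# Stand-in for a 3x32x32 RGB image normalized to [-1, 1]
x0 = rng.uniform(-1.0, 1.0, size=(3, 32, 32))

for t in (0, 99, 499, 999):
    x_t, noise = q_sample(x0, t, rng)
    # The signal weight shrinks toward 0 as t grows, so x_T is close to pure noise
    print(f"t={t:4d}  signal weight={np.sqrt(alpha_bars[t]):.4f}  "
          f"noise weight={np.sqrt(1.0 - alpha_bars[t]):.4f}")
```

The network is trained to predict `noise` from `x_t` and `t`; sampling then runs this process in reverse, one timestep at a time.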

	This specific model is based on the architecture proposed in the paper [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) and is implemented as a `UNet2DModel` in the `diffusers` library. It was trained on a custom dataset of 8-bit style wing images.

	Model Architecture:
	- Class: `UNet2DModel`
	- `sample_size`: 32
	- `in_channels`: 3
	- `out_channels`: 3
	- `layers_per_block`: 2
	- `block_out_channels`: (64, 128, 128, 256)
	- `down_block_types`: (`DownBlock2D`, `DownBlock2D`, `AttnDownBlock2D`, `DownBlock2D`)
	- `up_block_types`: (`UpBlock2D`, `AttnUpBlock2D`, `UpBlock2D`, `UpBlock2D`)

	## Intended Use & Limitations

	### Intended Use

	This model is primarily intended for creative applications, such as:
	- Generating sprites for 2D games.
	- Creating assets for digital art and design projects.
	- Providing inspiration for pixel artists.

	The model can be used as-is for unconditional generation or as a base model for further fine-tuning on a more specific dataset of pixel art.

	### Limitations

- Resolution: The model generates images at a low resolution of 32x32 pixels, consistent with its pixel art training data. Some applications will need upscaling. Smooth filters such as bilinear or bicubic interpolation blur the hard pixel edges, whereas nearest-neighbor upscaling by an integer factor preserves them.
- Lack of Control: This is an unconditional model, so you cannot steer the output with text prompts (e.g., "a fiery wing"). Each sample is random, although fixing the random seed makes results reproducible.
- Artifacts: Like many generative models, some outputs may contain minor visual artifacts or be less coherent than others. We recommend generating several samples (for example, by increasing `batch_size` or trying different seeds) and keeping the best ones.
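
Because the outputs are only 32x32, a common approach for pixel art in general (not something specific to this model) is to upscale by an integer factor with nearest-neighbor resampling. This copies each pixel into a solid block instead of blending neighbors, as bilinear or bicubic filters do. Below is a minimal sketch with Pillow. The helper name `upscale_pixel_art` and the factor of 8 are illustrative choices.

```python
from PIL import Image


def upscale_pixel_art(img: Image.Image, factor: int = 8) -> Image.Image:
    """Enlarge a pixel-art image by an integer factor without blurring edges."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    w, h = img.size
    # NEAREST copies each source pixel into a factor x factor block
    return img.resize((w * factor, h * factor), resample=Image.NEAREST)


# Demo on a synthetic 32x32 image (in practice, pass a pipeline output such as result.images[0])
small = Image.new("RGB", (32, 32), (0, 0, 0))
small.putpixel((5, 7), (255, 128, 0))
big = upscale_pixel_art(small, factor=8)
print(big.size)  # (256, 256)
# big.save("pixel_wing_256.png")
```

Integer factors keep every source pixel the same size in the output. Non-integer factors would make some blocks one pixel wider than others.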

	## How to Use

	You can easily use this model for inference with just a few lines of code using the `diffusers` library.

	### 1. Installation

	First, make sure you have the necessary libraries installed.

	```bash
	pip install --upgrade diffusers transformers accelerate torch
	```

	### 2. Inference Pipeline

	The following Python script demonstrates how to load the model from the Hugging Face Hub and generate an image.

```python
import torch
from diffusers import DDPMPipeline

# Seed a CPU generator so that results are reproducible
generator = torch.Generator(device="cpu").manual_seed(42)

# Load the pretrained pipeline (UNet + DDPM scheduler) from the Hub
pipeline = DDPMPipeline.from_pretrained("louijiec/ddpm-pixelwing")

# If you have a GPU, move the pipeline to it for faster generation
if torch.cuda.is_available():
    pipeline = pipeline.to("cuda")

print("Pipeline loaded. Starting image generation...")

# Run the full 1000-step reverse diffusion process.
# The pipeline returns an output object whose `.images` attribute is a list of PIL images.
result = pipeline(generator=generator, num_inference_steps=1000)
image = result.images[0]

# The output is a PIL Image, which you can display or save
print("Image generated successfully.")
image.save("pixel_wing.png")

# To generate a batch of images, specify `batch_size`
# images = pipeline(batch_size=4, generator=generator).images
# for i, img in enumerate(images):
#     img.save(f"pixel_wing_{i+1}.png")
```

	This script will generate a 32x32 pixel art wing and save it as `pixel_wing.png` in your current directory.

	## Training Details

	This model was trained from scratch. The following provides an overview for those interested in the training process or looking to reproduce it.

	- Library: The model was trained using the official `diffusers` [unconditional image generation training script](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation).

	- Dataset: The model was trained on a custom dataset named "PixelWing", consisting of approximately 300 unique 32x32 pixel art images of wings. The images were created and curated specifically for this project.

- Training Procedure:
  - Image Resolution: 32x32
  - Epochs: 200
  - Learning Rate: 1e-4
  - Batch Size: 16
  - Gradient Accumulation Steps: 1
  - Optimizer: AdamW

- Hardware: Trained on a single NVIDIA T4 GPU (commonly available on Google Colab).