Stable Diffusion Text Inversion with Loss Functions

This repository contains a Gradio web application that provides an intuitive interface for generating images using Stable Diffusion with textual inversion and guided loss functions.

Overview

The application allows users to explore the capabilities of Stable Diffusion by:

  • Generating images from text prompts
  • Using textual inversion concepts
  • Applying various loss functions to guide the diffusion process
  • Generating multiple images with different seeds

Features

Core Functionality

  • Text-to-Image Generation: Create detailed images from descriptive text prompts
  • Textual Inversion: Apply learned concepts to your generations
  • Loss Function Guidance: Shape image generation with specialized loss functions:
    • Blue Loss: Emphasizes blue tones in the generated images
    • Elastic Loss: Creates distortion effects by applying elastic transformations
    • Symmetry Loss: Encourages symmetrical image generation
    • Saturation Loss: Enhances color saturation in the output
  • Multi-Seed Generation: Create multiple variations of an image with different seeds

Installation

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended)

Setup

  1. Clone this repository:
     git clone https://github.com/yourusername/stable-diffusion-text-inversion.git
     cd stable-diffusion-text-inversion
  2. Install dependencies:
     pip install torch diffusers transformers tqdm torchvision matplotlib gradio
  3. Run the application:
     python gradio_app.py
  4. Open the provided URL (typically http://localhost:7860) in your browser.

Understanding the Technology

Stable Diffusion

Stable Diffusion is a latent text-to-image diffusion model developed by Stability AI. It works by:

  1. Encoding text: Converting text prompts into embeddings that the model can understand
  2. Starting with noise: Beginning with random noise in a latent space
  3. Iterative denoising: Gradually removing noise while being guided by the text embeddings
  4. Decoding to image: Converting the final latent representation to a pixel-based image

The model operates in a compressed latent space (64x64x4) rather than pixel space (512x512x3), allowing for efficient generation of high-resolution images with limited computational resources.
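
For a concrete picture of these steps, here is a minimal text-to-image sketch using the Hugging Face Diffusers library. The checkpoint name, prompt, and parameter values are assumptions for illustration and are not taken from this repository's code.

```python
# Minimal text-to-image sketch with diffusers (assumed checkpoint and settings).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # assumed SD v1 base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed for reproducibility
image = pipe(
    "A majestic castle on a floating island with waterfalls",
    num_inference_steps=50,   # number of denoising iterations
    guidance_scale=7.5,       # classifier-free guidance strength
    generator=generator,
).images[0]
image.save("castle.png")
```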

Textual Inversion

Textual Inversion is a technique that allows Stable Diffusion to learn new concepts from just a few example images. Key aspects include:

  • Custom Concepts: Learn new visual concepts not present in the model's training data
  • Few-Shot Learning: Typically requires only 3-5 examples of a concept
  • Token Optimization: Creates a new "pseudo-word" embedding that represents the concept
  • Seamless Integration: Once learned, concepts can be used in prompts just like regular words

In this application, we load several pre-trained textual inversion concepts from the SD concepts library (a loading sketch follows this list):

  • Rimworld art style
  • HK Golden Lantern
  • Phoenix-01
  • Fractal Flame
  • Scarlet Witch
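
As an illustration, the sketch below shows how a concept can be attached to a Diffusers pipeline with load_textual_inversion. The concept repository name and placeholder token are assumptions based on the SD concepts library naming convention, not the exact identifiers used by this app.

```python
# Sketch of loading an SD-concepts-library embedding (repo name and token are assumed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Each concept adds one learned embedding tied to a placeholder token.
pipe.load_textual_inversion("sd-concepts-library/rimworld-art-style")

# The placeholder token can then be used in prompts like an ordinary word.
image = pipe("A frontier colony in the style of <rimworld-art-style>").images[0]
image.save("colony.png")
```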

Guided Loss Functions

This application introduces an innovative approach by applying custom loss functions during the diffusion process:

  1. How it works: During generation, we periodically decode the current latent representation, apply a loss function to the decoded image, and backpropagate that loss to adjust the latents (a code sketch follows this list).

  2. Types of Loss Functions:

    • Blue Loss: Encourages pixels to have higher values in the blue channel
    • Elastic Loss: Minimizes difference between the image and an elastically transformed version
    • Symmetry Loss: Minimizes difference between the image and its horizontal mirror
    • Saturation Loss: Pushes the image toward higher color saturation
  3. Impact: These loss functions can dramatically alter the aesthetic qualities of the generated images, creating effects that would be difficult to achieve through prompt engineering alone.
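
The sketch below illustrates the idea under some assumptions: it defines simplified versions of the four losses and a single guidance step that decodes the latents with the VAE and nudges them along the negative loss gradient. The function names, guidance scale, and VAE scaling factor follow the standard SD v1 setup rather than this repository's exact implementation.

```python
# Simplified loss functions and one latent-guidance step (illustrative, not the app's code).
import torch
from torchvision.transforms import ElasticTransform

def blue_loss(images):
    # Reward a higher mean value in the blue channel (images are NCHW in [0, 1]).
    return -images[:, 2].mean()

def elastic_loss(images):
    # Penalize the difference between the image and an elastically distorted copy.
    return torch.abs(images - ElasticTransform(alpha=250.0)(images)).mean()

def symmetry_loss(images):
    # Penalize the difference between the image and its horizontal mirror.
    return torch.abs(images - torch.flip(images, dims=[3])).mean()

def saturation_loss(images):
    # Reward channel values that deviate from the per-pixel mean (more saturation).
    return -torch.abs(images - images.mean(dim=1, keepdim=True)).mean()

def guidance_step(latents, vae, loss_fn, scale=100.0):
    # Decode the current latents, score the image, and step latents against the gradient.
    latents = latents.detach().requires_grad_(True)
    images = vae.decode(latents / 0.18215).sample   # 0.18215 is the SD v1 latent scaling factor
    images = (images / 2 + 0.5).clamp(0, 1)         # map from [-1, 1] to [0, 1]
    loss = loss_fn(images)
    grad = torch.autograd.grad(loss, latents)[0]
    return latents.detach() - scale * grad
```

In this kind of loop, the guidance step is typically applied every few denoising iterations so that the extra decode/backward pass does not dominate generation time.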

Usage Examples

Basic Image Generation

  1. Enter a prompt in the text box (e.g., "A majestic castle on a floating island with waterfalls")
  2. Set Loss Type to "N/A" and uncheck "Apply Loss Function"
  3. Enter a seed value (e.g., "42")
  4. Click "Generate Images"

Applying Loss Functions

  1. Enter your prompt
  2. Select a Loss Type (e.g., "symmetry")
  3. Check "Apply Loss Function"
  4. Enter a seed value
  5. Click "Generate Images"

Batch Generation

  1. Enter your prompt
  2. Select desired loss settings
  3. Enter multiple comma-separated seeds (e.g., "42, 100, 500")
  4. Click "Generate Images" to generate a grid of variations

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Stability AI for developing Stable Diffusion
  • Hugging Face for the Diffusers library
  • Gradio for the web interface framework
  • The creators of the textual inversion concepts used in this project