Stable Diffusion Text Inversion with Loss Functions

This repository contains a Gradio web application that provides an intuitive interface for generating images using Stable Diffusion with textual inversion and guided loss functions.

Overview

The application allows users to explore the capabilities of Stable Diffusion by:

  • Generating images from text prompts
  • Using textual inversion concepts
  • Applying various loss functions to guide the diffusion process
  • Generating multiple images with different seeds

Features

Core Functionality

  • Text-to-Image Generation: Create detailed images from descriptive text prompts
  • Textual Inversion: Apply learned concepts to your generations
  • Loss Function Guidance: Shape image generation with specialized loss functions:
    • Blue Loss: Emphasizes blue tones in the generated images
    • Elastic Loss: Creates distortion effects by applying elastic transformations
    • Symmetry Loss: Encourages symmetrical image generation
    • Saturation Loss: Enhances color saturation in the output
  • Multi-Seed Generation: Create multiple variations of an image with different seeds

Installation

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended)

Setup

  1. Clone this repository:
     git clone https://github.com/yourusername/stable-diffusion-text-inversion.git
     cd stable-diffusion-text-inversion
  2. Install dependencies:
     pip install torch diffusers transformers tqdm torchvision matplotlib gradio
  3. Run the application:
     python gradio_app.py
  4. Open the provided URL (typically http://localhost:7860) in your browser.

Understanding the Technology

Stable Diffusion

Stable Diffusion is a latent text-to-image diffusion model developed by Stability AI. It works by:

  1. Encoding text: Converting text prompts into embeddings that the model can understand
  2. Starting with noise: Beginning with random noise in a latent space
  3. Iterative denoising: Gradually removing noise while being guided by the text embeddings
  4. Decoding to image: Converting the final latent representation to a pixel-based image

The model operates in a compressed latent space (64x64x4) rather than pixel space (512x512x3), allowing for efficient generation of high-resolution images with limited computational resources.
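
For a concrete picture of these steps, here is a minimal text-to-image sketch using the Hugging Face Diffusers library. The checkpoint name, prompt, and parameter values are assumptions for illustration and are not taken from this repository's code.

```python
# Minimal text-to-image sketch with diffusers (assumed checkpoint and settings).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # assumed SD v1 base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed for reproducibility
image = pipe(
    "A majestic castle on a floating island with waterfalls",
    num_inference_steps=50,   # number of denoising iterations
    guidance_scale=7.5,       # classifier-free guidance strength
    generator=generator,
).images[0]
image.save("castle.png")
```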

Textual Inversion

Textual Inversion is a technique that allows Stable Diffusion to learn new concepts from just a few example images. Key aspects include:

  • Custom Concepts: Learn new visual concepts not present in the model's training data
  • Few-Shot Learning: Typically requires only 3-5 examples of a concept
  • Token Optimization: Creates a new "pseudo-word" embedding that represents the concept
  • Seamless Integration: Once learned, concepts can be used in prompts just like regular words

In this application, we load several pre-trained textual inversion concepts from the SD concepts library (a loading sketch follows this list):

  • Rimworld art style
  • HK Golden Lantern
  • Phoenix-01
  • Fractal Flame
  • Scarlet Witch
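
As an illustration, the sketch below shows how a concept can be attached to a Diffusers pipeline with load_textual_inversion. The concept repository name and placeholder token are assumptions based on the SD concepts library naming convention, not the exact identifiers used by this app.

```python
# Sketch of loading an SD-concepts-library embedding (repo name and token are assumed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Each concept adds one learned embedding tied to a placeholder token.
pipe.load_textual_inversion("sd-concepts-library/rimworld-art-style")

# The placeholder token can then be used in prompts like an ordinary word.
image = pipe("A frontier colony in the style of <rimworld-art-style>").images[0]
image.save("colony.png")
```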

Guided Loss Functions

This application introduces an innovative approach by applying custom loss functions during the diffusion process:

  1. How it works: During generation, we periodically decode the current latent representation, apply a loss function to the decoded image, and backpropagate that loss to adjust the latents (a code sketch follows this list).

  2. Types of Loss Functions:

    • Blue Loss: Encourages pixels to have higher values in the blue channel
    • Elastic Loss: Minimizes difference between the image and an elastically transformed version
    • Symmetry Loss: Minimizes difference between the image and its horizontal mirror
    • Saturation Loss: Pushes the image toward higher color saturation
  3. Impact: These loss functions can dramatically alter the aesthetic qualities of the generated images, creating effects that would be difficult to achieve through prompt engineering alone.
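
The sketch below illustrates the idea under some assumptions: it defines simplified versions of the four losses and a single guidance step that decodes the latents with the VAE and nudges them along the negative loss gradient. The function names, guidance scale, and VAE scaling factor follow the standard SD v1 setup rather than this repository's exact implementation.

```python
# Simplified loss functions and one latent-guidance step (illustrative, not the app's code).
import torch
from torchvision.transforms import ElasticTransform

def blue_loss(images):
    # Reward a higher mean value in the blue channel (images are NCHW in [0, 1]).
    return -images[:, 2].mean()

def elastic_loss(images):
    # Penalize the difference between the image and an elastically distorted copy.
    return torch.abs(images - ElasticTransform(alpha=250.0)(images)).mean()

def symmetry_loss(images):
    # Penalize the difference between the image and its horizontal mirror.
    return torch.abs(images - torch.flip(images, dims=[3])).mean()

def saturation_loss(images):
    # Reward channel values that deviate from the per-pixel mean (more saturation).
    return -torch.abs(images - images.mean(dim=1, keepdim=True)).mean()

def guidance_step(latents, vae, loss_fn, scale=100.0):
    # Decode the current latents, score the image, and step latents against the gradient.
    latents = latents.detach().requires_grad_(True)
    images = vae.decode(latents / 0.18215).sample   # 0.18215 is the SD v1 latent scaling factor
    images = (images / 2 + 0.5).clamp(0, 1)         # map from [-1, 1] to [0, 1]
    loss = loss_fn(images)
    grad = torch.autograd.grad(loss, latents)[0]
    return latents.detach() - scale * grad
```

In this kind of loop, the guidance step is typically applied every few denoising iterations so that the extra decode/backward pass does not dominate generation time.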

Usage Examples

Basic Image Generation

  1. Enter a prompt in the text box (e.g., "A majestic castle on a floating island with waterfalls")
  2. Set Loss Type to "N/A" and uncheck "Apply Loss Function"
  3. Enter a seed value (e.g., "42")
  4. Click "Generate Images"

Applying Loss Functions

  1. Enter your prompt
  2. Select a Loss Type (e.g., "symmetry")
  3. Check "Apply Loss Function"
  4. Enter a seed value
  5. Click "Generate Images"

Batch Generation

  1. Enter your prompt
  2. Select desired loss settings
  3. Enter multiple comma-separated seeds (e.g., "42, 100, 500")
  4. Click "Generate Images" to generate a grid of variations

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Stability AI for developing Stable Diffusion
  • Hugging Face for the Diffusers library
  • Gradio for the web interface framework
  • The creators of the textual inversion concepts used in this project