---
title: Stable Diffusion Textual Inversion with Loss Functions
emoji: 🖼️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 3.36.1
app_file: app.py
pinned: false
---
# Stable Diffusion Textual Inversion with Loss Functions
This repository contains a Gradio web application that provides an intuitive interface for generating images using Stable Diffusion with textual inversion and guided loss functions.
## Overview
The application allows users to explore the capabilities of Stable Diffusion by:
- Generating images from text prompts
- Using textual inversion concepts
- Applying various loss functions to guide the diffusion process
- Generating multiple images with different seeds
## Features
### Core Functionality
- **Text-to-Image Generation**: Create detailed images from descriptive text prompts
- **Textual Inversion**: Apply learned concepts to your generations
- **Loss Function Guidance**: Shape image generation with specialized loss functions:
- **Blue Loss**: Emphasizes blue tones in the generated images
- **Elastic Loss**: Creates distortion effects by applying elastic transformations
- **Symmetry Loss**: Encourages symmetrical image generation
- **Saturation Loss**: Enhances color saturation in the output
- **Multi-Seed Generation**: Create multiple variations of an image with different seeds
## Installation
### Prerequisites
- Python 3.8+
- CUDA-capable GPU (recommended)
### Setup
1. Clone this repository:
```bash
git clone https://github.com/yourusername/stable-diffusion-text-inversion.git
cd stable-diffusion-text-inversion
```
2. Install dependencies:
```bash
pip install torch diffusers transformers tqdm torchvision matplotlib gradio
```
3. Run the application:
```bash
python app.py
```
4. Open the provided URL (typically http://localhost:7860) in your browser.
## Understanding the Technology
### Stable Diffusion
Stable Diffusion is a latent text-to-image diffusion model developed by Stability AI. It works by:
1. **Encoding text**: Converting text prompts into embeddings that the model can understand
2. **Starting with noise**: Beginning with random noise in a latent space
3. **Iterative denoising**: Gradually removing noise while being guided by the text embeddings
4. **Decoding to image**: Converting the final latent representation to a pixel-based image
The model operates in a compressed latent space (64×64×4) rather than pixel space (512×512×3), allowing for efficient generation of high-resolution images with limited computational resources.
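As a rough, self-contained illustration of these four steps, here is a minimal text-to-image call using the Diffusers library. The checkpoint name and sampler settings are assumptions for the sketch, not necessarily what `app.py` uses:

```python
# Minimal text-to-image sketch with the Diffusers library.
import torch
from diffusers import StableDiffusionPipeline

# Assumed base checkpoint; the app may use a different one.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# The pipeline encodes the prompt, starts from random latent noise,
# denoises iteratively, and decodes the final latents to pixels.
image = pipe(
    "A majestic castle on a floating island with waterfalls",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("castle.png")
```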
### Textual Inversion
Textual Inversion is a technique that allows Stable Diffusion to learn new concepts from just a few example images. Key aspects include:
- **Custom Concepts**: Learn new visual concepts not present in the model's training data
- **Few-Shot Learning**: Typically requires only 3-5 examples of a concept
- **Token Optimization**: Creates a new "pseudo-word" embedding that represents the concept
- **Seamless Integration**: Once learned, concepts can be used in prompts just like regular words
In this application, we load several pre-trained textual inversion concepts from the SD concepts library:
- Rimworld art style
- HK Golden Lantern
- Phoenix-01
- Fractal Flame
- Scarlet Witch
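For illustration, Diffusers can load such a concept with `load_textual_inversion`; the `cat-toy` concept below is a stand-in example from the Diffusers documentation, not one of the concepts listed above:

```python
# Sketch: loading a pre-trained concept from the SD concepts library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load_textual_inversion registers the learned embedding under a new
# token, here "<cat-toy>", which can then be used directly in prompts.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

image = pipe("A photo of a <cat-toy> on a beach").images[0]
image.save("concept.png")
```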
### Guided Loss Functions
This application introduces an innovative approach by applying custom loss functions during the diffusion process:
1. **How it works**: During generation, we periodically decode the current latent representation, apply a loss function to the decoded image, and backpropagate that loss to adjust the latents (a code sketch follows this list).
2. **Types of Loss Functions**:
- **Blue Loss**: Encourages pixels to have higher values in the blue channel
- **Elastic Loss**: Minimizes difference between the image and an elastically transformed version
- **Symmetry Loss**: Minimizes difference between the image and its horizontal mirror
- **Saturation Loss**: Pushes the image toward higher color saturation
3. **Impact**: These loss functions can dramatically alter the aesthetic qualities of the generated images, creating effects that would be difficult to achieve through prompt engineering alone.
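The sketch below illustrates the mechanism with the blue loss, following the common manual denoising loop from the Diffusers examples. The checkpoint name, the every-5-steps schedule, and the `loss_scale` constant are illustrative assumptions; the app's own implementation may differ in detail:

```python
# Sketch of loss-guided denoising with an LMS scheduler.
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, LMSDiscreteScheduler
from transformers import CLIPTextModel, CLIPTokenizer

device = "cuda"
model = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint
vae = AutoencoderKL.from_pretrained(model, subfolder="vae").to(device)
tokenizer = CLIPTokenizer.from_pretrained(model, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model, subfolder="text_encoder").to(device)
unet = UNet2DConditionModel.from_pretrained(model, subfolder="unet").to(device)
scheduler = LMSDiscreteScheduler.from_pretrained(model, subfolder="scheduler")

def blue_loss(images):
    # Penalize distance of the blue channel from a high target value;
    # images are in [0, 1] with shape (B, 3, H, W).
    return torch.abs(images[:, 2] - 0.9).mean()

prompt = ["A campfire at night"]
guidance_scale, loss_scale, steps, seed = 7.5, 100, 50, 42

def embed(texts):
    # CLIP text embeddings for a list of prompts.
    tok = tokenizer(texts, padding="max_length",
                    max_length=tokenizer.model_max_length,
                    truncation=True, return_tensors="pt")
    with torch.no_grad():
        return text_encoder(tok.input_ids.to(device))[0]

# Unconditional and conditional embeddings for classifier-free guidance.
text_embeddings = torch.cat([embed([""]), embed(prompt)])

scheduler.set_timesteps(steps)
generator = torch.manual_seed(seed)
latents = torch.randn((1, unet.config.in_channels, 64, 64), generator=generator)
latents = latents.to(device) * scheduler.init_noise_sigma

for i, t in enumerate(scheduler.timesteps):
    latent_input = scheduler.scale_model_input(torch.cat([latents] * 2), t)
    with torch.no_grad():
        noise_pred = unet(latent_input, t, encoder_hidden_states=text_embeddings).sample
    uncond, cond = noise_pred.chunk(2)
    noise_pred = uncond + guidance_scale * (cond - uncond)

    if i % 5 == 0:  # every few steps, nudge the latents down the loss gradient
        latents = latents.detach().requires_grad_()
        sigma = scheduler.sigmas[i]
        denoised = latents - sigma * noise_pred         # one-step x0 estimate
        images = vae.decode(denoised / 0.18215).sample  # latents -> pixels
        images = (images / 2 + 0.5).clamp(0, 1)
        loss = blue_loss(images) * loss_scale
        grad = torch.autograd.grad(loss, latents)[0]
        latents = latents.detach() - grad * sigma**2

    latents = scheduler.step(noise_pred, t, latents).prev_sample
```

The other losses slot into the same loop by swapping out `blue_loss`; only the scoring function changes, not the guidance machinery.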
## Usage Examples
### Basic Image Generation
1. Enter a prompt in the text box (e.g., "A majestic castle on a floating island with waterfalls")
2. Set Loss Type to "N/A" and uncheck "Apply Loss Function"
3. Enter a seed value (e.g., "42")
4. Click "Generate Images"
### Applying Loss Functions
1. Enter your prompt
2. Select a Loss Type (e.g., "symmetry")
3. Check "Apply Loss Function"
4. Enter a seed value
5. Click "Generate Images"
### Batch Generation
1. Enter your prompt
2. Select desired loss settings
3. Enter multiple comma-separated seeds (e.g., "42, 100, 500")
4. Click "Generate Images" to generate a grid of variations
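Under the hood, batch generation amounts to parsing the seed list and seeding a separate generator per image, roughly as in this sketch (checkpoint name assumed, as above):

```python
# Sketch: one reproducible image per comma-separated seed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

seeds = [int(s.strip()) for s in "42, 100, 500".split(",")]
images = []
for seed in seeds:
    # A fixed Generator makes each image reproducible for its seed.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    images.append(pipe("A majestic castle on a floating island",
                       generator=generator).images[0])
# `images` now holds one variation per seed, ready for a grid display.
```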
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- [Stability AI](https://stability.ai/) for developing Stable Diffusion
- [Hugging Face](https://huggingface.co/) for the Diffusers library
- [Gradio](https://gradio.app/) for the web interface framework
- The creators of the textual inversion concepts used in this project