# Diffusion Models App
A Python application that uses Hugging Face inference endpoints and on-device models for text-to-image and image-to-image generation with a Gradio UI and API endpoints.
## Features
- Text-to-image generation
- Image-to-image transformation with optional prompt
- ControlNet depth-based image transformation
- Gradio UI for interactive use
- API endpoints for integration with other applications
- Configurable models via text input
- Default values for prompts, negative prompts, and models
## Project Structure

- `main.py` - Entry point that can run both the UI and the API
- `app.py` - Gradio UI implementation
- `api.py` - FastAPI server for API endpoints
- `inference.py` - Core functionality for HF inference
- `controlnet_pipeline.py` - ControlNet depth model pipeline
- `config.py` - Configuration and settings
- `requirements.txt` - Dependencies
Setup & Usage
Local Development
- Clone the repository
- Create a
.env
file with your Hugging Face token (copy from.env.example
) - Install dependencies:
pip install -r requirements.txt
- Run the application:
python main.py
### Hugging Face Spaces Deployment

- Never commit the `.env` file with your token to the repository!
- Instead, add your `HF_TOKEN` as a secret in the Spaces UI:
  - Go to your Space's Settings tab
  - Navigate to Repository Secrets
  - Add a secret named `HF_TOKEN` with your token as the value
- The application will automatically use this secret in the Spaces environment
## Running Options

- Run both the UI and the API: `python main.py`
- Run only the API: `python main.py --mode api`
- Run only the UI: `python main.py --mode ui`
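For orientation, here is a minimal sketch of how such a `--mode` switch could be wired in `main.py`; the `run_api`/`run_ui` helpers are hypothetical stand-ins for the repository's actual startup code:

```python
import argparse
import threading

def run_api() -> None:
    """Hypothetical helper: start the FastAPI server (e.g. via uvicorn)."""

def run_ui() -> None:
    """Hypothetical helper: launch the Gradio interface."""

def main() -> None:
    parser = argparse.ArgumentParser(description="Diffusion Models App")
    parser.add_argument("--mode", choices=["both", "api", "ui"], default="both")
    args = parser.parse_args()

    if args.mode == "api":
        run_api()
    elif args.mode == "ui":
        run_ui()
    else:
        # Run the API in a background thread so the Gradio UI can
        # block the main thread.
        threading.Thread(target=run_api, daemon=True).start()
        run_ui()

if __name__ == "__main__":
    main()
```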
## API Endpoints

- `POST /text-to-image` - Generate an image from text
- `POST /image-to-image` - Transform an image with an optional prompt
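As a rough illustration of calling these endpoints with Python's `requests` library (the `prompt`/`image` field names and the raw image bytes in the response are assumptions; see `api.py` for the actual schemas):

```python
import requests

API = "http://localhost:8000"  # API_HOST:API_PORT, see Environment Variables

# Text-to-image: send a prompt, save the returned image.
resp = requests.post(f"{API}/text-to-image", json={"prompt": "a watercolor fox"})
resp.raise_for_status()
with open("generated.png", "wb") as f:
    f.write(resp.content)

# Image-to-image: upload an input image with an optional prompt.
with open("input.png", "rb") as f:
    resp = requests.post(
        f"{API}/image-to-image",
        files={"image": f},
        data={"prompt": "turn it into a pencil sketch"},
    )
resp.raise_for_status()
```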
## Default Values

The application includes defaults for:

- Sample prompts for text-to-image and image-to-image
- Negative prompts to exclude unwanted elements
- Pre-filled model names for both text-to-image and image-to-image

These defaults are applied to both the Gradio UI and the API endpoints for consistency.
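As a purely illustrative sketch, such defaults could live as shared constants in `config.py`; the names and values below are invented for the example, not the repository's actual settings:

```python
# Invented example values; the repository's actual defaults will differ.
DEFAULT_T2I_PROMPT = "a scenic mountain landscape at sunset"
DEFAULT_NEGATIVE_PROMPT = "blurry, low quality, watermark"
DEFAULT_T2I_MODEL = "stabilityai/stable-diffusion-2-1"  # hypothetical default
```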
## ControlNet Implementation

The application also supports running a ControlNet depth model directly on the Hugging Face Spaces GPU using the `spaces.GPU` decorator. This feature allows for:

- **On-device processing:** Instead of relying solely on remote inference endpoints, the app can perform image transformations on the local GPU.
- **Depth-based transformations:** The ControlNet implementation extracts depth information from the input image, allowing for more structure-preserving transformations.
- **Integration with the existing workflow:** The ControlNet option is integrated into the image-to-image tab via a simple checkbox.
### How it works

- When a user uploads an image and enables the ControlNet option, the app processes the image through a depth estimator.
- The depth map is then used by the ControlNet model to guide the image generation process.
- The `spaces.GPU` decorator ensures that these operations run on the GPU for optimal performance.
- The resulting image maintains the spatial structure of the original while applying the creative transformation specified in the prompt.
The implementation uses:

- `stable-diffusion-v1-5` as the base model
- `lllyasviel/sd-controlnet-depth` as the ControlNet model
- The Hugging Face Transformers depth estimation pipeline
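As a concrete sketch of how these pieces fit together with the `diffusers` and `transformers` libraries (assuming the `runwayml/stable-diffusion-v1-5` repo id for the base model; the app's actual `controlnet_pipeline.py` may differ):

```python
import numpy as np
import spaces
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image
from transformers import pipeline

# Depth estimator used to derive the conditioning image.
depth_estimator = pipeline("depth-estimation")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed repo id for the base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

@spaces.GPU  # allocate the Space's GPU for the duration of the call
def depth_guided_transform(image: Image.Image, prompt: str) -> Image.Image:
    # Extract a depth map that captures the spatial layout of the input.
    depth = np.array(depth_estimator(image)["depth"])
    # ControlNet expects a 3-channel conditioning image, so stack the map.
    control = Image.fromarray(np.stack([depth] * 3, axis=-1))
    # The depth map guides generation toward the prompt while preserving structure.
    return pipe(prompt, image=control).images[0]
```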
## Environment Variables

- `HF_TOKEN` - Your Hugging Face API token
- `API_HOST` - Host for the API server (default: `0.0.0.0`)
- `API_PORT` - Port for the API server (default: `8000`)
- `GRADIO_HOST` - Host for the Gradio UI (default: `0.0.0.0`)
- `GRADIO_PORT` - Port for the Gradio UI (default: `7860`)
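A plausible sketch of how `config.py` might read these, assuming `os.getenv` with the defaults listed above:

```python
import os

HF_TOKEN = os.getenv("HF_TOKEN")  # set via .env locally or a Spaces secret
API_HOST = os.getenv("API_HOST", "0.0.0.0")
API_PORT = int(os.getenv("API_PORT", "8000"))
GRADIO_HOST = os.getenv("GRADIO_HOST", "0.0.0.0")
GRADIO_PORT = int(os.getenv("GRADIO_PORT", "7860"))
```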