
Diffusion Models App

A Python application that uses Hugging Face inference endpoints and on-device models for text-to-image and image-to-image generation, with a Gradio UI and API endpoints.

Features

  • Text-to-image generation
  • Image-to-image transformation with optional prompt
  • ControlNet depth-based image transformation
  • Gradio UI for interactive use
  • API endpoints for integration with other applications
  • Configurable models via text input
  • Default values for prompts, negative prompts, and models

Project Structure

  • main.py - Entry point that can run both UI and API
  • app.py - Gradio UI implementation
  • api.py - FastAPI server for API endpoints
  • inference.py - Core functionality for HF inference
  • controlnet_pipeline.py - ControlNet depth model pipeline
  • config.py - Configuration and settings
  • requirements.txt - Dependencies

Setup & Usage

Local Development

  1. Clone the repository
  2. Create a .env file with your Hugging Face token (copy from .env.example)
  3. Install dependencies: pip install -r requirements.txt
  4. Run the application: python main.py

Hugging Face Spaces Deployment

  1. Never commit the .env file with your token to the repository!
  2. Instead, add your HF_TOKEN as a secret in the Spaces UI:
    • Go to your Space's Settings tab
    • Navigate to Repository Secrets
    • Add a secret named HF_TOKEN with your token as the value
  3. The application will automatically use this secret in the Spaces environment

Running Options

  • Run both UI and API: python main.py
  • Run only the API: python main.py --mode api
  • Run only the UI: python main.py --mode ui
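The `--mode` switch above implies a simple dispatcher in main.py. A minimal sketch of how that argument handling might look (the actual main.py may differ):

```python
# Hypothetical sketch of main.py's mode dispatch; the real main.py may differ.
import argparse

def parse_mode(argv=None):
    parser = argparse.ArgumentParser(description="Diffusion Models App")
    parser.add_argument(
        "--mode",
        choices=["both", "api", "ui"],
        default="both",
        help="Run the API server, the Gradio UI, or both (default)",
    )
    return parser.parse_args(argv).mode

if __name__ == "__main__":
    mode = parse_mode()
    if mode in ("both", "api"):
        print("would start FastAPI server")  # e.g. uvicorn.run(...)
    if mode in ("both", "ui"):
        print("would launch Gradio UI")      # e.g. demo.launch(...)
```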

API Endpoints

  • POST /text-to-image - Generate an image from text
  • POST /image-to-image - Transform an image with optional prompt
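A hedged client sketch for the text-to-image endpoint. The payload field names (`prompt`, `negative_prompt`, `model`) are assumptions about the request schema, not taken from api.py:

```python
# Hypothetical client for the /text-to-image endpoint; field names are assumed.
API_URL = "http://localhost:8000"  # API_HOST/API_PORT defaults from this README

def build_text_to_image_payload(prompt, negative_prompt=None, model=None):
    """Assemble the JSON body, omitting optional fields that are not set."""
    payload = {"prompt": prompt}
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if model:
        payload["model"] = model
    return payload

def text_to_image(prompt, negative_prompt=None, out_path="out.png"):
    """POST to /text-to-image and save the returned image bytes."""
    import requests  # imported lazily so the payload helper has no dependencies

    resp = requests.post(
        f"{API_URL}/text-to-image",
        json=build_text_to_image_payload(prompt, negative_prompt),
        timeout=120,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
```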

Default Values

The application includes defaults for:

  • Sample prompts for text-to-image and image-to-image
  • Negative prompts to exclude unwanted elements
  • Pre-filled model names for both text-to-image and image-to-image

These defaults are applied to both the Gradio UI and API endpoints for consistency.
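One way to keep the UI and API consistent is a single defaults dict in config.py. The sketch below illustrates the idea; the prompt strings and model names are placeholders, not the app's actual defaults:

```python
# Hypothetical shape of the shared defaults in config.py; values are placeholders.
DEFAULTS = {
    "text_to_image_prompt": "A scenic mountain landscape at sunset",   # placeholder
    "image_to_image_prompt": "Convert to a watercolor painting",       # placeholder
    "negative_prompt": "blurry, low quality, distorted",               # placeholder
    "text_to_image_model": "placeholder/text-to-image-model",
    "image_to_image_model": "placeholder/image-to-image-model",
}

def with_defaults(user_values: dict) -> dict:
    """Fill any missing or empty request fields from DEFAULTS,
    so the Gradio UI and the API endpoints behave identically."""
    return {**DEFAULTS, **{k: v for k, v in user_values.items() if v}}
```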

ControlNet Implementation

The application now supports running a ControlNet depth model directly on the Hugging Face Spaces GPU using the spaces.GPU decorator. This feature allows for:

  1. On-device processing: Instead of relying solely on remote inference endpoints, the app can now perform image transformations using the local GPU.

  2. Depth-based transformations: The ControlNet implementation extracts depth information from the input image, allowing for more structure-preserving transformations.

  3. Integration with existing workflow: The ControlNet option is seamlessly integrated into the image-to-image tab via a simple checkbox.

How it works:

  1. When a user uploads an image and enables the ControlNet option, the app processes the image through a depth estimator.
  2. The depth map is then used by the ControlNet model to guide the image generation process.
  3. The spaces.GPU decorator ensures that these operations run on the GPU for optimal performance.
  4. The resulting image maintains the spatial structure of the original while applying the creative transformation specified in the prompt.

The implementation uses:

  • stable-diffusion-v1-5 as the base model
  • lllyasviel/sd-controlnet-depth as the ControlNet model
  • The Hugging Face Transformers depth estimation pipeline
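The components listed above might be wired together roughly as follows. This is a sketch, not the app's controlnet_pipeline.py; the `runwayml/` repo prefix for the base model and the function names are assumptions:

```python
# Sketch of the depth-ControlNet flow described above; not the app's actual code.
import numpy as np

def depth_to_control_image(depth: np.ndarray) -> np.ndarray:
    """Normalize a single-channel depth map to the 3-channel 8-bit image
    that ControlNet expects as its conditioning input."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    d8 = np.round(d * 255.0).astype(np.uint8)
    return np.stack([d8, d8, d8], axis=-1)

def build_pipeline():
    # Heavy imports kept local so the helper above works without a GPU setup.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed repo id for the SD 1.5 base
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    return pipe.to("cuda")

# On Spaces, the generation step would be decorated with @spaces.GPU so it is
# scheduled onto the Space's GPU, e.g.:
#
#   @spaces.GPU
#   def generate(image, prompt):
#       depth = depth_estimator(image)["depth"]            # transformers pipeline
#       control = depth_to_control_image(np.array(depth))
#       return build_pipeline()(prompt, image=control).images[0]
```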

Environment Variables

  • HF_TOKEN - Your Hugging Face API token
  • API_HOST - Host for the API server (default: 0.0.0.0)
  • API_PORT - Port for the API server (default: 8000)
  • GRADIO_HOST - Host for the Gradio UI (default: 0.0.0.0)
  • GRADIO_PORT - Port for the Gradio UI (default: 7860)
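These variables would typically be read once in config.py with the defaults listed above. A minimal sketch, where the constant names are assumptions:

```python
# Hypothetical env-var loading for config.py, using the defaults listed above.
import os

HF_TOKEN = os.environ.get("HF_TOKEN")  # required for remote inference; no default
API_HOST = os.environ.get("API_HOST", "0.0.0.0")
API_PORT = int(os.environ.get("API_PORT", "8000"))
GRADIO_HOST = os.environ.get("GRADIO_HOST", "0.0.0.0")
GRADIO_PORT = int(os.environ.get("GRADIO_PORT", "7860"))

if HF_TOKEN is None:
    # On Spaces the token arrives via a Repository Secret, which the runtime
    # exposes as an ordinary environment variable.
    print("Warning: HF_TOKEN is not set; remote inference calls will fail")
```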