# Diffusion Models App

A Python application for text-to-image and image-to-image generation that uses Hugging Face inference endpoints and on-device models, exposed through a Gradio UI and a set of API endpoints.

## Features

- Text-to-image generation
- Image-to-image transformation with optional prompt
- ControlNet depth-based image transformation
- Gradio UI for interactive use
- API endpoints for integration with other applications
- Configurable models via text input
- Default values for prompts, negative prompts, and models

## Project Structure

- `main.py` - Entry point that can run the UI, the API, or both
- `app.py` - Gradio UI implementation
- `api.py` - FastAPI server for API endpoints
- `inference.py` - Core functionality for HF inference
- `controlnet_pipeline.py` - ControlNet depth model pipeline
- `config.py` - Configuration and settings
- `requirements.txt` - Dependencies

## Setup & Usage

### Local Development
1. Clone the repository
2. Create a `.env` file with your Hugging Face token (copy from `.env.example`)
3. Install dependencies: `pip install -r requirements.txt`
4. Run the application: `python main.py`

### Hugging Face Spaces Deployment
1. Never commit the `.env` file with your token to the repository!
2. Instead, add your HF_TOKEN as a secret in the Spaces UI:
   - Go to your Space's Settings tab
   - Navigate to Repository Secrets
   - Add a secret named `HF_TOKEN` with your token as the value
3. The application will automatically use this secret in the Spaces environment

## Running Options

- Run both UI and API: `python main.py`
- Run only the API: `python main.py --mode api`
- Run only the UI: `python main.py --mode ui`
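
For reference, here is a minimal sketch of how `main.py` might dispatch on the `--mode` flag. The object names it imports (`app` from `api.py`, `demo` from `app.py`) are assumptions for illustration; the repository's actual entry point may differ:

```python
# Illustrative sketch of main.py's --mode dispatch; not the exact implementation.
import argparse
import os
import threading


def run_api():
    import uvicorn
    from api import app  # assumed FastAPI instance defined in api.py

    uvicorn.run(
        app,
        host=os.getenv("API_HOST", "0.0.0.0"),
        port=int(os.getenv("API_PORT", "8000")),
    )


def run_ui():
    from app import demo  # assumed Gradio Blocks/Interface object defined in app.py

    demo.launch(
        server_name=os.getenv("GRADIO_HOST", "0.0.0.0"),
        server_port=int(os.getenv("GRADIO_PORT", "7860")),
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Diffusion Models App")
    parser.add_argument("--mode", choices=["both", "api", "ui"], default="both")
    args = parser.parse_args()

    if args.mode == "api":
        run_api()
    elif args.mode == "ui":
        run_ui()
    else:
        # Run the API in a background thread and keep the UI in the foreground.
        threading.Thread(target=run_api, daemon=True).start()
        run_ui()
```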

## API Endpoints

- `POST /text-to-image` - Generate an image from text
- `POST /image-to-image` - Transform an image with optional prompt
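
A hedged example of calling the text-to-image endpoint with Python's `requests` library. The JSON field names and the response format below are assumptions; check `api.py` for the actual request and response schema:

```python
# Illustrative client for POST /text-to-image. Field names ("prompt",
# "negative_prompt", "model") and the response handling are assumptions.
import requests

response = requests.post(
    "http://localhost:8000/text-to-image",
    json={
        "prompt": "a watercolor painting of a lighthouse at sunset",
        "negative_prompt": "blurry, low quality",
        "model": "stabilityai/stable-diffusion-xl-base-1.0",  # example model id
    },
    timeout=300,
)
response.raise_for_status()

# Assuming the endpoint returns raw image bytes; it may instead return
# base64-encoded data or a URL.
with open("output.png", "wb") as f:
    f.write(response.content)
```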

## Default Values

The application includes defaults for:
- Sample prompts for text-to-image and image-to-image
- Negative prompts to exclude unwanted elements
- Pre-filled model names for both text-to-image and image-to-image

These defaults are applied to both the Gradio UI and API endpoints for consistency.
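
As an illustration, such defaults could live as module-level constants in `config.py` so that the UI and API import the same values. The names and values below are placeholders, not the repository's actual defaults:

```python
# Illustrative defaults in the style of config.py; values are placeholders only.
DEFAULT_TEXT2IMG_PROMPT = "a scenic mountain landscape at golden hour, highly detailed"
DEFAULT_IMG2IMG_PROMPT = "reimagine this scene as an oil painting"
DEFAULT_NEGATIVE_PROMPT = "blurry, low quality, distorted, watermark"
DEFAULT_TEXT2IMG_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"  # example model id
DEFAULT_IMG2IMG_MODEL = "runwayml/stable-diffusion-v1-5"             # example model id
```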

## ControlNet Implementation

The application now supports running a ControlNet depth model directly on the Hugging Face Spaces GPU using the `spaces.GPU` decorator. This feature allows for:

1. **On-device processing**: Instead of relying solely on remote inference endpoints, the app can now perform image transformations using the local GPU.

2. **Depth-based transformations**: The ControlNet implementation extracts depth information from the input image, allowing for more structure-preserving transformations.

3. **Integration with existing workflow**: The ControlNet option is seamlessly integrated into the image-to-image tab via a simple checkbox.

### How it works

1. When a user uploads an image and enables the ControlNet option, the app processes the image through a depth estimator.
2. The depth map is then used by the ControlNet model to guide the image generation process.
3. The `spaces.GPU` decorator ensures that these operations run on the GPU for optimal performance.
4. The resulting image maintains the spatial structure of the original while applying the creative transformation specified in the prompt.

The implementation uses:
- `stable-diffusion-v1-5` as the base model
- `lllyasviel/sd-controlnet-depth` as the ControlNet model
- The Hugging Face Transformers depth-estimation pipeline
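
A condensed sketch of that flow using the `diffusers` ControlNet API is shown below. The exact base-model repository id, inference parameters, and function names are assumptions; `controlnet_pipeline.py` is the source of truth:

```python
# Illustrative ControlNet depth pipeline (see controlnet_pipeline.py for the real code).
import numpy as np
import spaces
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image
from transformers import pipeline as hf_pipeline

# Depth estimator used to extract a depth map from the input image.
depth_estimator = hf_pipeline("depth-estimation")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed repository id for the SD 1.5 base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
)


@spaces.GPU  # runs this function on the Spaces GPU
def transform_with_depth(image: Image.Image, prompt: str, negative_prompt: str = "") -> Image.Image:
    pipe.to("cuda")

    # 1. Extract a depth map that captures the spatial structure of the input image.
    depth = np.array(depth_estimator(image)["depth"])
    depth_map = Image.fromarray(np.stack([depth] * 3, axis=-1))  # single channel -> RGB

    # 2. Let ControlNet condition generation on the depth map, so the output keeps
    #    the original structure while following the creative prompt.
    result = pipe(
        prompt,
        image=depth_map,
        negative_prompt=negative_prompt,
        num_inference_steps=30,
    )
    return result.images[0]
```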

## Environment Variables

- `HF_TOKEN` - Your Hugging Face API token
- `API_HOST` - Host for the API server (default: 0.0.0.0)
- `API_PORT` - Port for the API server (default: 8000)
- `GRADIO_HOST` - Host for the Gradio UI (default: 0.0.0.0)
- `GRADIO_PORT` - Port for the Gradio UI (default: 7860)
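
For reference, a minimal sketch of loading these variables with their defaults, assuming `python-dotenv` is used to read the local `.env` file (the actual loading logic lives in `config.py`):

```python
# Illustrative config loading; config.py is the source of truth.
import os

from dotenv import load_dotenv  # python-dotenv, assumed for local .env support

load_dotenv()  # no-op on Spaces, where HF_TOKEN is injected as a repository secret

HF_TOKEN = os.getenv("HF_TOKEN")  # required for Hugging Face inference calls
API_HOST = os.getenv("API_HOST", "0.0.0.0")
API_PORT = int(os.getenv("API_PORT", "8000"))
GRADIO_HOST = os.getenv("GRADIO_HOST", "0.0.0.0")
GRADIO_PORT = int(os.getenv("GRADIO_PORT", "7860"))
```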