# Diffusion Models App
A Python application that uses Hugging Face inference endpoints and on-device models for text-to-image and image-to-image generation with a Gradio UI and API endpoints.
## Features
- Text-to-image generation
- Image-to-image transformation with optional prompt
- ControlNet depth-based image transformation
- Gradio UI for interactive use
- API endpoints for integration with other applications
- Configurable models via text input
- Default values for prompts, negative prompts, and models
## Project Structure

- `main.py` - Entry point that can run both the UI and the API
- `app.py` - Gradio UI implementation
- `api.py` - FastAPI server for API endpoints
- `inference.py` - Core functionality for HF inference
- `controlnet_pipeline.py` - ControlNet depth model pipeline
- `config.py` - Configuration and settings
- `requirements.txt` - Dependencies
Setup & Usage
Local Development
- Clone the repository
- Create a
.env
file with your Hugging Face token (copy from.env.example
) - Install dependencies:
pip install -r requirements.txt
- Run the application:
python main.py
### Hugging Face Spaces Deployment

- Never commit the `.env` file with your token to the repository!
- Instead, add your `HF_TOKEN` as a secret in the Spaces UI:
  - Go to your Space's Settings tab
  - Navigate to Repository Secrets
  - Add a secret named `HF_TOKEN` with your token as the value
- The application will automatically use this secret in the Spaces environment
## Running Options

- Run both the UI and the API: `python main.py`
- Run only the API: `python main.py --mode api`
- Run only the UI: `python main.py --mode ui`
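For orientation, here is a minimal sketch of how such a `--mode` switch could be wired in `main.py`; the `run_api`/`run_ui` helpers are hypothetical stand-ins for the repository's actual startup code:

```python
import argparse
import threading

def run_api() -> None:
    """Hypothetical helper: start the FastAPI server (e.g. via uvicorn)."""

def run_ui() -> None:
    """Hypothetical helper: launch the Gradio interface."""

def main() -> None:
    parser = argparse.ArgumentParser(description="Diffusion Models App")
    parser.add_argument("--mode", choices=["both", "api", "ui"], default="both")
    args = parser.parse_args()

    if args.mode == "api":
        run_api()
    elif args.mode == "ui":
        run_ui()
    else:
        # Run the API in a background thread so the Gradio UI can
        # block the main thread.
        threading.Thread(target=run_api, daemon=True).start()
        run_ui()

if __name__ == "__main__":
    main()
```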
## API Endpoints

- `POST /text-to-image` - Generate an image from text
- `POST /image-to-image` - Transform an image with an optional prompt
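As a rough illustration of calling these endpoints with Python's `requests` library (the `prompt`/`image` field names and the raw image bytes in the response are assumptions; see `api.py` for the actual schemas):

```python
import requests

API = "http://localhost:8000"  # API_HOST:API_PORT, see Environment Variables

# Text-to-image: send a prompt, save the returned image.
resp = requests.post(f"{API}/text-to-image", json={"prompt": "a watercolor fox"})
resp.raise_for_status()
with open("generated.png", "wb") as f:
    f.write(resp.content)

# Image-to-image: upload an input image with an optional prompt.
with open("input.png", "rb") as f:
    resp = requests.post(
        f"{API}/image-to-image",
        files={"image": f},
        data={"prompt": "turn it into a pencil sketch"},
    )
resp.raise_for_status()
```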
## Default Values

The application includes defaults for:

- Sample prompts for text-to-image and image-to-image
- Negative prompts to exclude unwanted elements
- Pre-filled model names for both text-to-image and image-to-image

These defaults are applied to both the Gradio UI and the API endpoints for consistency.
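As a purely illustrative sketch, such defaults could live as shared constants in `config.py`; the names and values below are invented for the example, not the repository's actual settings:

```python
# Invented example values; the repository's actual defaults will differ.
DEFAULT_T2I_PROMPT = "a scenic mountain landscape at sunset"
DEFAULT_NEGATIVE_PROMPT = "blurry, low quality, watermark"
DEFAULT_T2I_MODEL = "stabilityai/stable-diffusion-2-1"  # hypothetical default
```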
## ControlNet Implementation

The application also supports running a ControlNet depth model directly on the Hugging Face Spaces GPU using the `spaces.GPU` decorator. This feature allows for:

- **On-device processing:** Instead of relying solely on remote inference endpoints, the app can perform image transformations on the local GPU.
- **Depth-based transformations:** The ControlNet implementation extracts depth information from the input image, allowing for more structure-preserving transformations.
- **Integration with the existing workflow:** The ControlNet option is integrated into the image-to-image tab via a simple checkbox.
### How it works

- When a user uploads an image and enables the ControlNet option, the app processes the image through a depth estimator.
- The depth map is then used by the ControlNet model to guide the image generation process.
- The `spaces.GPU` decorator ensures that these operations run on the GPU for optimal performance.
- The resulting image maintains the spatial structure of the original while applying the creative transformation specified in the prompt.
The implementation uses:

- `stable-diffusion-v1-5` as the base model
- `lllyasviel/sd-controlnet-depth` as the ControlNet model
- The Hugging Face Transformers depth estimation pipeline
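As a concrete sketch of how these pieces fit together with the `diffusers` and `transformers` libraries (assuming the `runwayml/stable-diffusion-v1-5` repo id for the base model; the app's actual `controlnet_pipeline.py` may differ):

```python
import numpy as np
import spaces
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image
from transformers import pipeline

# Depth estimator used to derive the conditioning image.
depth_estimator = pipeline("depth-estimation")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed repo id for the base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

@spaces.GPU  # allocate the Space's GPU for the duration of the call
def depth_guided_transform(image: Image.Image, prompt: str) -> Image.Image:
    # Extract a depth map that captures the spatial layout of the input.
    depth = np.array(depth_estimator(image)["depth"])
    # ControlNet expects a 3-channel conditioning image, so stack the map.
    control = Image.fromarray(np.stack([depth] * 3, axis=-1))
    # The depth map guides generation toward the prompt while preserving structure.
    return pipe(prompt, image=control).images[0]
```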
## Environment Variables

- `HF_TOKEN` - Your Hugging Face API token
- `API_HOST` - Host for the API server (default: `0.0.0.0`)
- `API_PORT` - Port for the API server (default: `8000`)
- `GRADIO_HOST` - Host for the Gradio UI (default: `0.0.0.0`)
- `GRADIO_PORT` - Port for the Gradio UI (default: `7860`)
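A plausible sketch of how `config.py` might read these, assuming `os.getenv` with the defaults listed above:

```python
import os

HF_TOKEN = os.getenv("HF_TOKEN")  # set via .env locally or a Spaces secret
API_HOST = os.getenv("API_HOST", "0.0.0.0")
API_PORT = int(os.getenv("API_PORT", "8000"))
GRADIO_HOST = os.getenv("GRADIO_HOST", "0.0.0.0")
GRADIO_PORT = int(os.getenv("GRADIO_PORT", "7860"))
```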