|
# Diffusion Models App |
|
|
|
A Python application that combines Hugging Face inference endpoints with on-device models for text-to-image and image-to-image generation, exposed through both a Gradio UI and a REST API.
|
|
|
## Features |
|
|
|
- Text-to-image generation |
|
- Image-to-image transformation with optional prompt |
|
- ControlNet depth-based image transformation |
|
- Gradio UI for interactive use |
|
- API endpoints for integration with other applications |
|
- Configurable models via text input |
|
- Default values for prompts, negative prompts, and models |
|
|
|
## Project Structure |
|
|
|
- `main.py` - Entry point that can run both UI and API |
|
- `app.py` - Gradio UI implementation |
|
- `api.py` - FastAPI server for API endpoints |
|
- `inference.py` - Core functionality for HF inference |
|
- `controlnet_pipeline.py` - ControlNet depth model pipeline |
|
- `config.py` - Configuration and settings |
|
- `requirements.txt` - Dependencies |
|
|
|
## Setup & Usage |
|
|
|
### Local Development |
|
1. Clone the repository |
|
2. Create a `.env` file with your Hugging Face token (copy from `.env.example`) |
|
3. Install dependencies: `pip install -r requirements.txt` |
|
4. Run the application: `python main.py` |
|
|
|
### Hugging Face Spaces Deployment |
|
1. Never commit the `.env` file with your token to the repository! |
|
2. Instead, add your `HF_TOKEN` as a secret in the Spaces UI:
|
- Go to your Space's Settings tab |
|
- Navigate to Repository Secrets |
|
- Add a secret named `HF_TOKEN` with your token as the value |
|
3. The application will automatically use this secret in the Spaces environment |
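Because Spaces injects repository secrets as environment variables, the same lookup works locally (with the token in `.env`) and in the deployed Space. A minimal sketch of that lookup (the helper name is illustrative, not taken from the app's code):

```python
import os

# Spaces exposes repository secrets as environment variables, so the same
# os.environ lookup works locally (via .env) and in the deployed Space.
HF_TOKEN = os.environ.get("HF_TOKEN")  # None if no token is configured

def token_available() -> bool:
    """Report whether a Hugging Face token was found in the environment."""
    return bool(HF_TOKEN)
```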
|
|
|
## Running Options |
|
|
|
- Run both UI and API: `python main.py` |
|
- Run only the API: `python main.py --mode api` |
|
- Run only the UI: `python main.py --mode ui` |
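A sketch of how the `--mode` flag in `main.py` might be wired up with `argparse`; the function name and the `"both"` default label are assumptions, not the actual implementation:

```python
import argparse

def parse_mode(argv=None) -> str:
    """Parse the --mode flag; with no flag, both UI and API are run."""
    parser = argparse.ArgumentParser(description="Diffusion Models App")
    parser.add_argument(
        "--mode",
        choices=["both", "api", "ui"],
        default="both",
        help="Run the API server, the Gradio UI, or both",
    )
    return parser.parse_args(argv).mode
```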
|
|
|
## API Endpoints |
|
|
|
- `POST /text-to-image` - Generate an image from text |
|
- `POST /image-to-image` - Transform an image with optional prompt |
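A minimal client sketch for the text-to-image endpoint using only the standard library. The request body fields (`prompt`, `negative_prompt`) and the response being raw image bytes are assumptions about the FastAPI schema; check `api.py` for the actual contract:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/text-to-image"

# Assumed request schema -- the real field names live in api.py.
payload = {
    "prompt": "a watercolor painting of a lighthouse at dusk",
    "negative_prompt": "blurry, low quality",
}

def generate(url: str = API_URL, body: dict = payload) -> bytes:
    """POST the JSON body and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    with open("output.png", "wb") as f:
        f.write(generate())
```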
|
|
|
## Default Values |
|
|
|
The application includes defaults for: |
|
- Sample prompts for text-to-image and image-to-image |
|
- Negative prompts to exclude unwanted elements |
|
- Pre-filled model names for both text-to-image and image-to-image |
|
|
|
These defaults are applied to both the Gradio UI and API endpoints for consistency. |
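For illustration, such defaults in `config.py` might look like the following; every value and constant name here is a made-up placeholder, not the app's actual configuration:

```python
# Illustrative defaults only -- the real values live in config.py.
DEFAULT_T2I_PROMPT = "an astronaut riding a horse, highly detailed"
DEFAULT_I2I_PROMPT = "turn this photo into a watercolor painting"
DEFAULT_NEGATIVE_PROMPT = "blurry, deformed, low quality, watermark"
DEFAULT_T2I_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"
DEFAULT_I2I_MODEL = "timbrooks/instruct-pix2pix"
```

Keeping these in one module lets the Gradio UI and the API endpoints import the same constants, which is how the consistency noted above is typically achieved.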
|
|
|
## ControlNet Implementation |
|
|
|
The application now supports running a ControlNet depth model directly on the Hugging Face Spaces GPU using the `spaces.GPU` decorator. This feature allows for: |
|
|
|
1. **On-device processing**: Instead of relying solely on remote inference endpoints, the app can now perform image transformations using the local GPU. |
|
|
|
2. **Depth-based transformations**: The ControlNet implementation extracts depth information from the input image, allowing for more structure-preserving transformations. |
|
|
|
3. **Integration with existing workflow**: The ControlNet option is seamlessly integrated into the image-to-image tab via a simple checkbox. |
|
|
|
### How it works
|
|
|
1. When a user uploads an image and enables the ControlNet option, the app processes the image through a depth estimator. |
|
2. The depth map is then used by the ControlNet model to guide the image generation process. |
|
3. The `spaces.GPU` decorator ensures that these operations run on the GPU for optimal performance. |
|
4. The resulting image maintains the spatial structure of the original while applying the creative transformation specified in the prompt. |
|
|
|
The implementation uses: |
|
- `stable-diffusion-v1-5` as the base model |
|
- `lllyasviel/sd-controlnet-depth` as the ControlNet model |
|
- The HuggingFace Transformers depth estimation pipeline |
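Putting those pieces together, the pipeline might be assembled roughly as follows. This is a sketch based on the standard `diffusers` ControlNet API, not the app's actual `controlnet_pipeline.py`: the heavyweight imports and model downloads are deferred into the function, the `runwayml/stable-diffusion-v1-5` repo id is an assumption for the v1-5 base model, and in a Space the function would additionally carry the `@spaces.GPU` decorator:

```python
def transform_with_depth(input_image, prompt: str, negative_prompt: str = ""):
    """Depth-guided image-to-image: estimate depth, then condition on it."""
    # Heavy imports deferred so the module loads without GPU libraries.
    import torch
    from transformers import pipeline
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # 1. Extract a depth map from the input image.
    depth_estimator = pipeline("depth-estimation")
    depth_map = depth_estimator(input_image)["depth"]

    # 2. Build the pipeline around the depth-conditioned ControlNet.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed repo id for the base model
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # 3. The depth map guides generation, preserving spatial structure
    #    while the prompt drives the creative transformation.
    result = pipe(prompt, image=depth_map, negative_prompt=negative_prompt)
    return result.images[0]
```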
|
|
|
## Environment Variables |
|
|
|
- `HF_TOKEN` - Your Hugging Face API token

- `API_HOST` - Host for the API server (default: `0.0.0.0`)

- `API_PORT` - Port for the API server (default: `8000`)

- `GRADIO_HOST` - Host for the Gradio UI (default: `0.0.0.0`)

- `GRADIO_PORT` - Port for the Gradio UI (default: `7860`)
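An example `.env` for local development, mirroring the variables above (the token value is a placeholder; copy the real layout from `.env.example`):

```
HF_TOKEN=hf_your_token_here
API_HOST=0.0.0.0
API_PORT=8000
GRADIO_HOST=0.0.0.0
GRADIO_PORT=7860
```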
|
|