Inference Documentation
This document provides instructions for using the Real-ESRGAN inference script to perform super-resolution on anime images. The script uses a pre-trained Real-ESRGAN model to upscale images, with configurable input and output options.
Prerequisites
- Python Libraries: Ensure the following Python packages are installed:
  - `argparse`
  - `PIL` (Pillow)
  - `numpy`
  - `torch`
  - `opencv-python` (`cv2`)
  - `pyyaml`
  - `huggingface_hub`
- Model Configuration: A YAML file specifying model details (model ID, local directory, and filename).
- Input Image: A valid image file (e.g., PNG, JPEG) that Pillow can open. It is converted to RGB before processing.
- Hardware: CUDA-compatible GPU (optional, for faster processing) or CPU.
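To confirm that the libraries listed above are importable before running the script, a quick check like the following can help. It is a minimal sketch and not part of `inference.py`. It assumes the usual import names for these packages (`PIL` for Pillow, `cv2` for opencv-python, `yaml` for pyyaml), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names for the prerequisites listed above (assumed standard mapping:
# Pillow -> PIL, opencv-python -> cv2, pyyaml -> yaml).
REQUIRED_MODULES = ("argparse", "PIL", "numpy", "torch", "cv2", "yaml", "huggingface_hub")


def missing_modules(names: tuple[str, ...] = REQUIRED_MODULES) -> list[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages (by import name): " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

`find_spec` only locates a module without importing it, so the check is fast and does not initialize heavy libraries such as `torch`.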
Script Overview
The script (`inference.py`) performs super-resolution on an input image using the Real-ESRGAN model. It supports:
- Downloading model weights from Hugging Face if not available locally.
- Upscaling images using an inner scale (the model's native factor) and an outer scale (the final factor, reached by additional resizing when it differs from the inner scale).
- Saving the upscaled image to a specified output path or a default location.
Command-Line Arguments
The script accepts the following command-line arguments:
| Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| `--input_path` | str | Yes | None | Path to the input image file (e.g., `image.png`). |
| `--output_path` | str | No | None | Path to save the upscaled image. If not provided, the image is returned but not saved. |
| `--model_id` | str | Yes | None | Model ID of the Real-ESRGAN model (e.g., `danhtran2mind/Real-ESRGAN-Anime-finetuning`). |
| `--models_config_path` | str | Yes | None | Path to the YAML configuration file containing model details. |
| `--batch_size` | int | No | 1 | Batch size (not used in this implementation). |
| `--outer_scale` | int | Yes | None | Desired final scale factor for super-resolution (e.g., 4, 8). |
| `--inner_scale` | int | No | 4 | Internal scale factor used by the model (typically 4). |
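The argument table maps directly onto Python's `argparse`. The following is a hypothetical reconstruction for illustration only: it assumes nothing beyond the names, types, defaults, and required flags listed in the table, and the real `inference.py` may declare them differently (for example, with other help strings).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the argument table (not the script's actual code)."""
    parser = argparse.ArgumentParser(description="Real-ESRGAN anime super-resolution (sketch)")
    parser.add_argument("--input_path", type=str, required=True, help="Path to the input image")
    parser.add_argument("--output_path", type=str, default=None, help="Where to save the upscaled image")
    parser.add_argument("--model_id", type=str, required=True, help="Hugging Face model ID")
    parser.add_argument("--models_config_path", type=str, required=True, help="YAML file with model entries")
    # Accepted for compatibility but unused: images are processed one at a time.
    parser.add_argument("--batch_size", type=int, default=1, help="Unused batch size")
    parser.add_argument("--outer_scale", type=int, required=True, help="Final scale factor")
    parser.add_argument("--inner_scale", type=int, default=4, help="Model's native scale factor")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because `--model_id`, `--models_config_path`, and `--outer_scale` are marked `required=True`, omitting any of them makes `argparse` print a usage error and exit.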
Usage
Prepare the Models Configuration File: Create a YAML file (e.g., `models_config.yaml`) with the following structure:

    - model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
      local_dir: "./weights"
      filename: "model.pth"

This file specifies the model ID, local directory for weights, and the filename of the model checkpoint.
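The configuration file is a YAML list of entries. A minimal sketch of reading it follows, assuming the script selects the entry whose `model_id` equals `--model_id`; that matching rule is an assumption, since the document only says the file is used to locate the model configuration. `find_model_entry` and `load_model_entry` are hypothetical helper names.

```python
from typing import Any


def find_model_entry(entries: list[dict[str, Any]], model_id: str) -> dict[str, Any]:
    """Return the first config entry whose model_id matches (assumed lookup rule)."""
    for entry in entries:
        if entry.get("model_id") == model_id:
            return entry
    raise KeyError(f"model_id {model_id!r} not found in models config")


def load_model_entry(config_path: str, model_id: str) -> dict[str, Any]:
    """Parse the YAML list with pyyaml and look up the entry for model_id."""
    import yaml  # pyyaml; imported here so find_model_entry works without it

    with open(config_path, "r", encoding="utf-8") as f:
        entries = yaml.safe_load(f) or []  # an empty file parses to None
    return find_model_entry(entries, model_id)
```

`yaml.safe_load` turns the example file into a list with one dictionary holding the keys `model_id`, `local_dir`, and `filename`.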
Run the Script: Use the following command to run the inference:

    python inference.py \
      --input_path path/to/input/image.png \
      --output_path path/to/output/image.png \
      --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
      --models_config_path path/to/models_config.yaml \
      --outer_scale 4

Example:

    python inference.py \
      --input_path input.png \
      --output_path output.png \
      --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
      --models_config_path models_config.yaml \
      --outer_scale 8

Output:
- The script processes the input image and applies super-resolution.
- If `--output_path` is provided, the upscaled image is saved to the specified path.
- If `--outer_scale` differs from `--inner_scale`, the output image is resized using OpenCV's `INTER_CUBIC` (for upscaling) or `INTER_AREA` (for downscaling) interpolation.
How It Works
Model Loading:
- The script reads the `models_config_path` YAML file to locate the model configuration.
- If the model weights are not found locally, they are downloaded from the Hugging Face Hub using the specified `model_id` and `filename`.
- The Real-ESRGAN model is initialized with the specified `inner_scale` and loaded with the weights.
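The download-if-missing step can be sketched with `huggingface_hub.hf_hub_download`, which accepts `repo_id`, `filename`, and `local_dir` and returns the path of the downloaded file. `resolve_weights` is a hypothetical helper, and the assumption that the checkpoint lives at `local_dir/filename` is ours. The optional `download` parameter exists only so the function can be exercised without network access.

```python
from pathlib import Path
from typing import Any, Callable, Optional


def resolve_weights(entry: dict[str, Any], download: Optional[Callable[..., str]] = None) -> str:
    """Return a local checkpoint path, fetching it from the Hub if it is missing."""
    # Assumption: the checkpoint is stored at <local_dir>/<filename>.
    local_path = Path(entry["local_dir"]) / entry["filename"]
    if local_path.is_file():
        return str(local_path)  # weights already present, no network access needed
    if download is None:
        from huggingface_hub import hf_hub_download

        download = hf_hub_download
    # Download the file from the model repo into local_dir and return its path.
    return download(
        repo_id=entry["model_id"],
        filename=entry["filename"],
        local_dir=entry["local_dir"],
    )
```

Checking for the file first means repeated runs reuse the cached checkpoint instead of contacting the Hub each time.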
Image Processing:
- The input image is opened and converted to RGB format using Pillow.
- The Real-ESRGAN model upscales the image by the `inner_scale` factor.
- If `outer_scale` differs from `inner_scale`, the image is further resized to achieve the desired scale using OpenCV.
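The outer-scale resize can be sketched as follows. The sketch assumes the target size is the original size multiplied by `outer_scale` (rounded down with `int`) and that the model output is an H x W x C NumPy array. `outer_resize_plan` and `apply_outer_scale` are hypothetical names, not functions from `inference.py`.

```python
from typing import Optional, Tuple


def outer_resize_plan(
    orig_w: int, orig_h: int, inner_scale: int, outer_scale: int
) -> Optional[Tuple[int, int, str]]:
    """Return (width, height, OpenCV interpolation name) for the extra resize, or None."""
    if outer_scale == inner_scale:
        return None  # the model output already has the requested scale
    # Assumption: the final size is measured relative to the original input.
    target_w = int(orig_w * outer_scale)
    target_h = int(orig_h * outer_scale)
    # Enlarging the model output uses bicubic; shrinking it uses area averaging.
    interp = "INTER_CUBIC" if outer_scale > inner_scale else "INTER_AREA"
    return target_w, target_h, interp


def apply_outer_scale(sr_image, orig_w: int, orig_h: int, inner_scale: int, outer_scale: int):
    """Resize the model output (an H x W x C NumPy array) to the outer scale."""
    import cv2  # opencv-python; imported here so the planning step runs without it

    plan = outer_resize_plan(orig_w, orig_h, inner_scale, outer_scale)
    if plan is None:
        return sr_image
    target_w, target_h, interp = plan
    # cv2.resize expects dsize as (width, height), not (height, width).
    return cv2.resize(sr_image, (target_w, target_h), interpolation=getattr(cv2, interp))
```

Separating the size and interpolation decision from the OpenCV call keeps the scale arithmetic easy to check independently of image data.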
Output Handling:
- The upscaled image is saved to `output_path` if provided.
- The processed image is returned as a Pillow Image object.
Notes
- Device Selection: The script automatically uses CUDA if available; otherwise, it falls back to CPU.
- Model Weights: Ensure that the `local_dir` specified in the YAML file exists or is writable so that weights can be downloaded into it.
- Outer vs. Inner Scale: `inner_scale` is the scale factor used by the Real-ESRGAN model (typically fixed at 4). `outer_scale` is the final desired scale, achieved through additional resizing if necessary.
- Batch Size: The `--batch_size` argument is included but not used in this implementation, as the script processes one image at a time.
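The relationship between the two scale factors in the Outer vs. Inner Scale note can be written out for an input of width $W$ and height $H$. Here $r$ is the factor by which the OpenCV step resizes the model output, which measures $s_{\text{inner}}W \times s_{\text{inner}}H$:

$$
W_{\text{final}} = s_{\text{outer}}\,W, \qquad H_{\text{final}} = s_{\text{outer}}\,H, \qquad r = \frac{s_{\text{outer}}}{s_{\text{inner}}}
$$

As an illustrative calculation (not measured output): with $s_{\text{inner}} = 4$, $s_{\text{outer}} = 8$ and a 512 x 512 input, the model produces 2048 x 2048, and resizing by $r = 2$ with `INTER_CUBIC` gives 4096 x 4096.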
Example Models Configuration File
Here is an example `models_config.yaml`:

    - model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
      local_dir: "./weights"
      filename: "model.pth"