Inference Documentation

This document provides instructions for using the Real-ESRGAN inference script to perform super-resolution on anime images. The script uses a pre-trained Real-ESRGAN model to upscale images, with configurable input and output options.

Prerequisites

  • Python Libraries: Ensure the following Python packages are installed:
    • argparse
    • PIL (Pillow)
    • numpy
    • torch
    • opencv-python (cv2)
    • pyyaml
    • huggingface_hub
  • Model Configuration: A YAML file specifying model details (model ID, local directory, and filename).
  • Input Image: A valid image file (e.g., PNG, JPEG) in RGB format.
  • Hardware: CUDA-compatible GPU (optional, for faster processing) or CPU.
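
The packages listed above can typically be installed with pip (argparse ships with the Python standard library); exact versions are not pinned here:

    pip install pillow numpy torch opencv-python pyyaml huggingface_hub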

Script Overview

The script (inference.py) performs super-resolution on an input image using the Real-ESRGAN model. It supports:

  • Downloading model weights from Hugging Face if not available locally.
  • Upscaling images using an inner scale (model-specific) and an optional outer scale (post-processing resizing).
  • Saving the upscaled image to a specified output path or a default location.

Command-Line Arguments

The script accepts the following command-line arguments:

| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| --input_path | str | Yes | None | Path to the input image file (e.g., image.png). |
| --output_path | str | No | None | Path to save the upscaled image. If not provided, the image is returned but not saved automatically. |
| --model_id | str | Yes | None | Model ID for the Real-ESRGAN model (e.g., danhtran2mind/Real-ESRGAN-Anime-finetuning). |
| --models_config_path | str | Yes | None | Path to the YAML configuration file containing model details. |
| --batch_size | int | No | 1 | Batch size (not used in this implementation; the script processes one image at a time). |
| --outer_scale | int | Yes | None | Desired final scale factor for super-resolution (e.g., 4, 8). |
| --inner_scale | int | No | 4 | Internal scale factor used by the model (typically 4). |
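
The table above corresponds to an argument parser along these lines; this is a sketch only, and the actual inference.py may declare or describe the arguments differently:

    import argparse

    def parse_args():
        parser = argparse.ArgumentParser(
            description="Real-ESRGAN super-resolution inference")
        parser.add_argument("--input_path", type=str, required=True,
                            help="Path to the input image file")
        parser.add_argument("--output_path", type=str, default=None,
                            help="Path to save the upscaled image")
        parser.add_argument("--model_id", type=str, required=True,
                            help="Model ID for the Real-ESRGAN model")
        parser.add_argument("--models_config_path", type=str, required=True,
                            help="Path to the YAML file with model details")
        parser.add_argument("--batch_size", type=int, default=1,
                            help="Unused; one image is processed at a time")
        parser.add_argument("--outer_scale", type=int, required=True,
                            help="Desired final scale factor (e.g., 4, 8)")
        parser.add_argument("--inner_scale", type=int, default=4,
                            help="Scale factor used internally by the model")
        return parser.parse_args()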

Usage

  1. Prepare the Models Configuration File: Create a YAML file (e.g., models_config.yaml) with the following structure:

    - model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
      local_dir: "./weights"
      filename: "model.pth"
    

    This file specifies the model ID, local directory for weights, and the filename of the model checkpoint.

  2. Run the Script: Use the following command to run inference:

    python inference.py \
       --input_path path/to/input/image.png \
       --output_path path/to/output/image.png \
       --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
       --models_config_path path/to/models_config.yaml \
       --outer_scale 4
    

    Example:

    python inference.py \
       --input_path input.png \
       --output_path output.png \
       --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
       --models_config_path models_config.yaml \
       --outer_scale 8
    
  3. Output:

    • The script processes the input image and applies super-resolution.
    • If --output_path is provided, the upscaled image is saved to the specified path.
    • If --outer_scale differs from --inner_scale, the output image is resized using OpenCV's INTER_CUBIC (for upscaling) or INTER_AREA (for downscaling) interpolation.
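
The interpolation choice in the last step can be expressed with OpenCV roughly as follows; this is a sketch that assumes the model output is available as a NumPy RGB array:

    import cv2
    import numpy as np

    def resize_to_outer_scale(sr_image: np.ndarray, inner_scale: int,
                              outer_scale: int) -> np.ndarray:
        """Resize the model output so the overall factor matches outer_scale."""
        if outer_scale == inner_scale:
            return sr_image
        factor = outer_scale / inner_scale
        # INTER_CUBIC when enlarging further, INTER_AREA when shrinking back down.
        interpolation = cv2.INTER_CUBIC if factor > 1 else cv2.INTER_AREA
        height, width = sr_image.shape[:2]
        new_size = (int(width * factor), int(height * factor))  # cv2 expects (width, height)
        return cv2.resize(sr_image, new_size, interpolation=interpolation)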

How It Works

  1. Model Loading:

    • The script reads the models_config_path YAML file to locate the model configuration.
    • If the model weights are not found locally, they are downloaded from the Hugging Face Hub using the specified model_id and filename.
    • The Real-ESRGAN model is initialized with the specified inner_scale and loaded with the weights.
  2. Image Processing:

    • The input image is opened and converted to RGB format using Pillow.
    • The Real-ESRGAN model upscales the image by the inner_scale factor.
    • If outer_scale differs from inner_scale, the image is further resized to achieve the desired scale using OpenCV.
  3. Output Handling:

    • The upscaled image is saved to output_path if provided.
    • The processed image is returned as a Pillow Image object.
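
Putting the model-loading and image-processing steps together, a minimal sketch using huggingface_hub, Pillow, and torch might look like the following. The locate_weights and upscale_once helpers are illustrative names, and the tensor round-trip is an assumption about how the script prepares its inputs; the actual model construction is not shown here.

    import os

    import numpy as np
    import torch
    import yaml
    from huggingface_hub import hf_hub_download
    from PIL import Image

    def locate_weights(models_config_path: str, model_id: str) -> str:
        """Find the checkpoint for model_id, downloading it from the Hub if missing."""
        with open(models_config_path, "r") as f:
            entries = yaml.safe_load(f)  # the config is a YAML list of model entries
        entry = next(e for e in entries if e["model_id"] == model_id)
        local_path = os.path.join(entry["local_dir"], entry["filename"])
        if not os.path.exists(local_path):
            local_path = hf_hub_download(repo_id=entry["model_id"],
                                         filename=entry["filename"],
                                         local_dir=entry["local_dir"])
        return local_path

    def upscale_once(model: torch.nn.Module, image: Image.Image,
                     device: torch.device) -> Image.Image:
        """Run a single RGB image through an already-constructed SR model."""
        array = np.array(image.convert("RGB"), dtype=np.float32) / 255.0
        tensor = torch.from_numpy(array).permute(2, 0, 1).unsqueeze(0).to(device)  # 1x3xHxW
        with torch.no_grad():
            output = model(tensor)
        output = output.squeeze(0).permute(1, 2, 0).clamp(0, 1).cpu().numpy()
        return Image.fromarray((output * 255.0).round().astype(np.uint8))

Saving to --output_path is then just a matter of calling .save() on the returned Pillow image.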

Notes

  • Device Selection: The script automatically uses CUDA if available; otherwise, it falls back to CPU.
  • Model Weights: Ensure the local_dir specified in the YAML file exists or is writable for downloading weights.
  • Outer vs. Inner Scale:
    • inner_scale is the scale factor used by the Real-ESRGAN model (typically fixed at 4).
    • outer_scale is the final desired scale, achieved through additional resizing if necessary.
  • Batch Size: The --batch_size argument is included but not used in this implementation, as the script processes one image at a time.
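
The device selection mentioned in the first note is commonly written as a single line:

    import torch

    # Prefer the GPU when CUDA is available; otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")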

Example Models Configuration File

Here is an example models_config.yaml:

- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
  local_dir: "./weights"
  filename: "model.pth"
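
Loading this file with pyyaml yields a list of dictionaries, one per model entry, which the script can search for the requested model_id:

    import yaml

    with open("models_config.yaml", "r") as f:
        config = yaml.safe_load(f)

    print(config[0]["model_id"])   # danhtran2mind/Real-ESRGAN-Anime-finetuning
    print(config[0]["local_dir"])  # ./weights
    print(config[0]["filename"])   # model.pth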