# Inference Documentation

This document provides instructions for using the Real-ESRGAN inference script to perform super-resolution on anime images. The script uses a pre-trained Real-ESRGAN model to upscale images, with configurable input and output options.

## Prerequisites

- **Python Libraries**: Ensure the following Python packages are available (`argparse` ships with the Python standard library and needs no separate installation):
  - `argparse`
  - `PIL` (Pillow)
  - `numpy`
  - `torch`
  - `opencv-python` (cv2)
  - `pyyaml`
  - `huggingface_hub`
- **Model Configuration**: A YAML file specifying model details (model ID, local directory, and filename).
- **Input Image**: A valid image file (e.g., PNG, JPEG). The script converts it to RGB on load.
- **Hardware**: CUDA-compatible GPU (optional, for faster processing) or CPU.

## Script Overview

The script (`inference.py`) performs super-resolution on an input image using the Real-ESRGAN model. It supports:

- Downloading model weights from Hugging Face if not available locally.
- Upscaling images using an inner scale (model-specific) and an optional outer scale (post-processing resizing).
- Saving the upscaled image to the path given by `--output_path`.

## Command-Line Arguments

The script accepts the following command-line arguments:

| Argument               | Type | Required | Default | Description |
|------------------------|------|----------|---------|-------------|
| `--input_path`         | str  | Yes      | None    | Path to the input image file (e.g., `image.png`). |
| `--output_path`        | str  | No       | None    | Path to save the upscaled image. If not provided, the image is returned but not saved automatically. |
| `--model_id`           | str  | Yes      | None    | Model ID for the Real-ESRGAN model (e.g., `danhtran2mind/Real-ESRGAN-Anime-finetuning`). |
| `--models_config_path` | str  | Yes      | None    | Path to the YAML configuration file containing model details. |
| `--batch_size`         | int  | No       | 1       | Batch size (not used in this implementation). |
| `--outer_scale`        | int  | Yes      | None    | Desired final scale factor for super-resolution (e.g., 4, 8). |
| `--inner_scale`        | int  | No       | 4       | Internal scale factor used by the model (typically 4). |

## Usage

1. **Prepare the Models Configuration File**:
   Create a YAML file (e.g., `models_config.yaml`) with the following structure:

   ```yaml
   - model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
     local_dir: "./weights"
     filename: "model.pth"
   ```

   This file specifies the model ID, local directory for weights, and the filename of the model checkpoint.

2. **Run the Script**:
   Use the following command to run the inference:

   ```bash
   python inference.py \
       --input_path path/to/input/image.png \
       --output_path path/to/output/image.png \
       --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
       --models_config_path path/to/models_config.yaml \
       --outer_scale 4
   ```

   Example:

   ```bash
   python inference.py \
       --input_path input.png \
       --output_path output.png \
       --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
       --models_config_path models_config.yaml \
       --outer_scale 8
   ```

3. **Output**:
   - The script processes the input image and applies super-resolution.
   - If `--output_path` is provided, the upscaled image is saved to the specified path.
   - If `--outer_scale` differs from `--inner_scale`, the output image is resized using OpenCV's `INTER_CUBIC` (for upscaling) or `INTER_AREA` (for downscaling) interpolation.

## How It Works

1. **Model Loading**:
   - The script reads the `models_config_path` YAML file to locate the model configuration.
   - If the model weights are not found locally, they are downloaded from the Hugging Face Hub using the specified `model_id` and `filename`.
   - The Real-ESRGAN model is initialized with the specified `inner_scale` and loaded with the weights.

2. **Image Processing**:
   - The input image is opened and converted to RGB format using Pillow.
   - The Real-ESRGAN model upscales the image by the `inner_scale` factor.
   - If `outer_scale` differs from `inner_scale`, the image is further resized to achieve the desired scale using OpenCV.

3. **Output Handling**:
   - The upscaled image is saved to `output_path` if provided.
   - The processed image is returned as a Pillow Image object.

## Notes

- **Device Selection**: The script automatically uses CUDA if available; otherwise, it falls back to CPU.
- **Model Weights**: Ensure the `local_dir` specified in the YAML file exists or is writable for downloading weights.
- **Outer vs. Inner Scale**:
  - `inner_scale` is the scale factor used by the Real-ESRGAN model (typically fixed at 4).
  - `outer_scale` is the final desired scale, achieved through additional resizing if necessary.
- **Batch Size**: The `--batch_size` argument is included but not used in this implementation, as the script processes one image at a time.

## Example Models Configuration File

Here is an example `models_config.yaml`:

```yaml
- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
  local_dir: "./weights"
  filename: "model.pth"
```
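
The model-loading step described under How It Works looks up the entry for a given `model_id` and derives the local weights path from it. The snippet below is a minimal sketch of that lookup, not the script's actual code. The helper names `find_model_config` and `local_weights_path` are hypothetical, and the `configs` list is written out by hand to match what a YAML loader would return for the example file.

```python
import os


def find_model_config(configs, model_id):
    """Return the first config entry whose model_id matches (hypothetical helper)."""
    for entry in configs:
        if entry.get("model_id") == model_id:
            return entry
    raise ValueError(f"No configuration found for model_id {model_id!r}")


def local_weights_path(entry):
    """Join local_dir and filename into the expected checkpoint path (hypothetical helper)."""
    return os.path.join(entry["local_dir"], entry["filename"])


# Hand-written equivalent of the parsed example models_config.yaml:
# a list containing one mapping per model.
configs = [
    {
        "model_id": "danhtran2mind/Real-ESRGAN-Anime-finetuning",
        "local_dir": "./weights",
        "filename": "model.pth",
    },
]

entry = find_model_config(configs, "danhtran2mind/Real-ESRGAN-Anime-finetuning")
weights_path = local_weights_path(entry)

# If the file is missing, the weights would need to be downloaded first.
needs_download = not os.path.exists(weights_path)
```

When `needs_download` is true, the documented behavior is to fetch the checkpoint from the Hugging Face Hub using the entry's `model_id` and `filename`. How the script performs that download is not shown here.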
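
To illustrate the outer versus inner scale note, the sketch below works through the arithmetic for one image. It assumes the final image is the input size multiplied by `outer_scale`, and that the interpolation choice follows the rule stated in the Output step. The function name `plan_outer_resize` is hypothetical, and it returns the OpenCV flag as a string so the example runs without OpenCV installed.

```python
def plan_outer_resize(width, height, inner_scale=4, outer_scale=4):
    """Return (model_size, final_size, interpolation_name) for one image.

    Hypothetical helper: sizes are (width, height) tuples, and
    interpolation_name is None when no extra resize is needed.
    """
    # Size produced by the Real-ESRGAN model itself.
    model_size = (width * inner_scale, height * inner_scale)
    # Size requested by the user through the outer scale (assumed semantics).
    final_size = (width * outer_scale, height * outer_scale)

    if outer_scale == inner_scale:
        interpolation = None           # model output is already the final size
    elif outer_scale > inner_scale:
        interpolation = "INTER_CUBIC"  # further upscaling
    else:
        interpolation = "INTER_AREA"   # downscaling the model output
    return model_size, final_size, interpolation


# A 256x256 input with the default inner scale of 4 and an outer scale of 8.
print(plan_outer_resize(256, 256, inner_scale=4, outer_scale=8))
# → ((1024, 1024), (2048, 2048), 'INTER_CUBIC')
```

With OpenCV available, the planned values map onto a call such as `cv2.resize(image, final_size, interpolation=cv2.INTER_CUBIC)`, where `final_size` is given as `(width, height)`.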