# Inference Documentation

This document provides instructions for using the Real-ESRGAN inference script to perform super-resolution on anime images. The script uses a pre-trained Real-ESRGAN model to upscale images, with configurable input and output options.

## Prerequisites

- **Python Libraries**: Ensure the following Python packages are available (`argparse` ships with the Python standard library and needs no separate installation):
  - `argparse`
  - `Pillow` (imported as `PIL`)
  - `numpy`
  - `torch`
  - `opencv-python` (imported as `cv2`)
  - `pyyaml`
  - `huggingface_hub`
- **Model Configuration**: A YAML file specifying model details (model ID, local directory, and filename).
- **Input Image**: A valid image file (e.g., PNG, JPEG). The script converts it to RGB when loading.
- **Hardware**: CUDA-compatible GPU (optional, for faster processing) or CPU.

## Script Overview

The script (`inference.py`) performs super-resolution on an input image using the Real-ESRGAN model. It supports:

- Downloading model weights from Hugging Face if they are not available locally.
- Upscaling images using an inner scale (model-specific) and an outer scale (the final scale, reached by post-processing resizing when it differs from the inner scale).
- Saving the upscaled image to the path given by `--output_path`.

## Command-Line Arguments

The script accepts the following command-line arguments:

| Argument | Type | Required | Default | Description |
|----------|------|----------|---------|-------------|
| `--input_path` | str | Yes | None | Path to the input image file (e.g., `image.png`). |
| `--output_path` | str | No | None | Path to save the upscaled image. If omitted, the image is returned but not written to disk. |
| `--model_id` | str | Yes | None | Model ID for the Real-ESRGAN model (e.g., `danhtran2mind/Real-ESRGAN-Anime-finetuning`). |
| `--models_config_path` | str | Yes | None | Path to the YAML configuration file containing model details. |
| `--batch_size` | int | No | 1 | Batch size (not used in this implementation). |
| `--outer_scale` | int | Yes | None | Desired final scale factor for super-resolution (e.g., 4, 8). |
| `--inner_scale` | int | No | 4 | Internal scale factor used by the model (typically 4). |
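
The table above maps directly onto an `argparse` parser. The following is a minimal sketch, assuming the flag names, types, and defaults listed in the table; `build_parser` is a hypothetical helper name, and the actual `inference.py` may define its parser differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Real-ESRGAN super-resolution inference"
    )
    parser.add_argument("--input_path", type=str, required=True,
                        help="Path to the input image file")
    parser.add_argument("--output_path", type=str, default=None,
                        help="Path to save the upscaled image (optional)")
    parser.add_argument("--model_id", type=str, required=True,
                        help="Model ID for the Real-ESRGAN model")
    parser.add_argument("--models_config_path", type=str, required=True,
                        help="Path to the YAML models configuration file")
    parser.add_argument("--batch_size", type=int, default=1,
                        help="Batch size (documented as unused)")
    parser.add_argument("--outer_scale", type=int, required=True,
                        help="Desired final scale factor")
    parser.add_argument("--inner_scale", type=int, default=4,
                        help="Scale factor used internally by the model")
    return parser


# Parse an explicit argument list (instead of sys.argv) to show the defaults.
args = build_parser().parse_args([
    "--input_path", "input.png",
    "--model_id", "danhtran2mind/Real-ESRGAN-Anime-finetuning",
    "--models_config_path", "models_config.yaml",
    "--outer_scale", "8",
])
print(args.outer_scale, args.inner_scale, args.batch_size, args.output_path)
```

Flags that are omitted take the defaults from the table, so `--inner_scale` falls back to 4 and `--output_path` stays `None` here.
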

## Usage

1. **Prepare the Models Configuration File**:
   Create a YAML file (e.g., `models_config.yaml`) with the following structure:

```yaml
- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
  local_dir: "./weights"
  filename: "model.pth"
```

This file specifies the model ID, local directory for weights, and the filename of the model checkpoint.

2. **Run the Script**:
   Use the following command to run the inference:

```bash
python inference.py \
    --input_path path/to/input/image.png \
    --output_path path/to/output/image.png \
    --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
    --models_config_path path/to/models_config.yaml \
    --outer_scale 4
```

Example with a final scale of 8:

```bash
python inference.py \
    --input_path input.png \
    --output_path output.png \
    --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
    --models_config_path models_config.yaml \
    --outer_scale 8
```

3. **Output**:
   - The script processes the input image and applies super-resolution.
   - If `--output_path` is provided, the upscaled image is saved to the specified path.
   - If `--outer_scale` differs from `--inner_scale`, the model output is resized with OpenCV, using `INTER_CUBIC` interpolation when upscaling and `INTER_AREA` when downscaling.
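
The resizing rule in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: `plan_outer_resize` and `apply_outer_resize` are hypothetical names, the target size is assumed to be the model output scaled by `outer_scale / inner_scale`, and "upscaling" is taken to mean `outer_scale > inner_scale`.

```python
from typing import Optional, Tuple


def plan_outer_resize(
    model_output_size: Tuple[int, int],
    inner_scale: int,
    outer_scale: int,
) -> Tuple[Tuple[int, int], Optional[str]]:
    """Return the final (width, height) and the OpenCV interpolation flag name.

    The flag name is None when no extra resize is needed.
    """
    width, height = model_output_size
    if outer_scale == inner_scale:
        return (width, height), None
    # Assumption: the model output is inner_scale times the input size, so the
    # final size is the model output scaled by outer_scale / inner_scale.
    target = (width * outer_scale // inner_scale,
              height * outer_scale // inner_scale)
    flag = "INTER_CUBIC" if outer_scale > inner_scale else "INTER_AREA"
    return target, flag


def apply_outer_resize(image, inner_scale: int, outer_scale: int):
    """Resize a NumPy image array (H x W x C) according to the plan above."""
    import cv2  # imported here so the planning helper works without OpenCV

    height, width = image.shape[:2]
    target, flag = plan_outer_resize((width, height), inner_scale, outer_scale)
    if flag is None:
        return image
    # cv2.resize expects the target size as (width, height).
    return cv2.resize(image, target, interpolation=getattr(cv2, flag))


# A 100x100 input becomes 400x400 after the model (inner_scale=4);
# outer_scale=8 then calls for a further 2x bicubic upscale.
print(plan_outer_resize((400, 400), inner_scale=4, outer_scale=8))
```

Keeping the size arithmetic separate from the `cv2.resize` call makes the rule easy to check without loading an image.
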

## How It Works

1. **Model Loading**:
   - The script reads the `models_config_path` YAML file to locate the model configuration.
   - If the model weights are not found locally, they are downloaded from the Hugging Face Hub using the specified `model_id` and `filename`.
   - The Real-ESRGAN model is initialized with the specified `inner_scale` and loaded with the weights.

2. **Image Processing**:
   - The input image is opened and converted to RGB format using Pillow.
   - The Real-ESRGAN model upscales the image by the `inner_scale` factor.
   - If `outer_scale` differs from `inner_scale`, the image is further resized to achieve the desired scale using OpenCV.

3. **Output Handling**:
   - The upscaled image is saved to `output_path` if provided.
   - The processed image is returned as a Pillow Image object.
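
The model-loading step above (step 1) can be sketched as follows. This is a minimal sketch, assuming the configuration file is a YAML list of entries with `model_id`, `local_dir`, and `filename` keys, as in the example in this document. The helper names are hypothetical and the actual script may organize this differently. `yaml.safe_load` and `huggingface_hub.hf_hub_download` are the only third-party calls used.

```python
import os
from typing import Dict, List


def load_models_config(path: str) -> List[Dict[str, str]]:
    """Parse the YAML configuration file into a list of model entries."""
    import yaml  # PyYAML; imported lazily so the lookup helper has no dependencies

    with open(path, "r", encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def find_model_config(configs: List[Dict[str, str]], model_id: str) -> Dict[str, str]:
    """Return the entry whose model_id matches, or raise if there is none."""
    for entry in configs:
        if entry.get("model_id") == model_id:
            return entry
    raise ValueError(f"Model ID '{model_id}' not found in the configuration")


def resolve_weights(entry: Dict[str, str]) -> str:
    """Return a local path to the weights, downloading them if they are missing."""
    local_path = os.path.join(entry["local_dir"], entry["filename"])
    if os.path.exists(local_path):
        return local_path
    from huggingface_hub import hf_hub_download

    # Downloads the file into local_dir and returns the path to it.
    return hf_hub_download(
        repo_id=entry["model_id"],
        filename=entry["filename"],
        local_dir=entry["local_dir"],
    )


# Look up an entry in an in-memory config that mirrors the documented YAML.
configs = [{
    "model_id": "danhtran2mind/Real-ESRGAN-Anime-finetuning",
    "local_dir": "./weights",
    "filename": "model.pth",
}]
entry = find_model_config(configs, "danhtran2mind/Real-ESRGAN-Anime-finetuning")
print(os.path.join(entry["local_dir"], entry["filename"]))
```

In a real run, `load_models_config("models_config.yaml")` would supply `configs`, and `resolve_weights(entry)` would only reach the Hugging Face Hub when the checkpoint is not already present in `local_dir`.
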

## Notes

- **Device Selection**: The script automatically uses CUDA if available; otherwise, it falls back to CPU.
- **Model Weights**: Ensure the `local_dir` specified in the YAML file exists and is writable so that weights can be downloaded into it.
- **Outer vs. Inner Scale**:
  - `inner_scale` is the scale factor used by the Real-ESRGAN model (typically fixed at 4).
  - `outer_scale` is the final desired scale, achieved through additional resizing if necessary.
- **Batch Size**: The `--batch_size` argument is included but not used in this implementation, as the script processes one image at a time.

## Example Models Configuration File

Here is an example `models_config.yaml`:

```yaml
- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
  local_dir: "./weights"
  filename: "model.pth"
```