# Inference Documentation
This document provides instructions for using the Real-ESRGAN inference script to perform super-resolution on anime images. The script uses a pre-trained Real-ESRGAN model to upscale images, with configurable input and output options.
## Prerequisites
- **Python Libraries**: Ensure the following Python packages are installed:
- `argparse` (part of the Python standard library)
- `PIL` (Pillow)
- `numpy`
- `torch`
- `opencv-python` (cv2)
- `pyyaml`
- `huggingface_hub`
- **Model Configuration**: A YAML file specifying model details (model ID, local directory, and filename).
- **Input Image**: A valid image file (e.g., PNG, JPEG) in RGB format.
- **Hardware**: CUDA-compatible GPU (optional, for faster processing) or CPU.
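If these packages are not already available, a typical installation with `pip` (exact versions are not specified here) looks like the following; note that `Pillow` provides the `PIL` module and `opencv-python` provides `cv2`:
```bash
# Install the runtime dependencies (argparse ships with Python itself)
pip install pillow numpy torch opencv-python pyyaml huggingface_hub
```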
## Script Overview
The script (`inference.py`) performs super-resolution on an input image using the Real-ESRGAN model. It supports:
- Downloading model weights from Hugging Face if not available locally.
- Upscaling images using an inner scale (model-specific) and an outer scale (a post-processing resize applied when it differs from the inner scale).
- Saving the upscaled image to a specified output path or a default location.
## Command-Line Arguments
The script accepts the following command-line arguments:
| Argument | Type | Required | Default | Description |
|-------------------------|------|----------|---------|-----------------------------------------------------------------------------|
| `--input_path` | str | Yes | None | Path to the input image file (e.g., `image.png`). |
| `--output_path` | str | No | None | Path to save the upscaled image. If not provided, the image is returned but not saved automatically. |
| `--model_id` | str | Yes | None | Model ID for the Real-ESRGAN model (e.g., `danhtran2mind/Real-ESRGAN-Anime-finetuning`). |
| `--models_config_path` | str | Yes | None | Path to the YAML configuration file containing model details. |
| `--batch_size` | int | No | 1 | Batch size (not used; the script processes one image at a time). |
| `--outer_scale` | int | Yes | None | Desired final scale factor for super-resolution (e.g., 4, 8). |
| `--inner_scale` | int | No | 4 | Internal scale factor used by the model (typically 4). |
## Usage
1. **Prepare the Models Configuration File**:
Create a YAML file (e.g., `models_config.yaml`) with the following structure:
```yaml
- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
local_dir: "./weights"
filename: "model.pth"
```
This file specifies the model ID, local directory for weights, and the filename of the model checkpoint.
2. **Run the Script**:
Use the following command to run the inference:
```bash
python inference.py \
--input_path path/to/input/image.png \
--output_path path/to/output/image.png \
--model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
--models_config_path path/to/models_config.yaml \
--outer_scale 4
```
Example:
```bash
python inference.py \
--input_path input.png \
--output_path output.png \
--model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
--models_config_path models_config.yaml \
--outer_scale 8
```
3. **Output**:
- The script processes the input image and applies super-resolution.
- If `--output_path` is provided, the upscaled image is saved to the specified path.
- If `--outer_scale` differs from `--inner_scale`, the output image is resized using OpenCV's `INTER_CUBIC` (for upscaling) or `INTER_AREA` (for downscaling) interpolation; a sketch of this step follows the list below.
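A minimal sketch of that resizing step, assuming the upscaled result is available as a NumPy array; the helper name `apply_outer_scale` is illustrative and not taken from `inference.py`:
```python
import cv2
import numpy as np

def apply_outer_scale(upscaled: np.ndarray, inner_scale: int, outer_scale: int) -> np.ndarray:
    """Resize the model output so the final image matches the requested outer scale."""
    if outer_scale == inner_scale:
        return upscaled  # the model output already has the desired scale
    factor = outer_scale / inner_scale
    height, width = upscaled.shape[:2]
    # INTER_CUBIC when enlarging further, INTER_AREA when shrinking back down.
    interpolation = cv2.INTER_CUBIC if factor > 1 else cv2.INTER_AREA
    new_size = (int(width * factor), int(height * factor))  # cv2 expects (width, height)
    return cv2.resize(upscaled, new_size, interpolation=interpolation)
```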
## How It Works
1. **Model Loading**:
- The script reads the `models_config_path` YAML file to locate the model configuration.
- If the model weights are not found locally, they are downloaded from the Hugging Face Hub using the specified `model_id` and `filename` (a sketch of this step appears after the list below).
- The Real-ESRGAN model is initialized with the specified `inner_scale` and loaded with the weights.
2. **Image Processing**:
- The input image is opened and converted to RGB format using Pillow.
- The Real-ESRGAN model upscales the image by the `inner_scale` factor.
- If `outer_scale` differs from `inner_scale`, the image is further resized to achieve the desired scale using OpenCV.
3. **Output Handling**:
- The upscaled image is saved to `output_path` if provided.
- The processed image is returned as a Pillow Image object.
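A minimal sketch of the weight-resolution step, assuming the YAML layout shown earlier and `hf_hub_download` from `huggingface_hub`; the function name `resolve_weights` is illustrative, not the name used in `inference.py`:
```python
import os
import yaml
from huggingface_hub import hf_hub_download

def resolve_weights(models_config_path: str, model_id: str) -> str:
    """Return a local path to the model checkpoint, downloading it from the Hub if needed."""
    with open(models_config_path, "r") as f:
        configs = yaml.safe_load(f)  # a list of entries, as in models_config.yaml

    # Pick the entry that matches the requested model ID.
    entry = next(c for c in configs if c["model_id"] == model_id)
    weights_path = os.path.join(entry["local_dir"], entry["filename"])

    if not os.path.exists(weights_path):
        # Fetch the checkpoint from the Hugging Face Hub into local_dir.
        weights_path = hf_hub_download(
            repo_id=entry["model_id"],
            filename=entry["filename"],
            local_dir=entry["local_dir"],
        )
    return weights_path
```
The returned path can then be passed to the model's weight-loading routine before the image is upscaled.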
## Notes
- **Device Selection**: The script automatically uses CUDA if available; otherwise, it falls back to CPU (see the snippet after these notes).
- **Model Weights**: Ensure the `local_dir` specified in the YAML file exists and is writable so the weights can be downloaded into it.
- **Outer vs. Inner Scale**:
- `inner_scale` is the scale factor used by the Real-ESRGAN model (typically fixed at 4).
- `outer_scale` is the final desired scale, achieved through additional resizing if necessary.
- **Batch Size**: The `--batch_size` argument is included but not used in this implementation, as the script processes one image at a time.
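The device fallback mentioned above is the standard PyTorch pattern:
```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Inference device: {device}")
```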
## Example Models Configuration File
Here is an example `models_config.yaml`:
```yaml
- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
local_dir: "./weights"
filename: "model.pth"
```