# Inference Documentation
This document provides instructions for using the Real-ESRGAN inference script to perform super-resolution on anime images. The script uses a pre-trained Real-ESRGAN model to upscale images, with configurable input and output options.
## Prerequisites
- **Python Libraries**: Ensure the following Python packages are installed:
- `argparse` (part of the Python standard library)
- `PIL` (Pillow)
- `numpy`
- `torch`
- `opencv-python` (cv2)
- `pyyaml`
- `huggingface_hub`
- **Model Configuration**: A YAML file specifying model details (model ID, local directory, and filename).
- **Input Image**: A valid image file (e.g., PNG, JPEG) in RGB format.
- **Hardware**: CUDA-compatible GPU (optional, for faster processing) or CPU.
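If these packages are not already available, a typical installation with `pip` (exact versions are not specified here) looks like the following; note that `Pillow` provides the `PIL` module and `opencv-python` provides `cv2`:
```bash
# Install the runtime dependencies (argparse ships with Python itself)
pip install pillow numpy torch opencv-python pyyaml huggingface_hub
```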
## Script Overview
The script (`inference.py`) performs super-resolution on an input image using the Real-ESRGAN model. It supports:
- Downloading model weights from Hugging Face if not available locally.
- Upscaling images using an inner scale (model-specific) and an outer scale (a post-processing resize applied when it differs from the inner scale).
- Saving the upscaled image to a specified output path or a default location.
## Command-Line Arguments
The script accepts the following command-line arguments:
| Argument | Type | Required | Default | Description |
|-------------------------|------|----------|---------|-----------------------------------------------------------------------------|
| `--input_path` | str | Yes | None | Path to the input image file (e.g., `image.png`). |
| `--output_path` | str | No | None | Path to save the upscaled image. If not provided, the image is returned but not saved automatically. |
| `--model_id` | str | Yes | None | Model ID for the Real-ESRGAN model (e.g., `danhtran2mind/Real-ESRGAN-Anime-finetuning`). |
| `--models_config_path` | str | Yes | None | Path to the YAML configuration file containing model details. |
| `--batch_size` | int | No | 1 | Batch size (not used; the script processes one image at a time). |
| `--outer_scale` | int | Yes | None | Desired final scale factor for super-resolution (e.g., 4, 8). |
| `--inner_scale` | int | No | 4 | Internal scale factor used by the model (typically 4). |
## Usage
1. **Prepare the Models Configuration File**:
Create a YAML file (e.g., `models_config.yaml`) with the following structure:
```yaml
- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
local_dir: "./weights"
filename: "model.pth"
```
This file specifies the model ID, local directory for weights, and the filename of the model checkpoint.
2. **Run the Script**:
Use the following command to run the inference:
```bash
python inference.py \
--input_path path/to/input/image.png \
--output_path path/to/output/image.png \
--model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
--models_config_path path/to/models_config.yaml \
--outer_scale 4
```
Example:
```bash
python inference.py \
--input_path input.png \
--output_path output.png \
--model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
--models_config_path models_config.yaml \
--outer_scale 8
```
3. **Output**:
- The script processes the input image and applies super-resolution.
- If `--output_path` is provided, the upscaled image is saved to the specified path.
- If `--outer_scale` differs from `--inner_scale`, the output image is resized using OpenCV's `INTER_CUBIC` (for upscaling) or `INTER_AREA` (for downscaling) interpolation; a sketch of this step follows the list below.
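A minimal sketch of that resizing step, assuming the upscaled result is available as a NumPy array; the helper name `apply_outer_scale` is illustrative and not taken from `inference.py`:
```python
import cv2
import numpy as np

def apply_outer_scale(upscaled: np.ndarray, inner_scale: int, outer_scale: int) -> np.ndarray:
    """Resize the model output so the final image matches the requested outer scale."""
    if outer_scale == inner_scale:
        return upscaled  # the model output already has the desired scale
    factor = outer_scale / inner_scale
    height, width = upscaled.shape[:2]
    # INTER_CUBIC when enlarging further, INTER_AREA when shrinking back down.
    interpolation = cv2.INTER_CUBIC if factor > 1 else cv2.INTER_AREA
    new_size = (int(width * factor), int(height * factor))  # cv2 expects (width, height)
    return cv2.resize(upscaled, new_size, interpolation=interpolation)
```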
## How It Works
1. **Model Loading**:
- The script reads the `models_config_path` YAML file to locate the model configuration.
- If the model weights are not found locally, they are downloaded from the Hugging Face Hub using the specified `model_id` and `filename` (a sketch of this step appears after the list below).
- The Real-ESRGAN model is initialized with the specified `inner_scale` and loaded with the weights.
2. **Image Processing**:
- The input image is opened and converted to RGB format using Pillow.
- The Real-ESRGAN model upscales the image by the `inner_scale` factor.
- If `outer_scale` differs from `inner_scale`, the image is further resized to achieve the desired scale using OpenCV.
3. **Output Handling**:
- The upscaled image is saved to `output_path` if provided.
- The processed image is returned as a Pillow Image object.
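A minimal sketch of the weight-resolution step, assuming the YAML layout shown earlier and `hf_hub_download` from `huggingface_hub`; the function name `resolve_weights` is illustrative, not the name used in `inference.py`:
```python
import os
import yaml
from huggingface_hub import hf_hub_download

def resolve_weights(models_config_path: str, model_id: str) -> str:
    """Return a local path to the model checkpoint, downloading it from the Hub if needed."""
    with open(models_config_path, "r") as f:
        configs = yaml.safe_load(f)  # a list of entries, as in models_config.yaml

    # Pick the entry that matches the requested model ID.
    entry = next(c for c in configs if c["model_id"] == model_id)
    weights_path = os.path.join(entry["local_dir"], entry["filename"])

    if not os.path.exists(weights_path):
        # Fetch the checkpoint from the Hugging Face Hub into local_dir.
        weights_path = hf_hub_download(
            repo_id=entry["model_id"],
            filename=entry["filename"],
            local_dir=entry["local_dir"],
        )
    return weights_path
```
The returned path can then be passed to the model's weight-loading routine before the image is upscaled.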
## Notes
- **Device Selection**: The script automatically uses CUDA if available; otherwise, it falls back to CPU (see the snippet after these notes).
- **Model Weights**: Ensure the `local_dir` specified in the YAML file exists and is writable so the weights can be downloaded into it.
- **Outer vs. Inner Scale**:
- `inner_scale` is the scale factor used by the Real-ESRGAN model (typically fixed at 4).
- `outer_scale` is the final desired scale, achieved through additional resizing if necessary.
- **Batch Size**: The `--batch_size` argument is included but not used in this implementation, as the script processes one image at a time.
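The device fallback mentioned above is the standard PyTorch pattern:
```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Inference device: {device}")
```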
## Example Models Configuration File
Here is an example `models_config.yaml`:
```yaml
- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
local_dir: "./weights"
filename: "model.pth"
```