	# Inference Documentation

	This document provides instructions for using the Real-ESRGAN inference script to perform super-resolution on anime images. The script uses a pre-trained Real-ESRGAN model to upscale images, with configurable input and output options.

	## Prerequisites

	- Python Libraries: The script imports the following modules. All except `argparse`, which is part of the Python standard library, must be installed:
	  - `argparse`
	  - `PIL` (installed as `Pillow`)
	  - `numpy`
	  - `torch`
	  - `cv2` (installed as `opencv-python`)
	  - `yaml` (installed as `pyyaml`)
	  - `huggingface_hub`
	- Model Configuration: A YAML file specifying model details (model ID, local directory, and filename).
	- Input Image: A valid image file (e.g., PNG, JPEG). The script converts it to RGB when loading.
	- Hardware: CUDA-compatible GPU (optional, for faster processing) or CPU.

	## Script Overview

	The script (`inference.py`) performs super-resolution on an input image using the Real-ESRGAN model. It supports:
	- Downloading model weights from Hugging Face if not available locally.
	- Upscaling images using an inner scale (model-specific) and an optional outer scale (post-processing resizing).
	- Saving the upscaled image to the path given by `--output_path`, if one is provided.

	## Command-Line Arguments

	The script accepts the following command-line arguments:

	| Argument | Type | Required | Default | Description |
	|----------|------|----------|---------|-------------|
	| `--input_path` | str | Yes | None | Path to the input image file (e.g., `image.png`). |
	| `--output_path` | str | No | None | Path to save the upscaled image. If not provided, the image is returned but not saved automatically. |
	| `--model_id` | str | Yes | None | Model ID for the Real-ESRGAN model (e.g., `danhtran2mind/Real-ESRGAN-Anime-finetuning`). |
	| `--models_config_path` | str | Yes | None | Path to the YAML configuration file containing model details. |
	| `--batch_size` | int | No | 1 | Batch size (not used in this implementation). |
	| `--outer_scale` | int | Yes | None | Desired final scale factor for super-resolution (e.g., 4, 8). |
	| `--inner_scale` | int | No | 4 | Internal scale factor used by the model (typically 4). |
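The table above maps directly onto an `argparse` parser. The following is a minimal sketch of how such a parser could be declared; the function name `build_parser` and the help strings are illustrative assumptions, not code taken from `inference.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Real-ESRGAN anime super-resolution")
    parser.add_argument("--input_path", type=str, required=True,
                        help="Path to the input image file")
    parser.add_argument("--output_path", type=str, default=None,
                        help="Where to save the upscaled image (optional)")
    parser.add_argument("--model_id", type=str, required=True,
                        help="Model ID listed in the models configuration file")
    parser.add_argument("--models_config_path", type=str, required=True,
                        help="Path to the YAML models configuration file")
    parser.add_argument("--batch_size", type=int, default=1,
                        help="Accepted for compatibility; documented as unused")
    parser.add_argument("--outer_scale", type=int, required=True,
                        help="Desired final scale factor")
    parser.add_argument("--inner_scale", type=int, default=4,
                        help="Scale factor applied by the model itself")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--input_path", "input.png",
    "--model_id", "danhtran2mind/Real-ESRGAN-Anime-finetuning",
    "--models_config_path", "models_config.yaml",
    "--outer_scale", "8",
])
print(args.outer_scale, args.inner_scale, args.output_path)
```

Omitting any of the four required arguments makes `argparse` exit with a usage error, which matches the "Required" column.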

	## Usage

	1. Prepare the Models Configuration File:
	Create a YAML file (e.g., `models_config.yaml`) with the following structure:

	```yaml
	- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
	  local_dir: "./weights"
	  filename: "model.pth"
	```

	This file specifies the model ID, local directory for weights, and the filename of the model checkpoint.

	2. Run the Script:
	Use the following command to run the inference:

	```bash
	python inference.py \
	    --input_path path/to/input/image.png \
	    --output_path path/to/output/image.png \
	    --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
	    --models_config_path path/to/models_config.yaml \
	    --outer_scale 4
	```

	Example:

	```bash
	python inference.py \
	    --input_path input.png \
	    --output_path output.png \
	    --model_id danhtran2mind/Real-ESRGAN-Anime-finetuning \
	    --models_config_path models_config.yaml \
	    --outer_scale 8
	```

	3. Output:
	   - The script processes the input image and applies super-resolution.
	   - If `--output_path` is provided, the upscaled image is saved to the specified path.
	   - If `--outer_scale` differs from `--inner_scale`, the output image is resized using OpenCV's `INTER_CUBIC` (for upscaling) or `INTER_AREA` (for downscaling) interpolation.
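To make the relationship between the two scales concrete, the sketch below works out the final image size and which interpolation mode would apply. It assumes the final size equals the original size multiplied by `outer_scale`; the function name `plan_outer_resize` is hypothetical, and the interpolation is returned as a string so the example runs without OpenCV installed.

```python
def plan_outer_resize(width, height, inner_scale=4, outer_scale=4):
    """Return (target_width, target_height, interpolation) for the post-processing resize.

    width and height are the ORIGINAL input dimensions. The interpolation is
    None when the model output already has the requested size.
    """
    if inner_scale <= 0 or outer_scale <= 0:
        raise ValueError("scale factors must be positive integers")

    # Assumption: the final image is the original size times outer_scale.
    target_width = width * outer_scale
    target_height = height * outer_scale

    if outer_scale == inner_scale:
        interpolation = None            # model output is already the right size
    elif outer_scale > inner_scale:
        interpolation = "INTER_CUBIC"   # enlarging the model output
    else:
        interpolation = "INTER_AREA"    # shrinking the model output
    return target_width, target_height, interpolation


# A 512x512 input: the model produces 2048x2048 at inner_scale=4.
print(plan_outer_resize(512, 512, inner_scale=4, outer_scale=8))
print(plan_outer_resize(512, 512, inner_scale=4, outer_scale=2))
```

With OpenCV, the resize itself would then be a call such as `cv2.resize(image, (target_width, target_height), interpolation=cv2.INTER_CUBIC)`; note that `cv2.resize` takes the size as `(width, height)`.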

	## How It Works

	1. Model Loading:
	   - The script reads the `models_config_path` YAML file to locate the model configuration.
	   - If the model weights are not found locally, they are downloaded from the Hugging Face Hub using the specified `model_id` and `filename`.
	   - The Real-ESRGAN model is initialized with the specified `inner_scale` and loaded with the weights.

	2. Image Processing:
	   - The input image is opened and converted to RGB format using Pillow.
	   - The Real-ESRGAN model upscales the image by the `inner_scale` factor.
	   - If `outer_scale` differs from `inner_scale`, the image is further resized to achieve the desired scale using OpenCV.

	3. Output Handling:
	   - The upscaled image is saved to `output_path` if provided.
	   - The processed image is returned as a Pillow Image object.
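The configuration lookup in the Model Loading step can be illustrated as follows. This is a sketch under the assumption that the YAML file parses to a list of mappings with the keys `model_id`, `local_dir`, and `filename`; the helper name `find_model_entry` is hypothetical. In practice the list would come from `yaml.safe_load` on the file at `--models_config_path`; a literal list is used here so the example has no third-party dependency.

```python
def find_model_entry(entries, model_id):
    """Return the configuration entry whose model_id matches, or raise KeyError."""
    for entry in entries:
        if entry.get("model_id") == model_id:
            return entry
    raise KeyError(f"model_id {model_id!r} not found in models configuration")


# Equivalent to what yaml.safe_load would return for the example models_config.yaml.
entries = [
    {
        "model_id": "danhtran2mind/Real-ESRGAN-Anime-finetuning",
        "local_dir": "./weights",
        "filename": "model.pth",
    },
]

entry = find_model_entry(entries, "danhtran2mind/Real-ESRGAN-Anime-finetuning")
print(entry["local_dir"], entry["filename"])
```

Raising an error for an unknown `model_id` surfaces a mismatch between `--model_id` and the configuration file early, before any download or model initialization is attempted.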

	## Notes

	- Device Selection: The script automatically uses CUDA if available; otherwise, it falls back to CPU.
	- Model Weights: Ensure the `local_dir` specified in the YAML file exists or is writable for downloading weights.
	- Outer vs. Inner Scale:
	  - `inner_scale` is the scale factor used by the Real-ESRGAN model (typically fixed at 4).
	  - `outer_scale` is the final desired scale, achieved through additional resizing if necessary.
	- Batch Size: The `--batch_size` argument is included but not used in this implementation, as the script processes one image at a time.
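The download-if-missing behavior described for model weights could be sketched as below. The helper name `ensure_weights` and its injectable `download` parameter are assumptions made for illustration; the actual script may structure this differently. By default the sketch calls `hf_hub_download` from `huggingface_hub`, which accepts `repo_id`, `filename`, and `local_dir`.

```python
import os


def ensure_weights(entry, download=None):
    """Return the local checkpoint path, downloading the file only if it is missing.

    entry is a mapping with model_id, local_dir, and filename keys.
    download is an optional callable with the same keyword interface as
    huggingface_hub.hf_hub_download (hypothetical hook, useful for offline tests).
    """
    weights_path = os.path.join(entry["local_dir"], entry["filename"])
    if os.path.isfile(weights_path):
        return weights_path  # already present, no network access needed

    # Create the target directory so the download has somewhere to write.
    os.makedirs(entry["local_dir"], exist_ok=True)

    if download is None:
        # Imported lazily so the function can be defined without the package installed.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download

    return download(
        repo_id=entry["model_id"],
        filename=entry["filename"],
        local_dir=entry["local_dir"],
    )
```

Creating `local_dir` up front addresses the note above about the directory needing to exist or be writable.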

	## Example Models Configuration File

	Here is an example `models_config.yaml`:

	<xaiArtifact artifact_id="0b60a214-8c91-48ed-ad50-fae3467a0508" artifact_version_id="35ae4a0a-da96-44d0-b8ed-7c1d62b59527" title="models_config.yaml" contentType="text/yaml">

	```yaml
	- model_id: "danhtran2mind/Real-ESRGAN-Anime-finetuning"
	  local_dir: "./weights"
	  filename: "model.pth"
	```