# Training Guide
This guide explains how to run the Real-ESRGAN training script provided with this project. The script trains a Real-ESRGAN model and lets you customize the run through command-line arguments.
## Prerequisites
- Python 3.10+ installed
- Real-ESRGAN repository cloned and dependencies installed
- A valid YAML configuration file for training
- Required Python packages: PyYAML (imported as `yaml`) and the other dependencies listed in the Real-ESRGAN repository. `shutil` is part of the Python standard library and needs no installation.
## Script Overview
The wrapper script (`train.py`) runs the Real-ESRGAN training process by invoking the `train.py` script from the Real-ESRGAN repository. It accepts command-line arguments to customize training and, once training finishes, moves the experiment output to a specified directory.
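The invocation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the wrapper's actual code: the helper name `build_training_call`, the upstream entry point `realesrgan/train.py`, and its `-opt` option are assumptions about how the upstream script is called, and the sketch only builds the command and environment without starting training.

```python
import os
import sys
from pathlib import Path

# Location of the cloned repository relative to the wrapper, as stated in this guide.
REAL_ESRGAN_DIR = Path("third_party/Real-ESRGAN")


def build_training_call(config_path, auto_resume=False, base_env=None):
    """Return (command, env) for launching the upstream training script.

    Hypothetical helper: the entry point path and the -opt option are assumptions.
    """
    command = [
        sys.executable,
        "realesrgan/train.py",  # assumed upstream entry point
        "-opt",
        str(Path(config_path).resolve()),
    ]
    if auto_resume:
        command.append("--auto_resume")

    # Copy the environment and prepend the repository to PYTHONPATH
    # so that modules inside the repository can be imported.
    env = dict(os.environ if base_env is None else base_env)
    repo = str(REAL_ESRGAN_DIR.resolve())
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = repo if not existing else repo + os.pathsep + existing
    return command, env


if __name__ == "__main__":
    cmd, env = build_training_call("configs/my_config.yml", auto_resume=True)
    print(" ".join(cmd))
    print(env["PYTHONPATH"])
```

Passing the returned pair to `subprocess.run(cmd, cwd=REAL_ESRGAN_DIR, env=env, check=True)` would be one way to start the run and raise an exception if training fails.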
## Command-Line Arguments
The script accepts the following command-line arguments:
| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--config` | String | `configs/Real-ESRGAN-Anime-finetuning.yml` | Path to the configuration YAML file that defines training parameters. |
| `--launcher` | Choice (`none`, `pytorch`, `slurm`) | `none` | Job launcher for distributed training. Use `none` for non-distributed training, `pytorch` for PyTorch distributed, or `slurm` for SLURM-based clusters. |
| `--auto_resume` | Flag | `False` | If specified, automatically resumes training from the latest checkpoint. |
| `--debug` | Flag | `False` | If specified, enables debug mode for additional logging or debugging information. |
| `--local_rank` | Integer | `0` | Local rank for distributed training (used in multi-GPU setups). |
| `--force_yml` | List of strings | `None` | Overrides settings from the YAML configuration at run time without editing the file. Example: `train:ema_decay=0.999` sets `ema_decay` in the `train` section. |
| `--output_model_dir` | String | `ckpts` | Path to move the experiment directory (containing model checkpoints and logs) after training completes. |
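For reference, the table above maps onto an `argparse` parser along these lines. This is a minimal sketch reconstructed from the table, not the script's actual source; the function name `build_parser` and the help strings are illustrative.

```python
import argparse


def build_parser():
    """Build a parser mirroring the argument table (illustrative reconstruction)."""
    parser = argparse.ArgumentParser(description="Run Real-ESRGAN training")
    parser.add_argument("--config", type=str,
                        default="configs/Real-ESRGAN-Anime-finetuning.yml",
                        help="Path to the YAML configuration file")
    parser.add_argument("--launcher", choices=["none", "pytorch", "slurm"],
                        default="none", help="Job launcher")
    parser.add_argument("--auto_resume", action="store_true",
                        help="Resume from the latest checkpoint")
    parser.add_argument("--debug", action="store_true", help="Enable debug mode")
    parser.add_argument("--local_rank", type=int, default=0,
                        help="Local rank for distributed training")
    parser.add_argument("--force_yml", nargs="+", default=None,
                        help="Overrides such as train:ema_decay=0.999")
    parser.add_argument("--output_model_dir", type=str, default="ckpts",
                        help="Where to move the experiment directory")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--auto_resume", "--force_yml", "train:ema_decay=0.999"]
    )
    print(args)
```

Because `--force_yml` takes one or more values (`nargs="+"` in this sketch), several overrides can be passed after a single flag, separated by spaces.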
## Usage Instructions
1. **Prepare the Environment**:
- Ensure the Real-ESRGAN repository is available in the `third_party/Real-ESRGAN` directory relative to the script.
- Install required dependencies (refer to the Real-ESRGAN repository's documentation).
2. **Create or Modify a Configuration File**:
- Prepare a YAML configuration file (e.g., `Real-ESRGAN-Anime-finetuning.yml`) specifying training parameters like dataset paths, model architecture, and hyperparameters.
- The configuration file must include a `name` field to identify the experiment.
3. **Run the Training Script**:
Use the following command to start training with default settings:
```bash
python train.py
```
To customize the training, use the command-line arguments. Examples:
- Train with a specific configuration file and enable auto-resume:
```bash
python train.py --config configs/my_config.yml --auto_resume
```
- Override YAML settings and specify an output directory:
```bash
python train.py --config configs/my_config.yml \
--force_yml train:ema_decay=0.999 \
--output_model_dir /path/to/output
```
4. **Output**:
- The script runs the Real-ESRGAN training process using the specified configuration.
- After training, the experiment directory (named after the `name` field in the YAML file) is moved from `third_party/Real-ESRGAN/experiments/` to the directory specified by `--output_model_dir`.
- If the source experiment directory does not exist, a warning is printed.
- Errors during training or directory moving are caught and reported, with the script exiting on failure.
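The move step described in the output section can be sketched as below. This is a hedged illustration rather than the script's real code: the function name `move_experiment` is hypothetical, and the sketch assumes the experiment directory is placed inside the output directory under its own name, which the guide does not state explicitly.

```python
import shutil
import sys
from pathlib import Path


def move_experiment(name, experiments_root, output_model_dir):
    """Move experiments_root/name into output_model_dir.

    Returns the new path, or None if the source directory is missing.
    """
    source = Path(experiments_root) / name
    if not source.exists():
        # Mirrors the documented behaviour: warn instead of failing.
        print(f"Warning: source directory does not exist: {source}")
        return None

    destination_root = Path(output_model_dir)
    destination_root.mkdir(parents=True, exist_ok=True)
    try:
        # Assumption: the directory keeps its name inside the output directory.
        return Path(shutil.move(str(source), str(destination_root / name)))
    except (OSError, shutil.Error) as exc:
        # Report the failure and exit with a non-zero status.
        print(f"Error while moving {source}: {exc}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    result = move_experiment(
        "my_experiment", "third_party/Real-ESRGAN/experiments", "ckpts"
    )
    print(result)
```

Here `my_experiment` stands in for the `name` field of the YAML file.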
## Example Workflow
To train a Real-ESRGAN model for anime-style image upscaling:
1. Ensure the `Real-ESRGAN-Anime-finetuning.yml` file is configured with the correct dataset paths and model settings.
2. Run the following command:
```bash
python train.py --config configs/Real-ESRGAN-Anime-finetuning.yml --output_model_dir models/anime_model --auto_resume
```
3. The training process will start, and upon completion, the experiment directory will be moved to `models/anime_model`.
## Notes
- Ensure the YAML configuration file exists at the specified path, or the script will raise a `FileNotFoundError`.
- The `PYTHONPATH` environment variable is modified to include the Real-ESRGAN directory for proper module resolution.
- If using distributed training (`--launcher pytorch` or `--launcher slurm`), ensure the environment is set up for multi-GPU or cluster-based training.
- The `--force_yml` argument allows dynamic updates to the YAML configuration without modifying the file directly.
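To make the `--force_yml` note concrete, the sketch below shows how an override such as `train:ema_decay=0.999` could be applied to a configuration that has already been loaded into a nested dictionary. The actual parsing is presumably handled by the upstream training code; `apply_overrides` is a hypothetical helper, and the `key:subkey=value` handling shown here is an assumption based on the example in the argument table.

```python
import ast


def apply_overrides(config, overrides):
    """Apply entries such as 'train:ema_decay=0.999' to a nested dict in place."""
    for entry in overrides or []:
        key_path, _, raw_value = entry.partition("=")
        keys = [key.strip() for key in key_path.split(":")]
        try:
            # Interpret numbers, booleans, lists, and quoted strings as Python literals.
            value = ast.literal_eval(raw_value.strip())
        except (ValueError, SyntaxError):
            # Anything else is kept as a plain string.
            value = raw_value.strip()

        # Walk down to the parent of the final key, creating sections as needed.
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


if __name__ == "__main__":
    cfg = {"name": "demo", "train": {"ema_decay": 0.99}}
    print(apply_overrides(cfg, ["train:ema_decay=0.999"]))
```

The YAML file on disk is never touched in this sketch; only the in-memory dictionary changes, which matches the note that overrides do not modify the file directly.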
## Troubleshooting
- **Error: Configuration file not found**:
- Verify the `--config` path is correct and the file exists.
- **Error: Training failed**:
- Check the Real-ESRGAN repository's documentation for troubleshooting training issues.
- Ensure all dependencies are installed and compatible.
- **Warning: Source directory does not exist**:
- Confirm that the experiment name in the YAML file matches the expected directory in `third_party/Real-ESRGAN/experiments/`.
- **Permission errors when moving directories**:
- Ensure the script has write permissions for the `--output_model_dir` path.
For further details on Real-ESRGAN, refer to the official repository documentation.