Update README.md
README.md
CHANGED
|
@@ -35,94 +35,11 @@ The model was trained on several datasets:
|
|
| 35 |
|
| 36 |
The combined dataset consists of **45,726** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **182,904** samples. This augmented dataset was then split into **146,323 samples for training** and **36,581 samples for validation**.
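The sample counts above can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the figures quoted in the README; the variable names are illustrative and are not taken from the project's code.

```python
# Sanity check of the dataset figures quoted above.
# Variable names are illustrative, not taken from the project.
UNIQUE_IMAGES = 45_726
ROTATIONS = (0, 90, 180, 270)  # degrees; one sample per rotation

total_samples = UNIQUE_IMAGES * len(ROTATIONS)
train_samples = 146_323
val_samples = 36_581

# Four rotations per image give the quoted total.
assert total_samples == 182_904
# The training and validation counts add up to that total.
assert train_samples + val_samples == total_samples

train_fraction = train_samples / total_samples
print(f"train fraction: {train_fraction:.3f}")
```

The counts are consistent with each other, and the split works out to roughly 80% training and 20% validation.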
|
| 37 |
|
| 38 |
-
## Project Structure
|
| 39 |
-
|
| 40 |
-
```
|
| 41 |
-
image_orientation_detector/
|
| 42 |
-
├───.gitignore
|
| 43 |
-
├───config.py # Main configuration file for paths, model, and hyperparameters
|
| 44 |
-
├───convert_to_onnx.py # Script to convert the PyTorch model to ONNX format
|
| 45 |
-
├───predict.py # Script for running inference on new images
|
| 46 |
-
├───README.md # This file
|
| 47 |
-
├───requirements.txt # Python dependencies
|
| 48 |
-
├───train.py # Main script for training the model
|
| 49 |
-
├───data/
|
| 50 |
-
│ ├───upright_images/ # Directory for correctly oriented images
|
| 51 |
-
│ └───cache/ # Directory for cached, pre-rotated images (auto-generated)
|
| 52 |
-
├───models/
|
| 53 |
-
│ └───best_model.pth # The best trained model weights
|
| 54 |
-
└───src/
|
| 55 |
-
├───caching.py # Logic for creating the image cache
|
| 56 |
-
├───dataset.py # PyTorch Dataset classes
|
| 57 |
-
├───model.py # Model definition (EfficientNetV2)
|
| 58 |
-
└───utils.py # Utility functions (e.g., device setup, transforms)
|
| 59 |
-
```
|
| 60 |
-
|
| 61 |
## Usage
|
| 62 |
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
You can download the pre-trained model (`orientation_model_xx.pth`) and its ONNX version (`orientation_model_xx.onnx`) from the [GitHub Releases](https://github.com/your-repo/your-project/releases) page.
|
| 66 |
-
|
| 67 |
-
Install the required Python packages using the `requirements.txt` file:
|
| 68 |
-
|
| 69 |
-
```bash
|
| 70 |
-
pip install -r requirements.txt
|
| 71 |
-
```
|
| 72 |
-
|
| 73 |
-
### Prediction
|
| 74 |
-
|
| 75 |
-
To predict the orientation of a single image or of every image in a directory, use the `predict.py` script.
|
| 76 |
-
|
| 77 |
-
- **Predict a single image:**
|
| 78 |
-
|
| 79 |
-
```bash
|
| 80 |
-
python predict.py --input_path /path/to/image.jpg
|
| 81 |
-
```
|
| 82 |
-
- **Predict all images in a directory:**
|
| 83 |
-
|
| 84 |
-
```bash
|
| 85 |
-
python predict.py --input_path /path/to/directory/
|
| 86 |
-
```
|
| 87 |
-
|
| 88 |
-
The script will output the predicted orientation for each image.
|
| 89 |
-
|
| 90 |
-
### ONNX Export and Prediction
|
| 91 |
-
|
| 92 |
-
This project also supports exporting the trained PyTorch model to the ONNX (Open Neural Network Exchange) format. An ONNX model can give faster inference and can run on machines that don't have PyTorch installed.
|
| 93 |
-
|
| 94 |
-
To convert a `.pth` model to `.onnx`, provide the path to the model file:
|
| 95 |
-
|
| 96 |
-
```bash
|
| 97 |
-
python convert_to_onnx.py path/to/model.pth
|
| 98 |
-
```
|
| 99 |
-
|
| 100 |
-
This will create a `model.onnx` file in the same directory.
|
| 101 |
-
|
| 102 |
-
To predict image orientation using the ONNX model:
|
| 103 |
-
|
| 104 |
-
- **Predict a single image:**
|
| 105 |
-
|
| 106 |
-
```bash
|
| 107 |
-
python predict_onnx.py --input_path /path/to/image.jpg
|
| 108 |
-
```
|
| 109 |
-
- **Predict all images in a directory:**
|
| 110 |
-
|
| 111 |
-
```bash
|
| 112 |
-
python predict_onnx.py --input_path /path/to/directory/
|
| 113 |
-
```
|
| 114 |
-
|
| 115 |
-
#### ONNX GPU Acceleration (Optional)
|
| 116 |
-
|
| 117 |
-
For even better performance on NVIDIA GPUs, you can install the GPU-enabled version of ONNX Runtime.
|
| 118 |
-
|
| 119 |
-
```bash
|
| 120 |
-
pip install onnxruntime-gpu
|
| 121 |
-
```
|
| 122 |
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
#### Performance Comparison (PyTorch vs. ONNX)
|
| 126 |
|
| 127 |
For a dataset of 5,055 images, the performance on an RTX 4080 running **single-threaded** was:
|
| 128 |
|
|
@@ -131,67 +48,6 @@ For a dataset of 5055 images, the performance on a RTX 4080 running in **single-
|
|
| 131 |
|
| 132 |
This demonstrates a significant performance gain of approximately **55.2%** when using the ONNX model for inference.
|
| 133 |
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
This model learns to identify image orientation by training on a dataset of images that you provide. For the model to learn effectively, every image you provide must already be correctly oriented (upright).
|
| 137 |
-
|
| 138 |
-
**Place Images in the `data/upright_images` directory**: All images must be placed in the `data/upright_images` directory. The training script will automatically generate rotated versions (90°, 180°, 270°) of these images and cache them for efficient training.
|
| 139 |
-
|
| 140 |
-
The directory structure should look like this:
|
| 141 |
-
|
| 142 |
-
```
|
| 143 |
-
data/
|
| 144 |
-
└───upright_images/
|
| 145 |
-
├───image1.jpg
|
| 146 |
-
├───image2.png
|
| 147 |
-
└───...
|
| 148 |
-
```
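As a rough illustration of the augmentation described above, each upright image can be paired with the four rotation angles (0°, 90°, 180°, 270°), with the index of the angle serving as the class label. This is a minimal sketch under that assumption: `build_samples`, `ROTATIONS`, and `IMAGE_SUFFIXES` are hypothetical names, and the project's actual caching logic may work differently.

```python
from pathlib import Path

# Hypothetical names; the real project code may differ.
ROTATIONS = (0, 90, 180, 270)               # degrees
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of accepted extensions


def build_samples(image_dir):
    """Return (path, angle, label) triples, one per image and rotation."""
    samples = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # The label is the index of the rotation angle (0 to 3).
            for label, angle in enumerate(ROTATIONS):
                samples.append((path, angle, label))
    return samples
```

Calling `build_samples("data/upright_images")` on a directory holding two images would return eight triples, four per image. Files with other extensions are skipped.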
|
| 149 |
-
|
| 150 |
-
### Configure the Training
|
| 151 |
-
|
| 152 |
-
All training parameters are centralized in the `config.py` file. Before starting the training, review and adjust the settings to match the hardware and dataset.
|
| 153 |
-
|
| 154 |
-
Key configuration options in `config.py`:
|
| 155 |
-
|
| 156 |
-
- **Paths and Caching**:
|
| 157 |
-
|
| 158 |
-
- `TRAIN_IMAGES_PATH`: Path to upright images. Defaults to `data/upright_images`.
|
| 159 |
-
- `CACHE_PATH`: Directory where rotated images will be cached. Defaults to `data/cache`.
|
| 160 |
-
- `USE_CACHE`: Set to `True` to reuse the cache on subsequent runs. This significantly speeds up data loading but takes a lot of disk space.
|
| 161 |
-
- **Model and Training Hyperparameters**:
|
| 162 |
-
|
| 163 |
-
- `MODEL_NAME`: The name of the model architecture to use (e.g., `EfficientNetV2S`).
|
| 164 |
-
- `IMAGE_SIZE`: The resolution to which images will be resized (e.g., `224` for 224x224 pixels).
|
| 165 |
-
- `BATCH_SIZE`: Number of images to process in each batch. Adjust based on your GPU's VRAM.
|
| 166 |
-
- `NUM_EPOCHS`: The total number of times the model will iterate over the entire dataset.
|
| 167 |
-
- `LEARNING_RATE`: The initial learning rate for the optimizer.
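For orientation, a `config.py` covering the options above might look like the following. This is a hypothetical sketch: the two path defaults, the `EfficientNetV2S` name, and the `224` image size come from the descriptions above, while the batch size, epoch count, and learning rate are placeholder values, not the project's actual settings.

```python
# Hypothetical config.py sketch; check the real file in the repository.

# Paths and caching
TRAIN_IMAGES_PATH = "data/upright_images"  # default named in the README
CACHE_PATH = "data/cache"                  # default named in the README
USE_CACHE = True                           # reuse cached rotations; costs disk space

# Model and training hyperparameters
MODEL_NAME = "EfficientNetV2S"  # example name from the README
IMAGE_SIZE = 224                # images are resized to 224x224 pixels
BATCH_SIZE = 32                 # placeholder; lower it if GPU memory runs out
NUM_EPOCHS = 10                 # placeholder number of passes over the dataset
LEARNING_RATE = 1e-3            # placeholder initial learning rate
```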
|
| 168 |
-
|
| 169 |
-
### Start Training
|
| 170 |
-
|
| 171 |
-
Once all data is in place and the configuration is set, start training the model by running the `train.py` script:
|
| 172 |
-
|
| 173 |
-
```bash
|
| 174 |
-
python train.py
|
| 175 |
-
```
|
| 176 |
-
|
| 177 |
-
- **First Run**: The first time the script runs, it will preprocess and cache the dataset. This may take a while depending on the size of the dataset.
|
| 178 |
-
- **Subsequent Runs**: Later runs will be much faster as they will use the cached data.
|
| 179 |
-
- **Monitoring**: Use TensorBoard to monitor training progress by running `tensorboard --logdir=runs`.
|
| 180 |
-
|
| 181 |
-
### Monitoring with TensorBoard
|
| 182 |
-
|
| 183 |
-
The training script is integrated with TensorBoard to help visualize metrics and understand the model's performance. During training, logs are saved in the `runs/` directory.
|
| 184 |
-
|
| 185 |
-
To launch TensorBoard, run the command:
|
| 186 |
-
|
| 187 |
-
```bash
|
| 188 |
-
tensorboard --logdir=runs
|
| 189 |
-
```
|
| 190 |
-
|
| 191 |
-
This will start a web server. Open the provided URL (usually `http://localhost:6006`) in your browser to view the dashboard.
|
| 192 |
-
|
| 193 |
-
In TensorBoard, you can track:
|
| 194 |
|
| 195 |
-
-
|
| 196 |
-
- **Loss:** `Loss/train` and `Loss/validation`
|
| 197 |
-
- **Learning Rate:** `Hyperparameters/learning_rate` to see how it changes over epochs.
|
|
|
|
| 35 |
|
| 36 |
The combined dataset consists of **45,726** unique images. Each image is augmented by being rotated in four ways (0°, 90°, 180°, 270°), creating a total of **182,904** samples. This augmented dataset was then split into **146,323 samples for training** and **36,581 samples for validation**.
|
| 37 |
|
| 38 |
## Usage
|
| 39 |
|
| 40 |
+
For detailed usage instructions, including how to run predictions, export to ONNX, and train the model, please refer to the [GitHub repository](https://github.com/duartebarbosadev/deep-image-orientation-detection).
|
| 41 |
|
| 42 |
+
## Performance Comparison (PyTorch vs. ONNX)
|
| 43 |
|
| 44 |
For a dataset of 5,055 images, the performance on an RTX 4080 running **single-threaded** was:
|
| 45 |
|
|
|
|
| 48 |
|
| 49 |
This demonstrates a significant performance gain of approximately **55.2%** when using the ONNX model for inference.
|
| 50 |
|
| 51 |
+
---
|
| 52 |
|
| 53 |
+
For more in-depth information about the project, including the full source code, training scripts, and detailed documentation, please visit the [GitHub repository](https://github.com/duartebarbosadev/deep-image-orientation-detection).