|
# Train predefined models on standard datasets |
|
|
|
MMDetection also provides out-of-the-box tools for training detection models. |
|
This section will show how to train _predefined_ models (under [configs](../../../configs)) on standard datasets, i.e., COCO.
|
|
|
## Prepare datasets |
|
|
|
Training requires preparing datasets too. See section [Prepare datasets](#prepare-datasets) above for details. |
|
|
|
**Note**: |
|
Currently, the config files under `configs/cityscapes` use COCO pre-trained weights to initialize. |
|
You can download the pre-trained models in advance if the network connection is unavailable or slow; otherwise, errors will occur at the beginning of training.
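
If [MIM](https://github.com/open-mmlab/mim) is installed, one way to pre-fetch a COCO pre-trained checkpoint is sketched below; the config name and destination directory are only examples.

```shell
# Assuming MIM is installed: download a COCO pre-trained Mask R-CNN config and
# checkpoint into ./checkpoints so training can start without network access.
mim download mmdet --config mask-rcnn_r50_fpn_1x_coco --dest ./checkpoints
```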
|
|
|
## Learning rate auto scaling |
|
|
|
**Important**: The default learning rate in config files is for 8 GPUs and 2 samples per GPU (batch size = 8 x 2 = 16). This value is recorded as `auto_scale_lr.base_batch_size` in `config/_base_/schedules/schedule_1x.py`, and the learning rate is automatically scaled relative to this base batch size of `16`. Meanwhile, in order not to affect other codebases based on MMDetection, the flag `auto_scale_lr.enable` is set to `False` by default.
|
|
|
If you want to enable this feature, you need to add the argument `--auto-scale-lr`. You should also check the name of the config you want to use before running the command, because the config name indicates the default batch size.
|
By default, the batch size is `8 x 2 = 16`, as in `faster_rcnn_r50_caffe_fpn_90k_coco.py` or `pisa_faster_rcnn_x101_32x4d_fpn_1x_coco.py`. Otherwise, the config file name contains `_NxM_`, indicating the batch size: for example, `cornernet_hourglass104_mstest_32x3_210e_coco.py` has a batch size of `32 x 3 = 96`, and `scnet_x101_64x4d_fpn_8x1_20e_coco.py` has a batch size of `8 x 1 = 8`.
|
|
|
**Please remember to check the bottom of the specific config file you want to use: it will contain `auto_scale_lr.base_batch_size` if the batch size is not `16`. If you cannot find it there, check the config files listed in `_base_ = [xxx]` and you will find it. Please do not modify this value if you want the learning rate to be scaled automatically.**
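
For reference, this setting typically looks like the snippet below in the base schedule config (the values mirror the defaults described above):

```python
# Default in config/_base_/schedules/schedule_1x.py: auto scaling is disabled
# and the base batch size corresponds to 8 GPUs x 2 samples per GPU.
auto_scale_lr = dict(enable=False, base_batch_size=16)
```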
|
|
|
The basic usage of learning rate auto scaling is as follows. |
|
|
|
```shell |
|
python tools/train.py \ |
|
${CONFIG_FILE} \ |
|
--auto-scale-lr \ |
|
[optional arguments] |
|
``` |
|
|
|
If this feature is enabled, the learning rate will be automatically scaled according to the number of GPUs on the machine and the batch size of training, following the [linear scaling rule](https://arxiv.org/abs/1706.02677): the new learning rate is the base learning rate multiplied by the ratio of the actual batch size to the base batch size. For example, if the base setting is 4 GPUs with 2 images per GPU and `lr = 0.01`, then with 16 GPUs and 4 images per GPU the learning rate is automatically scaled to `lr = 0.08`.
|
|
|
If you don't want to use this feature, you need to calculate the learning rate according to the [linear scaling rule](https://arxiv.org/abs/1706.02677) manually and then change `optimizer.lr` in the specific config file.
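
For instance, a minimal sketch of such a manual override in a config, assuming the default `lr = 0.02` was tuned for a total batch size of 16 and you train with a total batch size of 8 (the exact key may differ between MMDetection versions; recent configs wrap the optimizer in `optim_wrapper`):

```python
# Linear scaling rule applied by hand: 0.02 * (8 / 16) = 0.01
optim_wrapper = dict(optimizer=dict(lr=0.01))
```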
|
|
|
## Training on a single GPU |
|
|
|
We provide `tools/train.py` to launch training jobs on a single GPU. |
|
The basic usage is as follows. |
|
|
|
```shell |
|
python tools/train.py \ |
|
${CONFIG_FILE} \ |
|
[optional arguments] |
|
``` |
|
|
|
During training, log files and checkpoints will be saved to the working directory, which is specified by `work_dir` in the config file or via CLI argument `--work-dir`. |
|
|
|
By default, the model is evaluated on the validation set every epoch; the evaluation interval can be specified in the config file as shown below.
|
|
|
```python |
|
# evaluate the model every 12 epochs. |
|
train_cfg = dict(val_interval=12) |
|
``` |
|
|
|
This tool accepts several optional arguments, including: |
|
|
|
- `--work-dir ${WORK_DIR}`: Override the working directory. |
|
- `--auto-resume`: Resume from the latest checkpoint in the `work_dir` automatically.

- `--cfg-options 'Key=value'`: Override other settings in the used config (see the example below).
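
For example, the following command (with an assumed config path) overrides both the working directory and the evaluation interval:

```shell
python tools/train.py \
    configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
    --work-dir ./work_dirs/my_experiment \
    --cfg-options train_cfg.val_interval=2
```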
|
|
|
## Training on CPU |
|
|
|
The process of training on the CPU is consistent with single-GPU training. We just need to disable GPUs before launching the training process.
|
|
|
```shell |
|
export CUDA_VISIBLE_DEVICES=-1 |
|
``` |
|
|
|
And then run the script [above](#training-on-a-single-GPU). |
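
Both steps can also be combined into a single command, for example:

```shell
CUDA_VISIBLE_DEVICES=-1 python tools/train.py ${CONFIG_FILE} [optional arguments]
```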
|
|
|
**Note**: |
|
|
|
We do not recommend using the CPU for training because it is too slow. This feature is supported so that users can conveniently debug on machines without a GPU.
|
|
|
## Training on multiple GPUs |
|
|
|
We provide `tools/dist_train.sh` to launch training on multiple GPUs. |
|
The basic usage is as follows. |
|
|
|
```shell |
|
bash ./tools/dist_train.sh \ |
|
${CONFIG_FILE} \ |
|
${GPU_NUM} \ |
|
[optional arguments] |
|
``` |
|
|
|
Optional arguments remain the same as stated [above](#training-on-a-single-GPU). |
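
For example, to train a model on 8 GPUs (the config path below is just an example and should be replaced with the one you need):

```shell
bash ./tools/dist_train.sh configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py 8
```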
|
|
|
### Launch multiple jobs simultaneously |
|
|
|
If you would like to launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, |
|
you need to specify different ports (29500 by default) for each job to avoid communication conflict. |
|
|
|
If you use `dist_train.sh` to launch training jobs, you can set the port in the commands. |
|
|
|
```shell |
|
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4 |
|
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4 |
|
``` |
|
|
|
## Train with multiple machines |
|
|
|
If you launch on multiple machines connected via Ethernet, you can run the following commands:
|
|
|
On the first machine: |
|
|
|
```shell |
|
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS |
|
``` |
|
|
|
On the second machine: |
|
|
|
```shell |
|
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS |
|
``` |
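
As a concrete sketch, assume two machines with 8 GPUs each and that the first machine is reachable at `10.1.1.10` (the address, port, and GPU count are only examples):

```shell
# On the first machine (rank 0)
NNODES=2 NODE_RANK=0 PORT=29500 MASTER_ADDR=10.1.1.10 sh tools/dist_train.sh ${CONFIG_FILE} 8
# On the second machine (rank 1)
NNODES=2 NODE_RANK=1 PORT=29500 MASTER_ADDR=10.1.1.10 sh tools/dist_train.sh ${CONFIG_FILE} 8
```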
|
|
|
Usually, it is slow if you do not have high-speed networking like InfiniBand. |
|
|
|
## Manage jobs with Slurm |
|
|
|
[Slurm](https://slurm.schedmd.com/) is a good job scheduling system for computing clusters. |
|
On a cluster managed by Slurm, you can use `slurm_train.sh` to spawn training jobs. It supports both single-node and multi-node training. |
|
|
|
The basic usage is as follows. |
|
|
|
```shell |
|
[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} |
|
``` |
|
|
|
Below is an example of using 16 GPUs to train Mask R-CNN on a Slurm partition named _dev_, with the work-dir set to a shared file system.
|
|
|
```shell |
|
GPUS=16 ./tools/slurm_train.sh dev mask_r50_1x configs/mask-rcnn_r50_fpn_1x_coco.py /nfs/xxxx/mask_rcnn_r50_fpn_1x |
|
``` |
|
|
|
You can check [the source code](../../../tools/slurm_train.sh) to review full arguments and environment variables. |
|
|
|
When using Slurm, the port option needs to be set in one of the following ways: |
|
|
|
1. Set the port through `--options`. This is the recommended way since it does not change the original configs.
|
|
|
```shell |
|
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --options 'dist_params.port=29500' |
|
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --options 'dist_params.port=29501' |
|
``` |
|
|
|
2. Modify the config files to set different communication ports. |
|
|
|
In `config1.py`, set |
|
|
|
```python |
|
dist_params = dict(backend='nccl', port=29500) |
|
``` |
|
|
|
In `config2.py`, set |
|
|
|
```python |
|
dist_params = dict(backend='nccl', port=29501) |
|
``` |
|
|
|
Then you can launch two jobs with `config1.py` and `config2.py`. |
|
|
|
```shell |
|
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} |
|
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} |
|
``` |
|
|
|
# Train with customized datasets |
|
|
|
In this part, you will learn how to train predefined models on customized datasets and then test them. We use the [balloon dataset](https://github.com/matterport/Mask_RCNN/tree/master/samples/balloon) as an example to describe the whole process.
|
|
|
The basic steps are as below: |
|
|
|
1. Prepare the customized dataset |
|
2. Prepare a config |
|
3. Train, test, and infer models on the customized dataset. |
|
|
|
## Prepare the customized dataset |
|
|
|
There are three ways to support a new dataset in MMDetection: |
|
|
|
1. Reorganize the dataset into COCO format. |
|
2. Reorganize the dataset into a middle format. |
|
3. Implement a new dataset. |
|
|
|
We usually recommend the first two methods, which are easier than the third.
|
|
|
In this note, we give an example of converting the data into COCO format. |
|
|
|
**Note**: Datasets and metrics have been decoupled (except for CityScapes) since MMDetection 3.0. Therefore, users can use any kind of evaluation metric with any format of dataset during validation. For example, you can evaluate a COCO-format dataset with the VOC metric, or an OpenImages dataset with both VOC and COCO metrics.
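
As a hypothetical sketch (the metric class name and arguments are assumptions; check the MMDetection API for exact names), pairing a COCO-format dataset with a Pascal VOC style mAP metric could look like:

```python
# Evaluate a COCO-format dataset with a Pascal VOC style mAP metric.
val_evaluator = dict(type='VOCMetric', metric='mAP')
test_evaluator = val_evaluator
```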
|
|
|
### COCO annotation format |
|
|
|
The necessary keys of the COCO format for instance segmentation are listed below; for complete details, please refer to [the official documentation](https://cocodataset.org/#format-data).
|
|
|
```json |
|
{ |
|
"images": [image], |
|
"annotations": [annotation], |
|
"categories": [category] |
|
} |
|
|
|
image = { |
|
"id": int, |
|
"width": int, |
|
"height": int, |
|
"file_name": str, |
|
} |
|
|
|
annotation = { |
|
"id": int, |
|
"image_id": int, |
|
"category_id": int, |
|
"segmentation": RLE or [polygon], |
|
"area": float, |
|
"bbox": [x,y,width,height], # (x, y) are the coordinates of the upper left corner of the bbox |
|
"iscrowd": 0 or 1, |
|
} |
|
|
|
categories = [{ |
|
"id": int, |
|
"name": str, |
|
"supercategory": str, |
|
}] |
|
``` |
|
|
|
Assume we use the balloon dataset. |
|
After downloading the data, we need to implement a function to convert the annotation format into the COCO format. Then we can use the implemented `CocoDataset` to load the data and perform training and evaluation.
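
For reference, the dataset can be fetched from the Mask_RCNN release page (the URL below is assumed from the balloon dataset link above) and unzipped into `data/`:

```shell
wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
unzip balloon_dataset.zip -d data/
```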
|
|
|
If you take a look at the dataset, you will find the dataset format is as below: |
|
|
|
```json |
|
{'base64_img_data': '', |
|
'file_attributes': {}, |
|
'filename': '34020010494_e5cb88e1c4_k.jpg', |
|
'fileref': '', |
|
'regions': {'0': {'region_attributes': {}, |
|
'shape_attributes': {'all_points_x': [1020, |
|
1000, |
|
994, |
|
1003, |
|
1023, |
|
1050, |
|
1089, |
|
1134, |
|
1190, |
|
1265, |
|
1321, |
|
1361, |
|
1403, |
|
1428, |
|
1442, |
|
1445, |
|
1441, |
|
1427, |
|
1400, |
|
1361, |
|
1316, |
|
1269, |
|
1228, |
|
1198, |
|
1207, |
|
1210, |
|
1190, |
|
1177, |
|
1172, |
|
1174, |
|
1170, |
|
1153, |
|
1127, |
|
1104, |
|
1061, |
|
1032, |
|
1020], |
|
'all_points_y': [963, |
|
899, |
|
841, |
|
787, |
|
738, |
|
700, |
|
663, |
|
638, |
|
621, |
|
619, |
|
643, |
|
672, |
|
720, |
|
765, |
|
800, |
|
860, |
|
896, |
|
942, |
|
990, |
|
1035, |
|
1079, |
|
1112, |
|
1129, |
|
1134, |
|
1144, |
|
1153, |
|
1166, |
|
1166, |
|
1150, |
|
1136, |
|
1129, |
|
1122, |
|
1112, |
|
1084, |
|
1037, |
|
989, |
|
963], |
|
'name': 'polygon'}}}, |
|
'size': 1115004} |
|
``` |
|
|
|
The annotation is a JSON file in which each key corresponds to one image and holds all of that image's annotations.

The code to convert the balloon dataset into COCO format is as below.
|
|
|
```python |
|
import os.path as osp |
|
|
|
import mmcv |
|
|
|
from mmengine.fileio import dump, load |
|
from mmengine.utils import track_iter_progress |
|
|
|
|
|
def convert_balloon_to_coco(ann_file, out_file, image_prefix): |
|
data_infos = load(ann_file) |
|
|
|
annotations = [] |
|
images = [] |
|
obj_count = 0 |
|
for idx, v in enumerate(track_iter_progress(data_infos.values())): |
|
filename = v['filename'] |
|
img_path = osp.join(image_prefix, filename) |
|
height, width = mmcv.imread(img_path).shape[:2] |
|
|
|
images.append( |
|
dict(id=idx, file_name=filename, height=height, width=width)) |
|
|
|
for _, obj in v['regions'].items(): |
|
assert not obj['region_attributes'] |
|
obj = obj['shape_attributes'] |
|
px = obj['all_points_x'] |
|
py = obj['all_points_y'] |
|
poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)] |
|
poly = [p for x in poly for p in x] |
|
|
|
x_min, y_min, x_max, y_max = (min(px), min(py), max(px), max(py)) |
|
|
|
data_anno = dict( |
|
image_id=idx, |
|
id=obj_count, |
|
category_id=0, |
|
bbox=[x_min, y_min, x_max - x_min, y_max - y_min], |
|
area=(x_max - x_min) * (y_max - y_min), |
|
segmentation=[poly], |
|
iscrowd=0) |
|
annotations.append(data_anno) |
|
obj_count += 1 |
|
|
|
coco_format_json = dict( |
|
images=images, |
|
annotations=annotations, |
|
categories=[{ |
|
'id': 0, |
|
'name': 'balloon' |
|
}]) |
|
dump(coco_format_json, out_file) |
|
|
|
|
|
if __name__ == '__main__': |
|
convert_balloon_to_coco(ann_file='data/balloon/train/via_region_data.json', |
|
out_file='data/balloon/train/annotation_coco.json', |
|
image_prefix='data/balloon/train') |
|
convert_balloon_to_coco(ann_file='data/balloon/val/via_region_data.json', |
|
out_file='data/balloon/val/annotation_coco.json', |
|
image_prefix='data/balloon/val') |
|
|
|
``` |
|
|
|
Using the function above, users can convert the annotation files into COCO format; then `CocoDataset` can be used to train the model and `CocoMetric` to evaluate it.
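
If the conversion code is saved as a standalone script, e.g. `convert_balloon.py` at the repository root (the file name and location are just examples), running it once should produce the two annotation files referenced by the config in the next step:

```shell
python convert_balloon.py
# expected outputs (per the paths hard-coded in the script):
#   data/balloon/train/annotation_coco.json
#   data/balloon/val/annotation_coco.json
```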
|
|
|
## Prepare a config |
|
|
|
The second step is to prepare a config so that the dataset can be successfully loaded. Assume that we want to use Mask R-CNN with FPN to train a detector on the balloon dataset, and that the config is placed under the directory `configs/balloon/` and named `mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py`. The content of the config is as below.
|
|
|
```python |
|
# The new config inherits a base config to highlight the necessary modification |
|
_base_ = '../mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py' |
|
|
|
# We also need to change the num_classes in head to match the dataset's annotation |
|
model = dict( |
|
roi_head=dict( |
|
bbox_head=dict(num_classes=1), mask_head=dict(num_classes=1))) |
|
|
|
# Modify dataset related settings |
|
data_root = 'data/balloon/' |
|
metainfo = { |
|
'classes': ('balloon', ), |
|
'palette': [ |
|
(220, 20, 60), |
|
] |
|
} |
|
train_dataloader = dict( |
|
batch_size=1, |
|
dataset=dict( |
|
data_root=data_root, |
|
metainfo=metainfo, |
|
ann_file='train/annotation_coco.json', |
|
data_prefix=dict(img='train/'))) |
|
val_dataloader = dict( |
|
dataset=dict( |
|
data_root=data_root, |
|
metainfo=metainfo, |
|
ann_file='val/annotation_coco.json', |
|
data_prefix=dict(img='val/'))) |
|
test_dataloader = val_dataloader |
|
|
|
# Modify metric related settings |
|
val_evaluator = dict(ann_file=data_root + 'val/annotation_coco.json') |
|
test_evaluator = val_evaluator |
|
|
|
# We can use the pre-trained Mask R-CNN model to obtain higher performance
|
load_from = 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth' |
|
|
|
``` |
|
|
|
## Train a new model |
|
|
|
To train a model with the new config, you can simply run |
|
|
|
```shell |
|
python tools/train.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py |
|
``` |
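
If multiple GPUs are available, the same config also works with the distributed launcher described earlier, e.g. on 8 GPUs:

```shell
bash ./tools/dist_train.sh configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py 8
```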
|
|
|
For more detailed usages, please refer to the [training guide](https://mmdetection.readthedocs.io/en/latest/user_guides/train.html#train-predefined-models-on-standard-datasets). |
|
|
|
## Test and inference |
|
|
|
To test the trained model, you can simply run |
|
|
|
```shell |
|
python tools/test.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py work_dirs/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon/epoch_12.pth |
|
``` |
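
To additionally save the visualized predictions, `tools/test.py` accepts a `--show-dir` argument (to the best of our knowledge; run `python tools/test.py --help` to confirm the options available in your version):

```shell
python tools/test.py \
    configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py \
    work_dirs/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon/epoch_12.pth \
    --show-dir ./balloon_results
```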
|
|
|
For more detailed usages, please refer to the [testing guide](https://mmdetection.readthedocs.io/en/latest/user_guides/test.html). |
|
|