Migrate Configuration File from MMDetection 2.x to 3.x
The configuration file of MMDetection 3.x has undergone significant changes in comparison to the 2.x version. This document explains how to migrate 2.x configuration files to 3.x.
In the previous tutorial, Learn about Configs, we used Mask R-CNN as an example to introduce the structure of MMDetection 3.x configuration files. Here, we follow the same structure to show how to migrate 2.x configuration files to 3.x.
Model Configuration
There have been no major changes to the model configuration in 3.x compared to 2.x. For the model's backbone, neck, head, as well as train_cfg and test_cfg, the parameters remain the same as in version 2.x.
On the other hand, MMDetection 3.x adds a DataPreprocessor module, whose configuration is located in model.data_preprocessor. It preprocesses the input data: it normalizes the input images, pads images of different sizes so that they can be stacked into a batch, and moves the data from CPU memory to GPU memory (VRAM). This configuration replaces the Normalize and Pad transforms in the train_pipeline and test_pipeline of the earlier version.
2.x Config |
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True)
pipeline=[
...,
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
...
]
|
3.x Config |
model = dict(
data_preprocessor=dict(
type='DetDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True,
pad_mask=True,
pad_size_divisor=32)
)
|
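Because the move from the 2.x settings to data_preprocessor is mechanical, it can be sketched as a small converter. The functions norm_pad_to_data_preprocessor, padded_size, and normalize_pixel below are hypothetical helpers written for this guide, not MMDetection APIs. The last two only illustrate, in plain Python, the arithmetic that per-channel normalization and size-divisor padding perform; they are not how DetDataPreprocessor is implemented. We also assume to_rgb defaults to True when it is missing from img_norm_cfg.

```python
import math


def norm_pad_to_data_preprocessor(img_norm_cfg, size_divisor=32,
                                  pad_mask=False):
    """Hypothetical helper: build a 3.x data_preprocessor from 2.x settings."""
    cfg = dict(
        type='DetDataPreprocessor',
        mean=list(img_norm_cfg['mean']),
        std=list(img_norm_cfg['std']),
        # 2.x Normalize's `to_rgb` becomes `bgr_to_rgb` (True assumed if unset)
        bgr_to_rgb=img_norm_cfg.get('to_rgb', True),
        # 2.x Pad's `size_divisor` becomes `pad_size_divisor`
        pad_size_divisor=size_divisor)
    if pad_mask:
        # also pad instance masks, as in the Mask R-CNN config above
        cfg['pad_mask'] = True
    return cfg


def padded_size(height, width, size_divisor=32):
    """Round each side up to the next multiple of size_divisor."""
    return (math.ceil(height / size_divisor) * size_divisor,
            math.ceil(width / size_divisor) * size_divisor)


def normalize_pixel(pixel, mean, std):
    """Apply (value - mean) / std to each channel of a single pixel."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True)
model = dict(
    data_preprocessor=norm_pad_to_data_preprocessor(
        img_norm_cfg, size_divisor=32, pad_mask=True))

print(padded_size(800, 1333))  # (800, 1344)
```

For example, an 800 x 1333 (height x width) image is padded to 800 x 1344: 800 is already a multiple of 32, while 1333 is rounded up to 42 x 32 = 1344.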
Dataset and Evaluator Configuration
The dataset and evaluator configurations have changed substantially compared to version 2.x. We describe the migration from version 2.x to version 3.x in three parts: dataloader and dataset configuration, data transform pipeline configuration, and evaluator configuration.
Dataloader and Dataset Configuration
In the new version, the data loading settings follow PyTorch's official DataLoader, which makes them easier for users to understand and get started with. The settings for training, validation, and testing are defined separately in train_dataloader, val_dataloader, and test_dataloader, so users can set different parameters for each dataloader. Their arguments are essentially the same as those of the PyTorch DataLoader. As a result, parameters that could not be configured in version 2.x, such as sampler, batch_sampler, and persistent_workers, can now be set in the configuration file, which gives users more flexibility. For example, samples_per_gpu and workers_per_gpu from 2.x become batch_size and num_workers.
Users can set the dataset configuration through train_dataloader.dataset, val_dataloader.dataset, and test_dataloader.dataset, which correspond to data.train, data.val, and data.test in version 2.x. Note that dataset paths are now given relative to data_root, and img_prefix is replaced by data_prefix (for example, data_prefix=dict(img='train2017/')).
2.x Config |
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_train2017.json',
img_prefix=data_root + 'train2017/',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'annotations/instances_val2017.json',
img_prefix=data_root + 'val2017/',
pipeline=test_pipeline))
|
3.x Config |
train_dataloader = dict(
batch_size=2,
num_workers=2,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
batch_sampler=dict(type='AspectRatioBatchSampler'),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='annotations/instances_train2017.json',
data_prefix=dict(img='train2017/'),
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=train_pipeline))
val_dataloader = dict(
batch_size=1,
num_workers=2,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
ann_file='annotations/instances_val2017.json',
data_prefix=dict(img='val2017/'),
test_mode=True,
pipeline=test_pipeline))
test_dataloader = val_dataloader
|
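The field-by-field correspondence above can be sketched as a converter from the 2.x data dict to the three 3.x dataloaders. relative_to, to_dataloader, and migrate_data are hypothetical helpers, and data_root = 'data/coco/' is an assumed example path. The sketch does not add batch_sampler or filter_cfg, which the 3.x example sets but which have no field in the 2.x data dict; add them by hand if you need them.

```python
def relative_to(path, data_root):
    """Strip a leading data_root so the path is relative to it."""
    return path[len(data_root):] if path.startswith(data_root) else path


def to_dataloader(ds, data_root, batch_size, num_workers, train):
    """Hypothetical helper: wrap one 2.x dataset dict in a 3.x dataloader."""
    dataset = dict(
        type=ds['type'],
        data_root=data_root,
        ann_file=relative_to(ds['ann_file'], data_root),
        # 2.x img_prefix becomes data_prefix=dict(img=...)
        data_prefix=dict(img=relative_to(ds['img_prefix'], data_root)),
        pipeline=ds['pipeline'])
    loader = dict(
        batch_size=batch_size,
        num_workers=num_workers,
        persistent_workers=True,
        # shuffle only the training data
        sampler=dict(type='DefaultSampler', shuffle=train),
        dataset=dataset)
    if not train:
        loader['drop_last'] = False
        dataset['test_mode'] = True
    return loader


def migrate_data(data, data_root):
    """Map the 2.x `data` dict to 3.x train/val/test dataloaders."""
    workers = data['workers_per_gpu']
    train = to_dataloader(data['train'], data_root,
                          data['samples_per_gpu'], workers, train=True)
    # validation and testing use batch_size=1, as in the 3.x config above
    val = to_dataloader(data['val'], data_root, 1, workers, train=False)
    test = to_dataloader(data['test'], data_root, 1, workers, train=False)
    return train, val, test


# Usage with the 2.x settings above (pipelines left empty for brevity)
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(type=dataset_type,
               ann_file=data_root + 'annotations/instances_train2017.json',
               img_prefix=data_root + 'train2017/', pipeline=[]),
    val=dict(type=dataset_type,
             ann_file=data_root + 'annotations/instances_val2017.json',
             img_prefix=data_root + 'val2017/', pipeline=[]),
    test=dict(type=dataset_type,
              ann_file=data_root + 'annotations/instances_val2017.json',
              img_prefix=data_root + 'val2017/', pipeline=[]))
train_dataloader, val_dataloader, test_dataloader = migrate_data(
    data, data_root)
```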
Data Transform Pipeline Configuration
As mentioned earlier, the image normalization and padding settings have been moved out of train_pipeline and test_pipeline and into model.data_preprocessor. Hence, the 3.x pipelines no longer need the Normalize and Pad transforms.
At the same time, the transforms responsible for packing the data have been refactored: Collect and DefaultFormatBundle have been merged into PackDetInputs, which packs the output of the data pipeline into the input format of the model. For more details on the input format conversion, please refer to the data flow documentation.
Below, we use the train_pipeline of Mask R-CNN as an example to demonstrate how to migrate from the 2.x configuration to the 3.x configuration:
2.x Config |
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
|
3.x Config |
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
dict(type='Resize', scale=(1333, 800), keep_ratio=True),
dict(type='RandomFlip', prob=0.5),
dict(type='PackDetInputs')
]
|
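For simple, single-scale pipelines, the migration above amounts to dropping four transforms, renaming two arguments, and appending PackDetInputs. The sketch below does exactly that; migrate_train_pipeline, DROPPED, and RENAMES are hypothetical names, and the renames are taken from the transform mapping table later in this section. Multi-scale Resize (multiscale_mode) is not handled here, since it maps to different transform types.

```python
# 2.x transforms whose work moves to model.data_preprocessor or PackDetInputs
DROPPED = {'Normalize', 'Pad', 'DefaultFormatBundle', 'ImageToTensor',
           'Collect'}
# Argument renames from the 2.x/3.x transform mapping table
RENAMES = {
    'Resize': {'img_scale': 'scale'},
    'RandomFlip': {'flip_ratio': 'prob'},
}


def migrate_train_pipeline(pipeline):
    """Hypothetical helper: convert a simple, single-scale 2.x train pipeline."""
    new_pipeline = []
    for transform in pipeline:
        if transform['type'] in DROPPED:
            continue
        renames = RENAMES.get(transform['type'], {})
        new_pipeline.append(
            {renames.get(key, key): value for key, value in transform.items()})
    # PackDetInputs takes over from DefaultFormatBundle + Collect
    new_pipeline.append(dict(type='PackDetInputs'))
    return new_pipeline


img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline_2x = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
train_pipeline = migrate_train_pipeline(train_pipeline_2x)
```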
For the test_pipeline, apart from removing the Normalize and Pad transforms, we have also separated test-time augmentation (TTA) from the normal testing process and removed MultiScaleFlipAug. For more information on how to use the new TTA, please refer to the TTA documentation.
Below, we again use the test_pipeline of Mask R-CNN as an example to demonstrate how to migrate from the 2.x configuration to the 3.x configuration:
2.x Config |
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
|
3.x Config |
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', scale=(1333, 800), keep_ratio=True),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
|
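When MultiScaleFlipAug is used with a single scale and flip=False, as in the example above, it can be unwrapped mechanically: its img_scale becomes the scale of a plain Resize, and PackDetInputs carries the meta keys. migrate_test_pipeline and DEFAULT_META_KEYS are hypothetical names; the sketch assumes img_scale is set on the wrapper and refuses multi-scale or flip settings, which belong in the separate TTA configuration.

```python
DEFAULT_META_KEYS = ('img_id', 'img_path', 'ori_shape', 'img_shape',
                     'scale_factor')


def migrate_test_pipeline(pipeline, meta_keys=DEFAULT_META_KEYS):
    """Hypothetical helper: unwrap a single-scale, no-flip MultiScaleFlipAug."""
    new_pipeline = []
    for transform in pipeline:
        if transform['type'] != 'MultiScaleFlipAug':
            new_pipeline.append(dict(transform))
            continue
        # multi-scale or flip testing is TTA, configured separately in 3.x
        if transform.get('flip', False) or isinstance(
                transform['img_scale'], list):
            raise ValueError('multi-scale or flip testing needs a TTA config')
        for inner in transform['transforms']:
            if inner['type'] == 'Resize':
                # the wrapper's img_scale becomes Resize's own `scale`
                resize = dict(type='Resize', scale=transform['img_scale'])
                resize.update({k: v for k, v in inner.items() if k != 'type'})
                new_pipeline.append(resize)
            # other inner transforms (RandomFlip, Normalize, Pad,
            # ImageToTensor, Collect) are not carried over
    new_pipeline.append(dict(type='PackDetInputs', meta_keys=meta_keys))
    return new_pipeline


img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline_2x = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
test_pipeline = migrate_test_pipeline(test_pipeline_2x)
```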
In addition, we have also refactored some data augmentation transforms. The following table lists the mapping between the transforms used in the 2.x version and the 3.x version:
Name |
2.x Config |
3.x Config |
Resize |
dict(type='Resize',
img_scale=(1333, 800),
keep_ratio=True)
|
dict(type='Resize',
scale=(1333, 800),
keep_ratio=True)
|
RandomResize |
dict(
type='Resize',
img_scale=[
(1333, 640), (1333, 800)],
multiscale_mode='range',
keep_ratio=True)
|
dict(
type='RandomResize',
scale=[
(1333, 640), (1333, 800)],
keep_ratio=True)
|
RandomChoiceResize |
dict(
type='Resize',
img_scale=[
(1333, 640), (1333, 672),
(1333, 704), (1333, 736),
(1333, 768), (1333, 800)],
multiscale_mode='value',
keep_ratio=True)
|
dict(
type='RandomChoiceResize',
scales=[
(1333, 640), (1333, 672),
(1333, 704), (1333, 736),
(1333, 768), (1333, 800)],
keep_ratio=True)
|
RandomFlip |
dict(type='RandomFlip', flip_ratio=0.5)
|
dict(type='RandomFlip', prob=0.5)
|
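The Resize rows of the table can be read as one decision: a single scale stays a Resize, a 'range' list becomes RandomResize, and a 'value' list becomes RandomChoiceResize (note the plural scales). migrate_resize is a hypothetical helper encoding that reading; it assumes that, as in 2.x Resize, a missing multiscale_mode means 'range'.

```python
def migrate_resize(transform):
    """Hypothetical helper: map a 2.x Resize dict to its 3.x form."""
    args = dict(transform)
    if args.pop('type') != 'Resize':
        raise ValueError('expected a Resize transform')
    img_scale = args.pop('img_scale')
    # assumed 2.x default when multiscale_mode is not given
    mode = args.pop('multiscale_mode', 'range')
    if isinstance(img_scale, tuple):
        return dict(type='Resize', scale=img_scale, **args)
    if len(img_scale) == 1:
        return dict(type='Resize', scale=img_scale[0], **args)
    if mode == 'range':
        # scale sampled within the range spanned by the given scales
        return dict(type='RandomResize', scale=img_scale, **args)
    if mode == 'value':
        # one of the listed scales is chosen; note the plural `scales`
        return dict(type='RandomChoiceResize', scales=img_scale, **args)
    raise ValueError(f'unknown multiscale_mode: {mode!r}')


scales = [(1333, 640), (1333, 672), (1333, 704),
          (1333, 736), (1333, 768), (1333, 800)]
print(migrate_resize(dict(type='Resize', img_scale=scales,
                          multiscale_mode='value', keep_ratio=True)))
```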
Evaluator Configuration
In version 3.x, model accuracy evaluation is no longer tied to the dataset; it is performed by an Evaluator instead.
The evaluator configuration is divided into two parts: val_evaluator, which is used to evaluate on the validation dataset, and test_evaluator, which is used to evaluate on the test dataset. Together they correspond to the evaluation field in version 2.x, while the evaluation interval moves to val_interval in train_cfg (see Configuration for Training and Testing).
The following table shows the correspondence between evaluators in version 2.x and 3.x.
Metric Name |
2.x Config |
3.x Config |
COCO |
data = dict(
val=dict(
type='CocoDataset',
ann_file=data_root + 'annotations/instances_val2017.json'))
evaluation = dict(metric=['bbox', 'segm'])
|
val_evaluator = dict(
type='CocoMetric',
ann_file=data_root + 'annotations/instances_val2017.json',
metric=['bbox', 'segm'],
format_only=False)
|
Pascal VOC |
data = dict(
val=dict(
type=dataset_type,
ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt'))
evaluation = dict(metric='mAP')
|
val_evaluator = dict(
type='VOCMetric',
metric='mAP',
eval_mode='11points')
|
OpenImages |
data = dict(
val=dict(
type='OpenImagesDataset',
ann_file=data_root + 'annotations/validation-annotations-bbox.csv',
img_prefix=data_root + 'OpenImages/validation/',
label_file=data_root + 'annotations/class-descriptions-boxable.csv',
hierarchy_file=data_root +
'annotations/bbox_labels_600_hierarchy.json',
meta_file=data_root + 'annotations/validation-image-metas.pkl',
image_level_ann_file=data_root +
'annotations/validation-annotations-human-imagelabels-boxable.csv'))
evaluation = dict(interval=1, metric='mAP')
|
val_evaluator = dict(
type='OpenImagesMetric',
iou_thrs=0.5,
ioa_thrs=0.5,
use_group_of=True,
get_supercategory=True)
|
CityScapes |
data = dict(
val=dict(
type='CityScapesDataset',
ann_file=data_root +
'annotations/instancesonly_filtered_gtFine_val.json',
img_prefix=data_root + 'leftImg8bit/val/',
pipeline=test_pipeline))
evaluation = dict(metric=['bbox', 'segm'])
|
val_evaluator = [
dict(
type='CocoMetric',
ann_file=data_root +
'annotations/instancesonly_filtered_gtFine_val.json',
metric=['bbox', 'segm']),
dict(
type='CityScapesMetric',
ann_file=data_root +
'annotations/instancesonly_filtered_gtFine_val.json',
seg_prefix=data_root + '/gtFine/val',
outfile_prefix='./work_dirs/cityscapes_metric/instance')
]
|
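For the COCO and Pascal VOC rows, the migration mostly moves information out of the dataset and into a metric. migrate_evaluation is a hypothetical helper covering only those two rows; it assumes the VOC dataset type is named 'VOCDataset' and copies the VOCMetric settings from the table rather than deriving them. data_root = 'data/coco/' is an assumed example path.

```python
def migrate_evaluation(val_dataset, evaluation):
    """Hypothetical helper covering the COCO and Pascal VOC rows above."""
    metric = evaluation['metric']
    if val_dataset['type'] == 'CocoDataset':
        # the annotation file moves from the dataset to the metric
        return dict(type='CocoMetric', ann_file=val_dataset['ann_file'],
                    metric=metric, format_only=False)
    if val_dataset['type'] == 'VOCDataset' and metric == 'mAP':
        # settings copied from the Pascal VOC row of the table
        return dict(type='VOCMetric', metric='mAP', eval_mode='11points')
    raise NotImplementedError(f"no mapping for {val_dataset['type']}")


data_root = 'data/coco/'
data = dict(
    val=dict(
        type='CocoDataset',
        ann_file=data_root + 'annotations/instances_val2017.json'))
evaluation = dict(metric=['bbox', 'segm'])
val_evaluator = migrate_evaluation(data['val'], evaluation)
# the test split is often evaluated the same way
test_evaluator = val_evaluator
```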
Configuration for Training and Testing
2.x Config |
runner = dict(
type='EpochBasedRunner',
max_epochs=12)
evaluation = dict(interval=2)
|
3.x Config |
train_cfg = dict(
type='EpochBasedTrainLoop',
max_epochs=12,
val_interval=2)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
|
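The 2.x runner and evaluation interval fold into the 3.x loop configs. Note that this top-level train_cfg describes the training loop and is different from the model.train_cfg mentioned under Model Configuration. migrate_runner is a hypothetical helper; the IterBasedRunner branch, which maps to MMEngine's IterBasedTrainLoop, goes beyond the table above, and the fallback interval of 1 when evaluation.interval is unset is an assumption of this sketch.

```python
def migrate_runner(runner, evaluation):
    """Hypothetical helper: map 2.x `runner` + evaluation interval to 3.x loops."""
    # fallback when no interval is given (assumption of this sketch)
    interval = evaluation.get('interval', 1)
    if runner['type'] == 'EpochBasedRunner':
        train_cfg = dict(type='EpochBasedTrainLoop',
                         max_epochs=runner['max_epochs'],
                         val_interval=interval)
    elif runner['type'] == 'IterBasedRunner':
        # for iteration-based training the interval counts iterations
        train_cfg = dict(type='IterBasedTrainLoop',
                         max_iters=runner['max_iters'],
                         val_interval=interval)
    else:
        raise NotImplementedError(runner['type'])
    return train_cfg, dict(type='ValLoop'), dict(type='TestLoop')


train_cfg, val_cfg, test_cfg = migrate_runner(
    dict(type='EpochBasedRunner', max_epochs=12), dict(interval=2))
```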
Optimization Configuration
The configuration for the optimizer and gradient clipping has moved to the optim_wrapper field, where the 2.x grad_clip setting is now called clip_grad.
The following table shows the correspondence between the optimizer configurations in versions 2.x and 3.x:
2.x Config |
optimizer = dict(
type='SGD',
lr=0.02,
momentum=0.9,
weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
|
3.x Config |
optim_wrapper = dict(
type='OptimWrapper',
optimizer=dict(
type='SGD',
lr=0.02,
momentum=0.9,
weight_decay=0.0001),
clip_grad=None,
)
|
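The optimizer dict itself is carried over unchanged; only the wrapper and the gradient-clipping key are new. migrate_optimizer is a hypothetical helper, and the clipping values max_norm=35, norm_type=2 are illustrative only, chosen to show that a non-None grad_clip dict moves across as-is.

```python
def migrate_optimizer(optimizer, optimizer_config):
    """Hypothetical helper: wrap a 2.x optimizer in a 3.x optim_wrapper."""
    return dict(
        type='OptimWrapper',
        optimizer=dict(optimizer),
        # 2.x `grad_clip` is called `clip_grad` in 3.x
        clip_grad=optimizer_config.get('grad_clip'))


optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
# illustrative clipping settings (hypothetical values)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
optim_wrapper = migrate_optimizer(optimizer, optimizer_config)
```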
The learning rate configuration has likewise moved from the lr_config field to the param_scheduler field. The param_scheduler configuration is closer to PyTorch's learning rate schedulers and more flexible: warmup and decay are expressed as separate schedulers, each with its own begin and end, and by_epoch controls whether these are counted in epochs or iterations. The following table shows the correspondence between the learning rate configurations in versions 2.x and 3.x:
2.x Config |
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[8, 11],
gamma=0.1)
|
3.x Config |
param_scheduler = [
dict(
type='LinearLR',
start_factor=0.001,
by_epoch=False,
begin=0,
end=500),
dict(
type='MultiStepLR',
by_epoch=True,
begin=0,
end=12,
milestones=[8, 11],
gamma=0.1)
]
|
For information on how to migrate other learning rate adjustment policies, please refer to the learning rate migration document of MMEngine.
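The step policy with linear warmup from the table can be converted as sketched below. migrate_step_lr, step_lr, and warmup_factor are hypothetical helpers. The last two give an idealised model of the resulting schedule, assuming the warmup and step factors multiply, as chained schedulers do; MMEngine's exact per-step values at the boundaries may differ slightly. With base_lr=0.02 and milestones [8, 11], 0-indexed epochs 0 to 7 use 0.02, epochs 8 to 10 use 0.002, and epoch 11 uses 0.0002.

```python
def migrate_step_lr(lr_config, max_epochs):
    """Hypothetical helper for the 'step' policy with optional linear warmup."""
    if lr_config['policy'] != 'step':
        raise NotImplementedError('only the step policy is sketched here')
    schedulers = []
    if lr_config.get('warmup') == 'linear':
        # warmup is counted in iterations, hence by_epoch=False
        schedulers.append(dict(
            type='LinearLR', start_factor=lr_config['warmup_ratio'],
            by_epoch=False, begin=0, end=lr_config['warmup_iters']))
    schedulers.append(dict(
        type='MultiStepLR', by_epoch=True, begin=0, end=max_epochs,
        milestones=list(lr_config['step']),
        gamma=lr_config.get('gamma', 0.1)))
    return schedulers


def step_lr(base_lr, epoch, milestones, gamma=0.1):
    """Post-warmup LR in 0-indexed `epoch`: one decay per milestone reached."""
    return base_lr * gamma ** sum(1 for m in milestones if epoch >= m)


def warmup_factor(it, warmup_iters, start_factor):
    """Idealised linear ramp from start_factor at it=0 to 1.0 at warmup_iters."""
    progress = min(it, warmup_iters) / warmup_iters
    return start_factor + (1.0 - start_factor) * progress


lr_config = dict(policy='step', warmup='linear', warmup_iters=500,
                 warmup_ratio=0.001, step=[8, 11], gamma=0.1)
param_scheduler = migrate_step_lr(lr_config, max_epochs=12)
for epoch in (0, 7, 8, 11):
    print(epoch, step_lr(0.02, epoch, [8, 11]))
```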
Migration of Other Configurations
Configuration for Saving Checkpoints
Function |
2.x Config |
3.x Config |
Set Save Interval |
checkpoint_config = dict(
interval=1)
|
default_hooks = dict(
checkpoint=dict(
type='CheckpointHook',
interval=1))
|
Save Best Model |
evaluation = dict(
save_best='auto')
|
default_hooks = dict(
checkpoint=dict(
type='CheckpointHook',
save_best='auto'))
|
Keep Only the Latest Checkpoints |
checkpoint_config = dict(
max_keep_ckpts=3)
|
default_hooks = dict(
checkpoint=dict(
type='CheckpointHook',
max_keep_ckpts=3))
|
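The three rows of the table combine into a single CheckpointHook entry, with save_best moving out of the 2.x evaluation dict. migrate_checkpoint is a hypothetical helper that only copies the keys shown in the table (interval, max_keep_ckpts, save_best); check any other 2.x checkpoint_config keys by hand.

```python
def migrate_checkpoint(checkpoint_config, evaluation):
    """Hypothetical helper: merge 2.x checkpoint settings into a CheckpointHook."""
    checkpoint = dict(type='CheckpointHook')
    # keys covered by the table keep their names
    for key in ('interval', 'max_keep_ckpts'):
        if key in checkpoint_config:
            checkpoint[key] = checkpoint_config[key]
    # save_best moves from `evaluation` to the checkpoint hook
    if 'save_best' in evaluation:
        checkpoint['save_best'] = evaluation['save_best']
    return dict(checkpoint=checkpoint)


default_hooks = migrate_checkpoint(
    dict(interval=1, max_keep_ckpts=3), dict(interval=1, save_best='auto'))
```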
Logging Configuration
In MMDetection 3.x, logging and log visualization are handled by MMEngine's logger and visualizer, respectively. The following table compares the configuration for printing and visualizing logs in MMDetection 2.x and 3.x.
Function |
2.x Config |
3.x Config |
Set Log Printing Interval |
log_config = dict(interval=50)
|
default_hooks = dict(
logger=dict(type='LoggerHook', interval=50))
log_processor = dict(
type='LogProcessor', window_size=50)
|
Use TensorBoard or WandB to visualize logs |
log_config = dict(
interval=50,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook'),
dict(type='MMDetWandbHook',
init_kwargs={
'project': 'mmdetection',
'group': 'maskrcnn-r50-fpn-1x-coco'
},
interval=50,
log_checkpoint=True,
log_checkpoint_metadata=True,
num_eval_images=100)
])
|
vis_backends = [
dict(type='LocalVisBackend'),
dict(type='TensorboardVisBackend'),
dict(type='WandbVisBackend',
init_kwargs={
'project': 'mmdetection',
'group': 'maskrcnn-r50-fpn-1x-coco'
})
]
visualizer = dict(
type='DetLocalVisualizer',
vis_backends=vis_backends,
name='visualizer')
|
For visualization-related tutorials, please refer to the Visualization Tutorial of MMDetection.
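The logging table can be read as splitting one 2.x log_config into a LoggerHook, a LogProcessor, and a list of visualization backends. migrate_log_config and HOOK_TO_BACKEND are hypothetical names; the hook-to-backend pairing simply follows the table, window_size is set equal to the interval only because the table uses 50 for both, and MMDetWandbHook options other than init_kwargs (log_checkpoint, num_eval_images, and so on) are dropped.

```python
# Hook-to-backend pairing as laid out in the table above
HOOK_TO_BACKEND = {
    'TextLoggerHook': 'LocalVisBackend',
    'TensorboardLoggerHook': 'TensorboardVisBackend',
    'MMDetWandbHook': 'WandbVisBackend',
}


def migrate_log_config(log_config):
    """Hypothetical helper: split a 2.x log_config into 3.x fields."""
    interval = log_config['interval']
    default_hooks = dict(logger=dict(type='LoggerHook', interval=interval))
    # the table uses the same value (50) for both settings
    log_processor = dict(type='LogProcessor', window_size=interval)
    vis_backends = []
    for hook in log_config.get('hooks', []):
        backend = dict(type=HOOK_TO_BACKEND[hook['type']])
        # only init_kwargs is carried over; other hook options are dropped
        if 'init_kwargs' in hook:
            backend['init_kwargs'] = dict(hook['init_kwargs'])
        vis_backends.append(backend)
    if not vis_backends:
        vis_backends = [dict(type='LocalVisBackend')]
    visualizer = dict(type='DetLocalVisualizer', vis_backends=vis_backends,
                      name='visualizer')
    return default_hooks, log_processor, vis_backends, visualizer


log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook'),
        dict(type='MMDetWandbHook',
             init_kwargs={'project': 'mmdetection',
                          'group': 'maskrcnn-r50-fpn-1x-coco'},
             interval=50, log_checkpoint=True,
             log_checkpoint_metadata=True, num_eval_images=100),
    ])
default_hooks, log_processor, vis_backends, visualizer = migrate_log_config(
    log_config)
```

In a real config, the logger entry belongs in the same default_hooks dict that holds the CheckpointHook.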
Runtime Configuration
The runtime configuration fields have been reorganized in version 3.x; the correspondence is as follows:
2.x Config |
3.x Config |
cudnn_benchmark = False
opencv_num_threads = 0
mp_start_method = 'fork'
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
|
env_cfg = dict(
cudnn_benchmark=False,
mp_cfg=dict(mp_start_method='fork',
opencv_num_threads=0),
dist_cfg=dict(backend='nccl'))
log_level = 'INFO'
load_from = None
resume = False
|
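The runtime fields regroup under env_cfg, and resume_from is replaced by the resume flag. migrate_runtime is a hypothetical helper, and the checkpoint path in the usage is an illustrative placeholder. It mirrors 2.x behaviour, where resume_from took precedence over load_from, by pointing load_from at the checkpoint and setting resume=True; our understanding is that MMEngine's Runner then resumes from load_from, so please confirm against the MMEngine documentation for your version.

```python
def migrate_runtime(cfg):
    """Hypothetical helper: map 2.x runtime fields to their 3.x names."""
    env_cfg = dict(
        cudnn_benchmark=cfg.get('cudnn_benchmark', False),
        mp_cfg=dict(
            mp_start_method=cfg.get('mp_start_method', 'fork'),
            opencv_num_threads=cfg.get('opencv_num_threads', 0)),
        # 2.x dist_params becomes dist_cfg
        dist_cfg=dict(cfg.get('dist_params', dict(backend='nccl'))))
    runtime = dict(env_cfg=env_cfg, log_level=cfg.get('log_level', 'INFO'))
    if cfg.get('resume_from'):
        # resume from a specific checkpoint: set load_from and resume=True
        runtime.update(load_from=cfg['resume_from'], resume=True)
    else:
        runtime.update(load_from=cfg.get('load_from'), resume=False)
    return runtime


runtime_2x = dict(cudnn_benchmark=False, opencv_num_threads=0,
                  mp_start_method='fork', dist_params=dict(backend='nccl'),
                  log_level='INFO', load_from=None, resume_from=None)
runtime = migrate_runtime(runtime_2x)
# placeholder checkpoint path for illustration
resumed = migrate_runtime(dict(resume_from='work_dirs/mask_rcnn/epoch_6.pth'))
```

In a configuration file, each key of the returned dict would be written as a top-level variable (env_cfg = ..., log_level = ..., and so on).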