Finetuning Models

Detectors pre-trained on the COCO dataset can serve as a good starting point for other datasets, e.g., Cityscapes and KITTI. This tutorial explains how to finetune the models provided in the Model Zoo on other datasets to obtain better performance.

There are two steps to finetune a model on a new dataset:

1. Add support for the new dataset following Customize Datasets.
2. Modify the configs as discussed in this tutorial.

Taking finetuning on the Cityscapes dataset as an example, users need to modify five parts of the config.

Inherit base configs

To reduce the burden of writing whole configs and to reduce bugs, MMDetection V3.0 supports inheriting configs from multiple existing configs. To finetune a Mask RCNN model, the new config needs to inherit _base_/models/mask-rcnn_r50_fpn.py to build the basic structure of the model. To use the Cityscapes dataset, the new config can simply inherit _base_/datasets/cityscapes_instance.py. For runtime settings such as logger settings, the new config needs to inherit _base_/default_runtime.py. For the training schedule, the new config can inherit _base_/schedules/schedule_1x.py. These configs are in the configs directory. Users can also choose to write out the whole contents rather than use inheritance.

_base_ = [
    '../_base_/models/mask-rcnn_r50_fpn.py',
    '../_base_/datasets/cityscapes_instance.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_1x.py'
]
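
The effect of inheritance can be pictured as a recursive dictionary merge: fields written in the new config override the inherited ones, and everything else is kept. The sketch below is a simplified model of that behaviour in plain Python, not the actual config loader. The helper merge_config and the two stand-in base dictionaries are hypothetical and exist only for illustration.

```python
import copy


def merge_config(base, override):
    """Recursively merge `override` into a copy of `base` (illustrative only)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge field by field.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the overriding value replaces the inherited one.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-ins for the contents of two base config files.
base_model = dict(
    model=dict(roi_head=dict(bbox_head=dict(num_classes=80))))
base_schedule = dict(train_cfg=dict(max_epochs=12))

# Combine the base configs, as listing them in _base_ would.
config = {}
for base in (base_model, base_schedule):
    config = merge_config(config, base)

# A field set in the new config overrides the inherited value.
config = merge_config(config, dict(train_cfg=dict(max_epochs=8)))
```

After the merge, config still holds the inherited model definition, while train_cfg carries the overridden epoch count.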

Modify head

Next, the new config needs to modify the head according to the number of classes in the new dataset. By changing only num_classes in the roi_head, the weights of the pre-trained model are reused except for the final prediction head, whose shape depends on the number of classes.

model = dict(
    roi_head=dict(
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=8,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=8,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))))
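
Since the new config already inherits the base model, a shorter form may be enough. The sketch below overrides only num_classes and assumes that the inherited base config already defines every other head field with the values shown above, and that nested dictionaries are merged field by field during inheritance.

```python
# Sketch: override only the class count of both heads.
# Assumption: all other head fields come from the inherited base model config.
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=8),
        mask_head=dict(num_classes=8)))
```

Writing the heads out in full, as above, has the advantage that the config does not silently change if the base file is edited.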

Modify dataset

Users may also need to prepare the dataset and write the dataset configs. Refer to Customize Datasets for more details. MMDetection V3.0 already supports the VOC, WIDERFACE, COCO, LVIS, OpenImages, DeepFashion, Objects365, and Cityscapes datasets.

Modify training schedule

The finetuning hyperparameters differ from the default schedule. Finetuning usually requires a smaller learning rate and fewer training epochs.

# optimizer
# lr is set for a batch size of 8
optim_wrapper = dict(optimizer=dict(lr=0.01))

# learning rate
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
    dict(
        type='MultiStepLR',
        begin=0,
        end=8,
        by_epoch=True,
        milestones=[7],
        gamma=0.1)
]

# max_epochs
train_cfg = dict(max_epochs=8)

# log config
default_hooks = dict(logger=dict(interval=100))
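
The learning rate above is stated for a batch size of 8. When training with a different total batch size, one common heuristic is the linear scaling rule, which changes the learning rate in proportion to the batch size. The sketch below illustrates that rule only. The helper scale_lr is hypothetical, and the rule is an assumption that may not suit every finetuning setup.

```python
def scale_lr(base_lr, base_batch_size, batch_size):
    """Linear scaling rule: lr changes in proportion to the total batch size."""
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError('batch sizes must be positive')
    return base_lr * batch_size / base_batch_size


# lr=0.01 is the value given for a total batch size of 8.
lr_for_16 = scale_lr(0.01, 8, 16)  # twice the batch size, twice the lr
lr_for_4 = scale_lr(0.01, 8, 4)    # half the batch size, half the lr

# The scaled value would then replace the lr in the optimizer settings.
optim_wrapper = dict(optimizer=dict(lr=lr_for_16))
```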

Use pre-trained model

To use the pre-trained model, the new config sets load_from to the link of the pre-trained model. Users may want to download the model weights before training so that no time is spent on the download during training.

load_from = 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'  # noqa
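
One way to fetch the weights ahead of time is a small standalone script using only the Python standard library. The sketch below is illustrative: the helpers local_checkpoint_path and download_if_missing, the checkpoints directory, and the example.com URL are all hypothetical. It assumes that load_from can also point to a local file path.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_checkpoint_path(url, cache_dir='checkpoints'):
    """Return the local path where the checkpoint at `url` would be stored."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(cache_dir, filename)


def download_if_missing(url, cache_dir='checkpoints'):
    """Download the checkpoint once and return its local path."""
    path = local_checkpoint_path(url, cache_dir)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        urllib.request.urlretrieve(url, path)
    return path


# Hypothetical URL, used here only to show how the local path is derived.
example_path = local_checkpoint_path('https://example.com/models/mask_rcnn.pth')
```

After running download_if_missing once with the real checkpoint link, load_from in the config can be set to the returned local path.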