Train predefined models on standard datasets¶

MMDetection also provides out-of-the-box tools for training detection models. This section will show how to train predefined models (under configs) on standard datasets i.e. COCO.

Prepare datasets¶

Preparing datasets is also necessary for training. See section Prepare datasets above for details.

Note: Currently, the config files under configs/cityscapes use COCO pre-trained weights to initialize. If your network connection is slow or unavailable, it’s advisable to download existing models before beginning training to avoid errors.

Learning rate auto scaling¶

Important: The default learning rate in config files is for 8 GPUs and 2 sample per GPU (batch size = 8 * 2 = 16). And it had been set to auto_scale_lr.base_batch_size in config/_base_/schedules/schedule_1x.py. The learning rate will be automatically scaled based on the value at a batch size of 16. Meanwhile, to avoid affecting other codebases that use mmdet, the default setting for the auto_scale_lr.enable flag is False.

If you want to enable this feature, you need to add argument --auto-scale-lr. And you need to check the config name which you want to use before you process the command, because the config name indicates the default batch size. By default, it is 8 x 2 = 16 batch size, like faster_rcnn_r50_caffe_fpn_90k_coco.py or pisa_faster_rcnn_x101_32x4d_fpn_1x_coco.py. In other cases, you will see the config file name have _NxM_ in dictating, like cornernet_hourglass104_mstest_32x3_210e_coco.py which batch size is 32 x 3 = 96, or scnet_x101_64x4d_fpn_8x1_20e_coco.py which batch size is 8 x 1 = 8.

Please remember to check the bottom of the specific config file you want to use, it will have auto_scale_lr.base_batch_size if the batch size is not 16. If you can’t find those values, check the config file which in _base_=[xxx] and you will find it. Please do not modify its values if you want to automatically scale the LR.

The basic usage of learning rate auto scaling is as follows.

python tools/train.py \
    ${CONFIG_FILE} \
    --auto-scale-lr \
    [optional arguments]

If you enabled this feature, the learning rate will be automatically scaled according to the number of GPUs on the machine and the batch size of training. See linear scaling rule for details. For example, If there are 4 GPUs and 2 pictures on each GPU, lr = 0.01, then if there are 16 GPUs and 4 pictures on each GPU, it will automatically scale to lr = 0.08.

If you don’t want to use it, you need to calculate the learning rate according to the linear scaling rule manually then change optimizer.lr in specific config file.

Training on a single GPU¶

We provide tools/train.py to launch training jobs on a single GPU. The basic usage is as follows.

python tools/train.py \
    ${CONFIG_FILE} \
    [optional arguments]

During training, log files and checkpoints will be saved to the working directory, which is specified by work_dir in the config file or via CLI argument --work-dir.

By default, the model is evaluated on the validation set every epoch, the evaluation interval can be specified in the config file as shown below.

# evaluate the model every 12 epochs.
train_cfg = dict(val_interval=12)

This tool accepts several optional arguments, including:

--work-dir ${WORK_DIR}: Override the working directory.
--resume: resume from the latest checkpoint in the work_dir automatically.
--resume ${CHECKPOINT_FILE}: resume from the specific checkpoint.
--cfg-options 'Key=value': Overrides other settings in the used config.

Note:

There is a difference between resume and load-from:

resume loads both the weights of the model and the state of the optimizer, and it inherits the iteration number from the specified checkpoint, so training does not start again from scratch. load-from, on the other hand, only loads the weights of the model, and its training starts from scratch. It is often used for fine-tuning a model. load-from needs to be written in the config file, while resume is passed as a command line argument.

Training on CPU¶

The process of training on the CPU is consistent with single GPU training. We just need to disable GPUs before the training process.

export CUDA_VISIBLE_DEVICES=-1

And then run the script above.

Note:

We do not recommend users to use the CPU for training because it is too slow. We support this feature to allow users to debug on machines without GPU for convenience.

Training on multiple GPUs¶

We provide tools/dist_train.sh to launch training on multiple GPUs. The basic usage is as follows.

bash ./tools/dist_train.sh \
    ${CONFIG_FILE} \
    ${GPU_NUM} \
    [optional arguments]

Optional arguments remain the same as stated above.

Launch multiple jobs simultaneously¶

If you would like to launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.

If you use dist_train.sh to launch training jobs, you can set the port in the commands.

CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4

Train with multiple machines¶

If you launch with multiple machines simply connected with ethernet, you can simply run the following commands:

On the first machine:

NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS

On the second machine:

NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS

Usually, it is slow if you do not have high-speed networking like InfiniBand.

Manage jobs with Slurm¶

Slurm is a good job scheduling system for computing clusters. On a cluster managed by Slurm, you can use slurm_train.sh to spawn training jobs. It supports both single-node and multi-node training.

The basic usage is as follows.

[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}

Below is an example of using 16 GPUs to train Mask R-CNN on a Slurm partition named dev, and set the work-dir to some shared file systems.

GPUS=16 ./tools/slurm_train.sh dev mask_r50_1x configs/mask-rcnn_r50_fpn_1x_coco.py /nfs/xxxx/mask_rcnn_r50_fpn_1x

You can check the source code to review full arguments and environment variables.

When using Slurm, the port option needs to be set in one of the following ways:

Set the port through --options. This is more recommended since it does not change the original configs.

CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR} --cfg-options 'dist_params.port=29500'
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR} --cfg-options 'dist_params.port=29501'

Modify the config files to set different communication ports.

In config1.py, set

dist_params = dict(backend='nccl', port=29500)

In config2.py, set

dist_params = dict(backend='nccl', port=29501)

Then you can launch two jobs with config1.py and config2.py.

CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py ${WORK_DIR}
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py ${WORK_DIR}

Train with customized datasets¶

In this part, you will know how to train predefined models with customized datasets and then test it. We use the balloon dataset as an example to describe the whole process.

The basic steps are as below:

Prepare the customized dataset
Prepare a config
Train, test, and infer models on the customized dataset.

Prepare the customized dataset¶

There are three ways to support a new dataset in MMDetection:

Reorganize the dataset into COCO format.
Reorganize the dataset into a middle format.
Implement a new dataset.

Usually, we recommend using the first two methods which are usually easier than the third.

In this note, we give an example of converting the data into COCO format.

Note: Datasets and metrics have been decoupled except CityScapes since MMDetection 3.0. Therefore, users can use any kind of evaluation metrics for any format of datasets during validation. For example: evaluate on COCO dataset with VOC metric, or evaluate on OpenImages dataset with both VOC and COCO metrics.

COCO annotation format¶

The necessary keys of COCO format for instance segmentation are as below, for the complete details, please refer here.

{
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}

image = {
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
}

annotation = {
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height], # (x, y) are the coordinates of the upper left corner of the bbox
    "iscrowd": 0 or 1,
}

categories = [{
    "id": int,
    "name": str,
    "supercategory": str,
}]

Assume we use the balloon dataset. After downloading the data, we need to implement a function to convert the annotation format into the COCO format. Then we can use implemented CocoDataset to load the data and perform training and evaluation.

If you take a look at the dataset, you will find the dataset format is as below:

{'base64_img_data': '',
 'file_attributes': {},
 'filename': '34020010494_e5cb88e1c4_k.jpg',
 'fileref': '',
 'regions': {'0': {'region_attributes': {},
   'shape_attributes': {'all_points_x': [1020,
     1000,
     994,
     1003,
     1023,
     1050,
     1089,
     1134,
     1190,
     1265,
     1321,
     1361,
     1403,
     1428,
     1442,
     1445,
     1441,
     1427,
     1400,
     1361,
     1316,
     1269,
     1228,
     1198,
     1207,
     1210,
     1190,
     1177,
     1172,
     1174,
     1170,
     1153,
     1127,
     1104,
     1061,
     1032,
     1020],
    'all_points_y': [963,
     899,
     841,
     787,
     738,
     700,
     663,
     638,
     621,
     619,
     643,
     672,
     720,
     765,
     800,
     860,
     896,
     942,
     990,
     1035,
     1079,
     1112,
     1129,
     1134,
     1144,
     1153,
     1166,
     1166,
     1150,
     1136,
     1129,
     1122,
     1112,
     1084,
     1037,
     989,
     963],
    'name': 'polygon'}}},
 'size': 1115004}

The annotation is a JSON file where each key indicates an image’s all annotations. The code to convert the balloon dataset into coco format is as below.

import os.path as osp

import mmcv

from mmengine.fileio import dump, load
from mmengine.utils import track_iter_progress


def convert_balloon_to_coco(ann_file, out_file, image_prefix):
    data_infos = load(ann_file)

    annotations = []
    images = []
    obj_count = 0
    for idx, v in enumerate(track_iter_progress(data_infos.values())):
        filename = v['filename']
        img_path = osp.join(image_prefix, filename)
        height, width = mmcv.imread(img_path).shape[:2]

        images.append(
            dict(id=idx, file_name=filename, height=height, width=width))

        for _, obj in v['regions'].items():
            assert not obj['region_attributes']
            obj = obj['shape_attributes']
            px = obj['all_points_x']
            py = obj['all_points_y']
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            x_min, y_min, x_max, y_max = (min(px), min(py), max(px), max(py))

            data_anno = dict(
                image_id=idx,
                id=obj_count,
                category_id=0,
                bbox=[x_min, y_min, x_max - x_min, y_max - y_min],
                area=(x_max - x_min) * (y_max - y_min),
                segmentation=[poly],
                iscrowd=0)
            annotations.append(data_anno)
            obj_count += 1

    coco_format_json = dict(
        images=images,
        annotations=annotations,
        categories=[{
            'id': 0,
            'name': 'balloon'
        }])
    dump(coco_format_json, out_file)


if __name__ == '__main__':
    convert_balloon_to_coco(ann_file='data/balloon/train/via_region_data.json',
                            out_file='data/balloon/train/annotation_coco.json',
                            image_prefix='data/balloon/train')
    convert_balloon_to_coco(ann_file='data/balloon/val/via_region_data.json',
                            out_file='data/balloon/val/annotation_coco.json',
                            image_prefix='data/balloon/val')

Using the function above, users can successfully convert the annotation file into json format, then we can use CocoDataset to train and evaluate the model with CocoMetric.

Prepare a config¶

The second step is to prepare a config thus the dataset could be successfully loaded. Assume that we want to use Mask R-CNN with FPN, the config to train the detector on balloon dataset is as below. Assume the config is under directory configs/balloon/ and named as mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py, the config is as below. Please refer Learn about Configs - MMDetection 3.0.0 documentation to get detailed information about config files.

# The new config inherits a base config to highlight the necessary modification
_base_ = '../mask_rcnn/mask-rcnn_r50-caffe_fpn_ms-poly-1x_coco.py'

# We also need to change the num_classes in head to match the dataset's annotation
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=1), mask_head=dict(num_classes=1)))

# Modify dataset related settings
data_root = 'data/balloon/'
metainfo = {
    'classes': ('balloon', ),
    'palette': [
        (220, 20, 60),
    ]
}
train_dataloader = dict(
    batch_size=1,
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='train/annotation_coco.json',
        data_prefix=dict(img='train/')))
val_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='val/annotation_coco.json',
        data_prefix=dict(img='val/')))
test_dataloader = val_dataloader

# Modify metric related settings
val_evaluator = dict(ann_file=data_root + 'val/annotation_coco.json')
test_evaluator = val_evaluator

# We can use the pre-trained Mask RCNN model to obtain higher performance
load_from = 'https://download.openmmlab.com/mmdetection/v2.0/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco/mask_rcnn_r50_caffe_fpn_mstrain-poly_3x_coco_bbox_mAP-0.408__segm_mAP-0.37_20200504_163245-42aa3d00.pth'

Train a new model¶

To train a model with the new config, you can simply run

python tools/train.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py

For more detailed usages, please refer to the training guide.

Test and inference¶

To test the trained model, you can simply run

python tools/test.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py work_dirs/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon/epoch_12.pth

For more detailed usages, please refer to the testing guide.