Benchmark and Model Zoo¶

Mirror sites¶

We use AWS as the main site to host our model zoo and maintain a mirror on Aliyun. To use the mirror, replace https://s3.ap-northeast-2.amazonaws.com/open-mmlab with https://open-mmlab.oss-cn-beijing.aliyuncs.com in model URLs.
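The prefix swap described above can be sketched as a small helper. This is only an illustration: the function name `to_mirror_url` and the example file path are hypothetical, and only the two URL prefixes come from the text.

```python
AWS_PREFIX = "https://s3.ap-northeast-2.amazonaws.com/open-mmlab"
ALIYUN_PREFIX = "https://open-mmlab.oss-cn-beijing.aliyuncs.com"


def to_mirror_url(url: str) -> str:
    """Return the Aliyun mirror URL for an AWS-hosted model URL.

    URLs that do not start with the AWS prefix are returned unchanged.
    """
    if url.startswith(AWS_PREFIX):
        return ALIYUN_PREFIX + url[len(AWS_PREFIX):]
    return url


# Hypothetical model path, used only to show the substitution.
example = AWS_PREFIX + "/mmdetection/models/example_model.pth"
print(to_mirror_url(example))
```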

Common settings¶

All models were trained on coco_2017_train and tested on coco_2017_val.
We use distributed training.
All PyTorch-style pretrained backbones on ImageNet are from the PyTorch model zoo; Caffe-style pretrained backbones are converted from the newly released models in detectron2.
For fair comparison with other codebases, we report the GPU memory as the maximum value of torch.cuda.max_memory_allocated() across all 8 GPUs. Note that this value is usually less than what nvidia-smi shows.
We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script benchmark.py, which computes the average time over 2000 images.
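The inference-time convention above (time forwarding and post-processing, but not data loading) can be sketched as follows. This is not the code of benchmark.py; the function `average_inference_time` and its arguments are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def average_inference_time(
    batches: Iterable[T],
    forward_and_postprocess: Callable[[T], object],
) -> float:
    """Average seconds per batch, timing only forward + post-processing."""
    total = 0.0
    count = 0
    # Fetching the next batch (data loading) happens in the loop header,
    # outside the timed region below.
    for batch in batches:
        start = time.perf_counter()
        forward_and_postprocess(batch)
        # On a GPU one would typically synchronize the device before
        # reading the clock; that step is omitted in this CPU-only sketch.
        total += time.perf_counter() - start
        count += 1
    if count == 0:
        raise ValueError("no batches to time")
    return total / count


# Toy usage with a stand-in "model" over 2000 dummy inputs.
avg = average_inference_time(range(2000), lambda x: x * 2)
print(f"average time per image: {avg:.6f} s")
```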

Baselines¶

RPN¶

Please refer to RPN for details.

Faster R-CNN¶

Please refer to Faster R-CNN for details.

Mask R-CNN¶

Please refer to Mask R-CNN for details.

Fast R-CNN (with pre-computed proposals)¶

Please refer to Fast R-CNN for details.

RetinaNet¶

Please refer to RetinaNet for details.

Cascade R-CNN and Cascade Mask R-CNN¶

Please refer to Cascade R-CNN for details.

Hybrid Task Cascade (HTC)¶

Please refer to HTC for details.

SSD¶

Please refer to SSD for details.

Group Normalization (GN)¶

Please refer to Group Normalization for details.

Weight Standardization¶

Please refer to Weight Standardization for details.

Deformable Convolution v2¶

Please refer to Deformable Convolutional Networks for details.

CARAFE: Content-Aware ReAssembly of FEatures¶

Please refer to CARAFE for details.

Instaboost¶

Please refer to Instaboost for details.

Libra R-CNN¶

Please refer to Libra R-CNN for details.

Guided Anchoring¶

Please refer to Guided Anchoring for details.

FCOS¶

Please refer to FCOS for details.

FoveaBox¶

Please refer to FoveaBox for details.

RepPoints¶

Please refer to RepPoints for details.

FreeAnchor¶

Please refer to FreeAnchor for details.

Grid R-CNN (plus)¶

Please refer to Grid R-CNN for details.

GHM¶

Please refer to GHM for details.

GCNet¶

Please refer to GCNet for details.

HRNet¶

Please refer to HRNet for details.

Mask Scoring R-CNN¶

Please refer to Mask Scoring R-CNN for details.

Train from Scratch¶

Please refer to Rethinking ImageNet Pre-training for details.

NAS-FPN¶

Please refer to NAS-FPN for details.

ATSS¶

Please refer to ATSS for details.

FSAF¶

Please refer to FSAF for details.

RegNetX¶

Please refer to RegNet for details.

Res2Net¶

Please refer to Res2Net for details.

GRoIE¶

Please refer to GRoIE for details.

Other datasets¶

We also benchmark some methods on PASCAL VOC, Cityscapes and WIDER FACE.

Pre-trained Models¶

We also train Faster R-CNN and Mask R-CNN using ResNet-50 and RegNetX-3.2G with multi-scale training and longer schedules. These models serve as strong pre-trained models for downstream tasks.

Speed benchmark¶

We compare the training speed of Mask R-CNN with some other popular frameworks (the data is copied from detectron2). For mmdetection, we benchmark with mask_rcnn_r50_caffe_fpn_poly_1x_coco_v1.py, which should have the same settings as mask_rcnn_R_50_FPN_noaug_1x.yaml of detectron2. We also provide the checkpoint and training log for reference. The throughput is computed as the average throughput over iterations 100-500 to skip GPU warmup time.

Implementation	Throughput (img/s)
Detectron2	62
MMDetection	61
maskrcnn-benchmark	53
tensorpack	50
simpledet	39
Detectron	19
matterport/Mask_RCNN	14
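The throughput figures in the table are averages over iterations 100-500. A minimal sketch of that calculation is shown below, under two assumptions not stated in the text: the window is treated as half-open (iterations 100 up to but not including 500), and the average is taken as total images divided by total time. The function name and the `images_per_iter` parameter are hypothetical.

```python
from typing import Sequence


def average_throughput(
    iter_times: Sequence[float],
    images_per_iter: int,
    start: int = 100,
    stop: int = 500,
) -> float:
    """Images per second over iterations [start, stop), skipping warmup."""
    window = iter_times[start:stop]
    if not window:
        raise ValueError("not enough iterations for the requested window")
    # Total images processed in the window divided by total elapsed time.
    return images_per_iter * len(window) / sum(window)


# Toy log: 100 slow warmup iterations, then 400 steady-state iterations.
times = [1.0] * 100 + [0.25] * 400
print(average_throughput(times, images_per_iter=16))  # 64.0
```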

Comparison with Detectron2¶

We compare mmdetection with Detectron2 in terms of speed and performance. We use the commit id 185c27e (30/4/2020) of Detectron2. For a fair comparison, we install and run both frameworks on the same machine.

Hardware¶

8 NVIDIA Tesla V100 (32G) GPUs
Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

Software environment¶

Python 3.7
PyTorch 1.4
CUDA 10.1
CUDNN 7.6.03
NCCL 2.4.08

Performance¶

Type	Lr schd	Detectron2	mmdetection	Download
Faster R-CNN	1x	37.9	38.0	model | log
Mask R-CNN	1x	38.6 & 35.2	38.8 & 35.4	model | log
RetinaNet	1x	36.5	37.0	model | log

Training Speed¶

The training speed is measured in s/iter; lower is better.

Type	Detectron2	mmdetection
Faster R-CNN	0.210	0.216
Mask R-CNN	0.261	0.265
RetinaNet	0.200	0.205
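To put the s/iter figures in perspective, the relative difference between the two frameworks can be computed directly from the table. The helper below is an illustrative sketch; `relative_overhead` is a hypothetical name, and the numbers are copied from the table above.

```python
def relative_overhead(baseline: float, other: float) -> float:
    """Percentage by which `other` exceeds `baseline` (positive = slower)."""
    return (other - baseline) / baseline * 100.0


# (Detectron2, mmdetection) training speed in s/iter, from the table above.
speeds = {
    "Faster R-CNN": (0.210, 0.216),
    "Mask R-CNN": (0.261, 0.265),
    "RetinaNet": (0.200, 0.205),
}

for name, (d2, mmdet) in speeds.items():
    print(f"{name}: {relative_overhead(d2, mmdet):+.1f}%")
# Faster R-CNN: +2.9%
# Mask R-CNN: +1.5%
# RetinaNet: +2.5%
```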

Inference Speed¶

The inference speed is measured in fps (img/s) on a single GPU; higher is better. To be consistent with Detectron2, we report the pure inference speed (without the time of data loading). For Mask R-CNN, we exclude the time of RLE encoding in post-processing. We also include the officially reported speed in parentheses, which is slightly higher than the results tested on our server due to hardware differences.

Type	Detectron2	mmdetection
Faster R-CNN	25.6 (26.3)	22.2
Mask R-CNN	22.5 (23.3)	19.6
RetinaNet	17.8 (18.2)	20.6

Training memory¶

Type	Detectron2	mmdetection
Faster R-CNN	3.0	3.8
Mask R-CNN	3.4	3.9
RetinaNet	3.9	3.4