Benchmark and Model Zoo

Environment

Hardware

  • 8 NVIDIA Tesla V100 GPUs
  • Intel Xeon 4114 CPU @ 2.20GHz

Software environment

  • Python 3.6 / 3.7
  • PyTorch 1.1
  • CUDA 9.0.176
  • cuDNN 7.0.4
  • NCCL 2.1.15

Mirror sites

We use AWS as the main site to host our model zoo and maintain a mirror on Aliyun. To download from the mirror, replace https://s3.ap-northeast-2.amazonaws.com/open-mmlab with https://open-mmlab.oss-cn-beijing.aliyuncs.com in the model URLs.
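The substitution is a plain prefix swap. A minimal Python sketch, assuming each model URL is a string that starts with the AWS prefix; `to_aliyun_mirror` is a hypothetical helper, not an mmdetection API:

```python
# Prefixes taken from the paragraph above.
AWS_PREFIX = "https://s3.ap-northeast-2.amazonaws.com/open-mmlab"
ALIYUN_PREFIX = "https://open-mmlab.oss-cn-beijing.aliyuncs.com"


def to_aliyun_mirror(url: str) -> str:
    """Rewrite an AWS-hosted model URL so it points at the Aliyun mirror.

    URLs that do not start with the AWS prefix are returned unchanged.
    """
    if url.startswith(AWS_PREFIX):
        return ALIYUN_PREFIX + url[len(AWS_PREFIX):]
    return url
```

Because non-matching URLs pass through untouched, the helper is safe to apply to a mixed list of links.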

Common settings

  • All FPN baselines and RPN-C4 baselines were trained on 8 GPUs with a batch size of 16 (2 images per GPU). Other C4 baselines were trained on 8 GPUs with a batch size of 8 (1 image per GPU).
  • All models were trained on coco_2017_train and tested on coco_2017_val.
  • We use distributed training, and BN layer statistics are frozen.
  • We adopt the same training schedules as Detectron. 1x indicates 12 epochs and 2x indicates 24 epochs, which corresponds to slightly fewer iterations than Detectron; the difference is negligible.
  • All pytorch-style ImageNet-pretrained backbones are from the PyTorch model zoo.
  • For fair comparison with other codebases, we report GPU memory as the maximum value of torch.cuda.max_memory_allocated() across all 8 GPUs. Note that this value is usually smaller than what nvidia-smi shows.
  • We report inference time as the overall time, including data loading, network forwarding and post-processing.
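The GPU memory figure described above (the maximum of torch.cuda.max_memory_allocated() over all GPUs) can be reproduced roughly as follows. This is a sketch, not mmdetection's own logging code: it assumes one process per GPU with `torch.distributed` already initialized when running distributed, and that GB means 1024**3 bytes. `peak_memory_gb` and `gather_peak_memory_bytes` are hypothetical names.

```python
def peak_memory_gb(per_gpu_peak_bytes):
    """Reduce per-GPU peak allocations to one reported number, in GB.

    The reported value is the maximum over GPUs; GB is assumed to be 1024**3 bytes.
    """
    if not per_gpu_peak_bytes:
        raise ValueError("need at least one per-GPU measurement")
    return max(per_gpu_peak_bytes) / 1024 ** 3


def gather_peak_memory_bytes():
    """Return the largest torch.cuda.max_memory_allocated() value across ranks.

    Requires a CUDA device. If torch.distributed is initialized, the local peak
    is all-reduced with MAX so every rank ends up with the global maximum.
    """
    import torch
    import torch.distributed as dist

    # Peak bytes allocated by tensors on this process's current CUDA device.
    peak = torch.tensor([torch.cuda.max_memory_allocated()],
                        dtype=torch.int64, device="cuda")
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(peak, op=dist.ReduceOp.MAX)
    return int(peak.item())
```

When per-GPU peaks are logged separately, `peak_memory_gb` reduces them offline; inside a running distributed job, `peak_memory_gb([gather_peak_memory_bytes()])` yields the same figure. Neither counts memory held by PyTorch's caching allocator or the CUDA context, which is why nvidia-smi reports more.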

Baselines

More models with different backbones will be added to the model zoo.

RPN

Backbone Style Lr schd Mem (GB) Train time (s/iter) Inf time (fps) AR1000 Download
R-50-C4 caffe 1x - - 20.5 51.1 model
R-50-C4 caffe 2x 2.2 0.17 20.3 52.2 model
R-50-C4 pytorch 1x - - 20.1 50.2 model
R-50-C4 pytorch 2x - - 20.0 51.1 model
R-50-FPN caffe 1x 3.3 0.253 16.9 58.2 -
R-50-FPN pytorch 1x 3.5 0.276 17.7 57.1 model
R-50-FPN pytorch 2x - - - 57.6 model
R-101-FPN caffe 1x 5.2 0.379 13.9 59.4 -
R-101-FPN pytorch 1x 5.4 0.396 14.4 58.6 model
R-101-FPN pytorch 2x - - - 59.1 model
X-101-32x4d-FPN pytorch 1x 6.6 0.589 11.8 59.4 model
X-101-32x4d-FPN pytorch 2x - - - 59.9 model
X-101-64x4d-FPN pytorch 1x 9.5 0.955 8.3 59.8 model
X-101-64x4d-FPN pytorch 2x - - - 60.0 model

Faster R-CNN

Backbone Style Lr schd Mem (GB) Train time (s/iter) Inf time (fps) box AP Download
R-50-C4 caffe 1x - - 9.5 34.9 model
R-50-C4 caffe 2x 4.0 0.39 9.3 36.5 model
R-50-C4 pytorch 1x - - 9.3 33.9 model
R-50-C4 pytorch 2x - - 9.4 35.9 model
R-50-FPN caffe 1x 3.6 0.333 13.5 36.6 -
R-50-FPN pytorch 1x 3.8 0.353 13.6 36.4 model
R-50-FPN pytorch 2x - - - 37.7 model
R-101-FPN caffe 1x 5.5 0.465 11.5 38.8 -
R-101-FPN pytorch 1x 5.7 0.474 11.9 38.5 model
R-101-FPN pytorch 2x - - - 39.4 model
X-101-32x4d-FPN pytorch 1x 6.9 0.672 10.3 40.1 model
X-101-32x4d-FPN pytorch 2x - - - 40.4 model
X-101-64x4d-FPN pytorch 1x 9.8 1.040 7.3 41.3 model
X-101-64x4d-FPN pytorch 2x - - - 40.7 model
HRNetV2p-W18 pytorch 1x - - - 36.1 model
HRNetV2p-W18 pytorch 2x - - - 38.3 model
HRNetV2p-W32 pytorch 1x - - - 39.5 model
HRNetV2p-W32 pytorch 2x - - - 40.6 model
HRNetV2p-W48 pytorch 1x - - - 40.9 model
HRNetV2p-W48 pytorch 2x - - - 41.5 model

Mask R-CNN

Backbone Style Lr schd Mem (GB) Train time (s/iter) Inf time (fps) box AP mask AP Download
R-50-C4 caffe 1x - - 8.1 35.9 31.5 model
R-50-C4 caffe 2x 4.2 0.43 8.1 37.9 32.9 model
R-50-C4 pytorch 1x - - 7.9 35.1 31.2 model
R-50-C4 pytorch 2x - - 8.0 37.2 32.5 model
R-50-FPN caffe 1x 3.8 0.430 10.2 37.4 34.3 -
R-50-FPN pytorch 1x 3.9 0.453 10.6 37.3 34.2 model
R-50-FPN pytorch 2x - - - 38.5 35.1 model
R-101-FPN caffe 1x 5.7 0.534 9.4 39.9 36.1 -
R-101-FPN pytorch 1x 5.8 0.571 9.5 39.4 35.9 model
R-101-FPN pytorch 2x - - - 40.3 36.5 model
X-101-32x4d-FPN pytorch 1x 7.1 0.759 8.3 41.1 37.1 model
X-101-32x4d-FPN pytorch 2x - - - 41.4 37.1 model
X-101-64x4d-FPN pytorch 1x 10.0 1.102 6.5 42.1 38.0 model
X-101-64x4d-FPN pytorch 2x - - - 42.0 37.7 model
HRNetV2p-W18 pytorch 1x - - - 37.3 34.2 model
HRNetV2p-W18 pytorch 2x - - - 39.2 35.7 model
HRNetV2p-W32 pytorch 1x - - - 40.7 36.8 model
HRNetV2p-W32 pytorch 2x - - - 41.7 37.5 model
HRNetV2p-W48 pytorch 1x - - - 42.4 38.1 model
HRNetV2p-W48 pytorch 2x - - - 42.9 38.3 model

Fast R-CNN (with pre-computed proposals)

Backbone Style Type Lr schd Mem (GB) Train time (s/iter) Inf time (fps) box AP mask AP Download
R-50-C4 caffe Faster 1x - - 6.7 35.0 - model
R-50-C4 caffe Faster 2x 3.8 0.34 6.6 36.4 - model
R-50-C4 pytorch Faster 1x - - 6.3 34.2 - model
R-50-C4 pytorch Faster 2x - - 6.1 35.8 - model
R-50-FPN caffe Faster 1x 3.3 0.242 18.4 36.6 - -
R-50-FPN pytorch Faster 1x 3.5 0.250 16.5 35.8 - model
R-50-FPN pytorch Faster 2x - - - 37.1 - model
R-101-FPN caffe Faster 1x 5.2 0.355 14.4 38.6 - -
R-101-FPN pytorch Faster 1x 5.4 0.388 13.2 38.1 - model
R-101-FPN pytorch Faster 2x - - - 38.8 - model
R-50-C4 caffe Mask 1x - - 8.1 35.9 31.5 model
R-50-C4 caffe Mask 2x 4.2 0.43 8.1 37.9 32.9 model
R-50-C4 pytorch Mask 1x - - 7.9 35.1 31.2 model
R-50-C4 pytorch Mask 2x - - 8.0 37.2 32.5 model
R-50-FPN caffe Mask 1x 3.4 0.328 12.8 37.3 34.5 -
R-50-FPN pytorch Mask 1x 3.5 0.346 12.7 36.8 34.1 model
R-50-FPN pytorch Mask 2x - - - 37.9 34.8 model
R-101-FPN caffe Mask 1x 5.2 0.429 11.2 39.4 36.1 -
R-101-FPN pytorch Mask 1x 5.4 0.462 10.9 38.9 35.8 model
R-101-FPN pytorch Mask 2x - - - 39.9 36.4 model

RetinaNet

Backbone Style Lr schd Mem (GB) Train time (s/iter) Inf time (fps) box AP Download
R-50-FPN caffe 1x 3.4 0.285 12.5 35.8 -
R-50-FPN pytorch 1x 3.6 0.308 12.1 35.6 model
R-50-FPN pytorch 2x - - - 36.4 model
R-101-FPN caffe 1x 5.3 0.410 10.4 37.8 -
R-101-FPN pytorch 1x 5.5 0.429 10.9 37.7 model
R-101-FPN pytorch 2x - - - 38.1 model
X-101-32x4d-FPN pytorch 1x 6.7 0.632 9.3 39.0 model
X-101-32x4d-FPN pytorch 2x - - - 39.3 model
X-101-64x4d-FPN pytorch 1x 9.6 0.993 7.0 40.0 model
X-101-64x4d-FPN pytorch 2x - - - 39.6 model

Cascade R-CNN

Backbone Style Lr schd Mem (GB) Train time (s/iter) Inf time (fps) box AP Download
R-50-C4 caffe 1x 8.7 0.92 5.0 38.7 model
R-50-FPN caffe 1x 3.9 0.464 10.9 40.5 -
R-50-FPN pytorch 1x 4.1 0.455 11.9 40.4 model
R-50-FPN pytorch 20e - - - 41.1 model
R-101-FPN caffe 1x 5.8 0.569 9.6 42.4 -
R-101-FPN pytorch 1x 6.0 0.584 10.3 42.0 model
R-101-FPN pytorch 20e - - - 42.5 model
X-101-32x4d-FPN pytorch 1x 7.2 0.770 8.9 43.6 model
X-101-32x4d-FPN pytorch 20e - - - 44.0 model
X-101-64x4d-FPN pytorch 1x 10.0 1.133 6.7 44.5 model
X-101-64x4d-FPN pytorch 20e - - - 44.7 model
HRNetV2p-W18 pytorch 20e - - - 41.2 model
HRNetV2p-W32 pytorch 20e - - - 43.7 model
HRNetV2p-W48 pytorch 20e - - - 44.6 model

Cascade Mask R-CNN

Backbone Style Lr schd Mem (GB) Train time (s/iter) Inf time (fps) box AP mask AP Download
R-50-C4 caffe 1x 9.1 0.99 4.5 39.3 32.8 model
R-50-FPN caffe 1x 5.1 0.692 7.6 40.9 35.5 -
R-50-FPN pytorch 1x 5.3 0.683 7.4 41.2 35.7 model
R-50-FPN pytorch 20e - - - 42.3 36.6 model
R-101-FPN caffe 1x 7.0 0.803 7.2 43.1 37.2 -
R-101-FPN pytorch 1x 7.2 0.807 6.8 42.6 37.0 model
R-101-FPN pytorch 20e - - - 43.3 37.6 model
X-101-32x4d-FPN pytorch 1x 8.4 0.976 6.6 44.4 38.2 model
X-101-32x4d-FPN pytorch 20e - - - 44.7 38.6 model
X-101-64x4d-FPN pytorch 1x 11.4 1.33 5.3 45.4 39.1 model
X-101-64x4d-FPN pytorch 20e - - - 45.7 39.4 model
HRNetV2p-W18 pytorch 20e - - - 41.9 36.4 model
HRNetV2p-W32 pytorch 20e - - - 44.5 38.5 model
HRNetV2p-W48 pytorch 20e - - - 46.0 39.5 model

Notes:

  • The 20e schedule in Cascade (Mask) R-CNN decreases the learning rate at epochs 16 and 19, for a total of 20 epochs.
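A step schedule like this can be expressed as a function of the epoch index. The sketch below is an illustration, not mmdetection's scheduler: it assumes a decay factor of 0.1 and 0-based epoch indices (so "decreasing at epoch 16" means the 17th epoch onward uses the lower rate), ignores warm-up, and the 1x/2x step epochs are assumptions; only the 20e steps come from the note above.

```python
# Step epochs at which the learning rate is multiplied by gamma.
SCHEDULES = {
    "1x": {"total_epochs": 12, "steps": [8, 11]},   # step epochs assumed
    "2x": {"total_epochs": 24, "steps": [16, 22]},  # step epochs assumed
    "20e": {"total_epochs": 20, "steps": [16, 19]},  # from the note above
}


def step_lr(base_lr, epoch, steps, gamma=0.1):
    """Learning rate for a 0-based epoch index under a step schedule.

    The rate is multiplied by gamma once for every step boundary already
    reached: with steps=[16, 19], epochs 0-15 use base_lr, epochs 16-18 use
    base_lr * gamma, and epoch 19 uses base_lr * gamma ** 2.
    """
    num_decays = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** num_decays
```

For example, `[step_lr(0.02, e, SCHEDULES["20e"]["steps"]) for e in range(20)]` gives 16 epochs at 0.02, three at 0.002 and one at 0.0002, up to floating-point rounding; the base rate of 0.02 is itself an assumption here.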

Hybrid Task Cascade (HTC)

Backbone Style Lr schd Mem (GB) Train time (s/iter) Inf time (fps) box AP mask AP Download
R-50-FPN pytorch 1x 7.4 0.936 4.1 42.1 37.3 model
R-50-FPN pytorch 20e - - - 43.2 38.1 model
R-101-FPN pytorch 20e 9.3 1.051 4.0 44.9 39.4 model
X-101-32x4d-FPN pytorch 20e 5.8 0.769 3.8 46.1 40.3 model
X-101-64x4d-FPN pytorch 20e 7.5 1.120 3.5 46.9 40.8 model
HRNetV2p-W18 pytorch 20e - - - 43.1 37.9 model
HRNetV2p-W32 pytorch 20e - - - 45.3 39.6 model
HRNetV2p-W48 pytorch 20e - - - 46.8 40.7 model
HRNetV2p-W48 pytorch 28e - - - 47.0 41.0 model

SSD

Backbone Size Style Lr schd Mem (GB) Train time (s/iter) Inf time (fps) box AP Download
VGG16 300 caffe 120e 3.5 0.256 25.9 / 34.6 25.7 model
VGG16 512 caffe 120e 7.6 0.412 20.7 / 25.4 29.3 model

Notes:

  • cudnn.benchmark is set to True for SSD training and testing.
  • Inference speed is reported for batch size 1 and batch size 8, respectively.
  • SSD speeds on COCO and VOC differ because of differences in model parameters and NMS.
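Two of these notes map directly onto code. The sketch uses `torch.backends.cudnn.benchmark`, a real PyTorch flag, and shows why batch size 8 reports a higher fps: throughput counts every image in the batch, so fixed per-call overhead is amortized. The helper names are hypothetical and the timing values in the usage note are illustrative only.

```python
def enable_cudnn_autotuner():
    """Let cuDNN benchmark convolution algorithms and cache the fastest one.

    This pays off when input shapes stay fixed (as with SSD's 300x300 or
    512x512 inputs); with shapes that change every iteration, cuDNN keeps
    re-benchmarking and the flag can slow things down.
    """
    import torch

    torch.backends.cudnn.benchmark = True


def images_per_second(batch_size, seconds_per_batch):
    """Throughput in img/s when a whole batch is processed in seconds_per_batch."""
    if seconds_per_batch <= 0:
        raise ValueError("seconds_per_batch must be positive")
    return batch_size / seconds_per_batch
```

For instance, a batch of 8 processed in 0.5 s gives `images_per_second(8, 0.5) == 16.0` img/s, even though each call takes longer than a single-image call.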

Group Normalization (GN)

Please refer to Group Normalization for details.

Weight Standardization

Please refer to Weight Standardization for details.

Deformable Convolution v2

Please refer to Deformable Convolutional Networks for details.

CARAFE: Content-Aware ReAssembly of FEatures

Please refer to CARAFE for details.

Instaboost

Please refer to Instaboost for details.

Libra R-CNN

Please refer to Libra R-CNN for details.

Guided Anchoring

Please refer to Guided Anchoring for details.

FCOS

Please refer to FCOS for details.

FoveaBox

Please refer to FoveaBox for details.

RepPoints

Please refer to RepPoints for details.

FreeAnchor

Please refer to FreeAnchor for details.

Grid R-CNN (plus)

Please refer to Grid R-CNN for details.

GHM

Please refer to GHM for details.

GCNet

Please refer to GCNet for details.

HRNet

Please refer to HRNet for details.

Mask Scoring R-CNN

Please refer to Mask Scoring R-CNN for details.

Train from Scratch

Please refer to Rethinking ImageNet Pre-training for details.

NAS-FPN

Please refer to NAS-FPN for details.

ATSS

Please refer to ATSS for details.

Other datasets

We also benchmark some methods on PASCAL VOC, Cityscapes and WIDER FACE.

Comparison with Detectron and maskrcnn-benchmark

We compare mmdetection with Detectron and maskrcnn-benchmark. The backbone used is R-50-FPN.

In general, mmdetection has three advantages over Detectron:

  • Higher performance (especially in terms of mask AP)
  • Faster training
  • Lower memory usage

Performance

Detectron and maskrcnn-benchmark use caffe-style ResNet as the backbone. We report results with both caffe-style (weights converted from here) and pytorch-style (weights from the official model zoo) ResNet backbones, written as pytorch-style result / caffe-style result.

We find that pytorch-style ResNet usually converges more slowly than caffe-style ResNet, which leads to slightly lower results with the 1x schedule, but the final results with the 2x schedule are higher.

Type Lr schd Detectron maskrcnn-benchmark mmdetection
RPN 1x 57.2 - 57.1 / 58.2
RPN 2x - - 57.6 / -
Faster R-CNN 1x 36.7 36.8 36.4 / 36.6
Faster R-CNN 2x 37.9 - 37.7 / -
Mask R-CNN 1x 37.7 & 33.9 37.8 & 34.2 37.3 & 34.2 / 37.4 & 34.3
Mask R-CNN 2x 38.6 & 34.5 - 38.5 & 35.1 / -
Fast R-CNN 1x 36.4 - 35.8 / 36.6
Fast R-CNN 2x 36.8 - 37.1 / -
Fast R-CNN (w/mask) 1x 37.3 & 33.7 - 36.8 & 34.1 / 37.3 & 34.5
Fast R-CNN (w/mask) 2x 37.7 & 34.0 - 37.9 & 34.8 / -
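Each mmdetection cell in this comparison packs up to four numbers. A small, hypothetical parser makes the notation explicit: `/` separates pytorch-style from caffe-style results, `&` separates box AP from mask AP, and `-` marks a missing result. It assumes every mmdetection cell contains exactly one `/`, so it is not meant for the Detectron or maskrcnn-benchmark columns.

```python
def parse_cell(cell):
    """Split an mmdetection comparison cell into pytorch-style and caffe-style results.

    "37.3 & 34.2 / 37.4 & 34.3" -> {"pytorch": (37.3, 34.2), "caffe": (37.4, 34.3)}
    A side written as "-" becomes None.
    """
    def parse_side(side):
        side = side.strip()
        if side == "-":
            return None
        # float() tolerates the surrounding spaces left by split("&").
        return tuple(float(v) for v in side.split("&"))

    parts = cell.split("/")
    if len(parts) != 2:
        raise ValueError(f"expected 'pytorch / caffe', got {cell!r}")
    pytorch, caffe = (parse_side(p) for p in parts)
    return {"pytorch": pytorch, "caffe": caffe}
```

A one-element tuple is a single metric (AR1000 for RPN, box AP otherwise); a two-element tuple is (box AP, mask AP).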

Training Speed

Training speed is measured in s/iter. Lower is better.

Type Detectron (P100*1) maskrcnn-benchmark (V100) mmdetection (V100*2)
RPN 0.416 - 0.253
Faster R-CNN 0.544 0.353 0.333
Mask R-CNN 0.889 0.454 0.430
Fast R-CNN 0.285 - 0.242
Fast R-CNN (w/mask) 0.377 - 0.328

*1. Facebook's Big Basin servers (P100/V100) are slightly faster than the servers we use, and mmdetection also runs slightly faster on them.

*2. For fair comparison, we list the caffe-style results here.
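Time per iteration converts into an approximate wall-clock training time once the number of iterations is known. A rough sketch, assuming the last partial batch of each epoch still counts as an iteration and ignoring evaluation and checkpointing; the helper names are hypothetical:

```python
import math


def iters_per_epoch(num_images, batch_size):
    """Iterations needed to see every image once (the last partial batch counts)."""
    return math.ceil(num_images / batch_size)


def estimated_training_hours(sec_per_iter, num_images, batch_size, epochs):
    """Rough wall-clock estimate: s/iter times total iterations, in hours."""
    total_iters = iters_per_epoch(num_images, batch_size) * epochs
    return sec_per_iter * total_iters / 3600
```

With an assumed 118,287 images in coco_2017_train, batch size 16 and 12 epochs (1x), the 0.333 s/iter listed for mmdetection's Faster R-CNN gives 7,393 × 12 = 88,716 iterations, or about 8.2 hours.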

Inference Speed

Inference speed is measured in fps (img/s) on a single GPU. Higher is better.

Type Detectron (P100) maskrcnn-benchmark (V100) mmdetection (V100)
RPN 12.5 - 16.9
Faster R-CNN 10.3 7.9 13.5
Mask R-CNN 8.5 7.7 10.2
Fast R-CNN 12.5 - 18.4
Fast R-CNN (w/mask) 9.9 - 12.8
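These fps numbers count the whole pipeline, not just the network. A minimal timing sketch under that definition; `load`, `forward` and `postprocess` are hypothetical callables standing in for the data loader, model and post-processing, one image per sample is assumed, and skipping warm-up samples is an assumption rather than a documented part of the benchmark:

```python
import time


def measure_fps(samples, load, forward, postprocess, warmup=5):
    """Images per second over load + forward + postprocess, one image per sample.

    The first `warmup` samples are processed but not timed, to exclude one-off
    costs such as CUDA context creation or cuDNN autotuning.
    """
    timed = 0
    elapsed = 0.0
    for i, sample in enumerate(samples):
        start = time.perf_counter()
        data = load(sample)
        output = forward(data)
        postprocess(output)
        # On a GPU, call torch.cuda.synchronize() before reading the clock so
        # that asynchronous kernels are included in the measurement.
        duration = time.perf_counter() - start
        if i >= warmup:
            timed += 1
            elapsed += duration
    if timed == 0:
        raise ValueError("no samples left after warm-up")
    return timed / elapsed
```

Timing each sample end to end, rather than only `forward`, is what makes the number comparable across codebases whose data loading and NMS costs differ.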

Training Memory

Training memory is measured in GB. Lower is better.

Type Detectron maskrcnn-benchmark mmdetection
RPN 6.4 - 3.3
Faster R-CNN 7.2 4.4 3.6
Mask R-CNN 8.6 5.2 3.8
Fast R-CNN 6.0 - 3.3
Fast R-CNN (w/mask) 7.9 - 3.4

Both maskrcnn-benchmark and mmdetection are clearly more memory efficient than Detectron, mainly thanks to PyTorch itself. We also apply some additional memory optimizations to reduce usage further.

Note that Caffe2 and PyTorch use different APIs, with different implementations, to obtain memory usage. For all codebases, nvidia-smi shows higher memory usage than the numbers reported in the table above.