Benchmark and Model Zoo

Mirror sites

We use AWS as the main site to host our model zoo and maintain a mirror on Aliyun. To use the mirror, replace https://s3.ap-northeast-2.amazonaws.com/open-mmlab with https://open-mmlab.oss-cn-beijing.aliyuncs.com in model URLs.
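The prefix swap described above can be scripted. The following is a minimal sketch, not part of the codebase: the helper name `to_mirror_url` and the example checkpoint path are hypothetical, and only the two URL prefixes come from the text above.

```python
AWS_PREFIX = "https://s3.ap-northeast-2.amazonaws.com/open-mmlab"
ALIYUN_PREFIX = "https://open-mmlab.oss-cn-beijing.aliyuncs.com"


def to_mirror_url(url: str) -> str:
    """Return the Aliyun mirror URL for an AWS-hosted model URL.

    URLs that do not start with the AWS prefix are returned unchanged.
    """
    if url.startswith(AWS_PREFIX):
        return ALIYUN_PREFIX + url[len(AWS_PREFIX):]
    return url


if __name__ == "__main__":
    # Hypothetical checkpoint path, used only for illustration.
    example = AWS_PREFIX + "/some_dir/some_model.pth"
    print(to_mirror_url(example))
```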

Common settings

All models were trained on coco_2017_train and tested on coco_2017_val.
We use distributed training.
All PyTorch-style backbones pretrained on ImageNet are from the PyTorch model zoo; caffe-style pretrained backbones are converted from the newly released models from detectron2.
For fair comparison with other codebases, we report the GPU memory as the maximum value of torch.cuda.max_memory_allocated() across all 8 GPUs. Note that this value is usually less than what nvidia-smi shows.
We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script benchmark.py, which computes the average time on 2000 images.
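To illustrate the GPU memory convention above, here is a rough sketch of reducing per-GPU peak allocations to one reported number. The function names and the choice of MB as the unit are assumptions made for this example. In a real distributed run, the per-GPU values would have to be gathered across processes first (for instance with a max all-reduce), which is omitted here.

```python
from typing import Iterable


def report_max_memory_mb(per_gpu_bytes: Iterable[int]) -> int:
    """Reduce per-GPU peak allocations (in bytes) to a single value in MB.

    The reported number is the maximum over all GPUs, not the sum or mean.
    """
    return int(max(per_gpu_bytes) / (1024 * 1024))


def local_peak_bytes() -> int:
    """Peak bytes allocated by PyTorch on the current CUDA device.

    Requires PyTorch with CUDA; imported lazily so that the reduction
    helper above can be used without PyTorch installed.
    """
    import torch

    return torch.cuda.max_memory_allocated()
```

Because this counts only memory allocated through PyTorch's allocator, it is consistent with the note that the value is usually lower than what nvidia-smi shows.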

Baselines

RPN

Please refer to RPN for details.

Faster R-CNN

Please refer to Faster R-CNN for details.

Mask R-CNN

Please refer to Mask R-CNN for details.

Fast R-CNN (with pre-computed proposals)

Please refer to Fast R-CNN for details.

RetinaNet

Please refer to RetinaNet for details.

Cascade R-CNN and Cascade Mask R-CNN

Please refer to Cascade R-CNN for details.

Hybrid Task Cascade (HTC)

Please refer to HTC for details.

SSD

Please refer to SSD for details.

Group Normalization (GN)

Please refer to Group Normalization for details.

Weight Standardization

Please refer to Weight Standardization for details.

Deformable Convolution v2

Please refer to Deformable Convolutional Networks for details.

CARAFE: Content-Aware ReAssembly of FEatures

Please refer to CARAFE for details.

Instaboost

Please refer to Instaboost for details.

Libra R-CNN

Please refer to Libra R-CNN for details.

Guided Anchoring

Please refer to Guided Anchoring for details.

FCOS

Please refer to FCOS for details.

FoveaBox

Please refer to FoveaBox for details.

RepPoints

Please refer to RepPoints for details.

FreeAnchor

Please refer to FreeAnchor for details.

Grid R-CNN (plus)

Please refer to Grid R-CNN for details.

GHM

Please refer to GHM for details.

GCNet

Please refer to GCNet for details.

HRNet

Please refer to HRNet for details.

Mask Scoring R-CNN

Please refer to Mask Scoring R-CNN for details.

Train from Scratch

Please refer to Rethinking ImageNet Pre-training for details.

NAS-FPN

Please refer to NAS-FPN for details.

ATSS

Please refer to ATSS for details.

FSAF

Please refer to FSAF for details.

RegNetX

Please refer to RegNet for details.

Res2Net

Please refer to Res2Net for details.

GRoIE

Please refer to GRoIE for details.

Dynamic R-CNN

Please refer to Dynamic R-CNN for details.

PointRend

Please refer to PointRend for details.

DetectoRS

Please refer to DetectoRS for details.

Generalized Focal Loss

Please refer to Generalized Focal Loss for details.

CornerNet

Please refer to CornerNet for details.

YOLOv3

Please refer to YOLOv3 for details.

PAA

Please refer to PAA for details.

SABL

Please refer to SABL for details.

CentripetalNet

Please refer to CentripetalNet for details.

ResNeSt

Please refer to ResNeSt for details.

DETR

Please refer to DETR for details.

Deformable DETR

Please refer to Deformable DETR for details.

AutoAssign

Please refer to AutoAssign for details.

YOLOF

Please refer to YOLOF for details.

Seesaw Loss

Please refer to Seesaw Loss for details.

CenterNet

Please refer to CenterNet for details.

Other datasets

We also benchmark some methods on PASCAL VOC, Cityscapes and WIDER FACE.

Pre-trained Models

We also train Faster R-CNN and Mask R-CNN using ResNet-50 and RegNetX-3.2G with multi-scale training and longer schedules. For convenience, these models can serve as strong pre-trained models for downstream tasks.

Speed benchmark

Training Speed benchmark

We provide analyze_logs.py to get the average iteration time during training. You can find examples in Log Analysis.

We compare the training speed of Mask R-CNN with some other popular frameworks (the data is copied from detectron2). For mmdetection, we benchmark with mask_rcnn_r50_caffe_fpn_poly_1x_coco_v1.py, which should have the same settings as mask_rcnn_R_50_FPN_noaug_1x.yaml of detectron2. We also provide the checkpoint and training log for reference. The throughput is computed as the average throughput over iterations 100-500 to skip GPU warmup time.

Implementation	Throughput (img/s)
Detectron2	62
MMDetection	61
maskrcnn-benchmark	53
tensorpack	50
simpledet	39
Detectron	19
matterport/Mask_RCNN	14
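The throughput figures above are averages over a window of iterations that skips GPU warmup. A minimal sketch of that calculation follows, assuming you already have a list of per-iteration times in seconds. The function name, the `images_per_iter` parameter, and the exact window boundaries (start inclusive, end exclusive) are assumptions for illustration, not taken from the benchmark scripts.

```python
from typing import Sequence


def average_throughput(
    iter_times: Sequence[float],
    images_per_iter: int,
    start: int = 100,
    end: int = 500,
) -> float:
    """Average throughput in img/s over iterations [start, end).

    iter_times[i] is the wall-clock duration of iteration i in seconds.
    Iterations before `start` are skipped as warmup.
    """
    window = iter_times[start:end]
    if not window:
        raise ValueError("no iterations fall inside the requested window")
    # Total images processed in the window divided by total time spent.
    return images_per_iter * len(window) / sum(window)
```

Dividing total images by total time, rather than averaging per-iteration rates, keeps a few unusually fast or slow iterations from skewing the result.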

Inference Speed Benchmark

We provide benchmark.py to benchmark the inference latency. The script benchmarks the model on 2000 images and calculates the average time, ignoring the first 5 iterations. You can change the output log interval (default: 50) by setting LOG-INTERVAL.

python tools/benchmark.py ${CONFIG} ${CHECKPOINT} [--log-interval ${LOG-INTERVAL}] [--fuse-conv-bn]

The latency of all models in our model zoo is benchmarked without setting fuse-conv-bn. You can get a lower latency by setting it.
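The measurement procedure described above (time a fixed number of images, discard the first few iterations, average the rest) can be sketched as follows. This is an illustrative stand-in, not the code of benchmark.py: `benchmark_fps`, `run_inference`, and the parameter names are hypothetical.

```python
import time
from typing import Callable, Iterable


def benchmark_fps(
    run_inference: Callable[[object], object],
    samples: Iterable[object],
    num_warmup: int = 5,
    max_iter: int = 2000,
) -> float:
    """Return average images per second, ignoring the first `num_warmup` runs.

    Only the call to `run_inference` is timed, so the time spent producing
    each sample (data loading) is excluded from the result.
    """
    elapsed = 0.0
    timed = 0
    for i, sample in enumerate(samples):
        if i >= max_iter:
            break
        # On a GPU, the device should be synchronized before reading the
        # clock on both sides of the call; that step is omitted here.
        start = time.perf_counter()
        run_inference(sample)
        duration = time.perf_counter() - start
        if i >= num_warmup:
            elapsed += duration
            timed += 1
    if timed == 0:
        raise ValueError("not enough samples to measure after warmup")
    return timed / elapsed
```

The average latency per image is the reciprocal of the returned value.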

Comparison with Detectron2

We compare mmdetection with Detectron2 in terms of speed and performance. We use commit 185c27e (30/4/2020) of Detectron2. For fair comparison, we install and run both frameworks on the same machine.

Hardware

8 NVIDIA Tesla V100 (32G) GPUs
Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

Software environment

Python 3.7
PyTorch 1.4
CUDA 10.1
CUDNN 7.6.03
NCCL 2.4.08

Performance

Type	Lr schd	Detectron2	mmdetection	Download
Faster R-CNN	1x	37.9	38.0	model | log
Mask R-CNN	1x	38.6 & 35.2	38.8 & 35.4	model | log
RetinaNet	1x	36.5	37.0	model | log

Training Speed

The training speed is measured in s/iter. Lower is better.

Type	Detectron2	mmdetection
Faster R-CNN	0.210	0.216
Mask R-CNN	0.261	0.265
RetinaNet	0.200	0.205

Inference Speed

The inference speed is measured in fps (img/s) on a single GPU. Higher is better. To be consistent with Detectron2, we report the pure inference speed (without the time of data loading). For Mask R-CNN, we exclude the time of RLE encoding in post-processing. We also include the officially reported speed in parentheses, which is slightly higher than the results tested on our server due to hardware differences.

Type	Detectron2	mmdetection
Faster R-CNN	25.6 (26.3)	22.2
Mask R-CNN	22.5 (23.3)	19.6
RetinaNet	17.8 (18.2)	20.6

Training memory

Type	Detectron2	mmdetection
Faster R-CNN	3.0	3.8
Mask R-CNN	3.4	3.9
RetinaNet	3.9	3.4