Benchmark and Model Zoo
Mirror sites
We use AWS as the main site to host our model zoo and maintain a mirror on Aliyun. To download from the mirror, replace `https://s3.ap-northeast-2.amazonaws.com/open-mmlab` with `https://open-mmlab.oss-cn-beijing.aliyuncs.com` in model URLs.
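The prefix substitution can be scripted. The following is a minimal sketch, assuming the path after the prefix is identical on both hosts (which is what the substitution rule implies). The helper name `to_mirror` and the checkpoint path in the example are hypothetical and only illustrate the rewrite.

```python
# Main (AWS) and mirror (Aliyun) prefixes for model zoo URLs.
AWS_PREFIX = "https://s3.ap-northeast-2.amazonaws.com/open-mmlab"
ALIYUN_PREFIX = "https://open-mmlab.oss-cn-beijing.aliyuncs.com"


def to_mirror(url: str) -> str:
    """Rewrite an AWS-hosted model URL so it points at the Aliyun mirror.

    Hypothetical helper: URLs that do not start with the AWS prefix are
    returned unchanged.
    """
    if url.startswith(AWS_PREFIX):
        return ALIYUN_PREFIX + url[len(AWS_PREFIX):]
    return url


if __name__ == "__main__":
    # Hypothetical checkpoint path, used only to illustrate the rewrite.
    example = AWS_PREFIX + "/mmdetection/models/example_checkpoint.pth"
    print(to_mirror(example))
```

Matching on `startswith` rather than a plain `str.replace` avoids accidentally rewriting URLs that merely contain the AWS prefix somewhere in the middle.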
Common settings
- All FPN baselines and RPN-C4 baselines were trained using 8 GPUs with a batch size of 16 (2 images per GPU). Other C4 baselines were trained using 8 GPUs with a batch size of 8 (1 image per GPU).
- All models were trained on `coco_2017_train` and tested on `coco_2017_val`.
- We use distributed training, and BN layer statistics are fixed.
- We adopt the same training schedules as Detectron. 1x indicates 12 epochs and 2x indicates 24 epochs, which corresponds to slightly fewer iterations than Detectron; the difference can be ignored.
- All PyTorch-style backbones pretrained on ImageNet are from the PyTorch model zoo.
- For fair comparison with other codebases, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` across all 8 GPUs. Note that this value is usually less than what `nvidia-smi` shows.
- We report the inference time as the overall time, including data loading, network forwarding, and post-processing.
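To make "slightly fewer iterations than Detectron" concrete, the sketch below converts the epoch-based schedules into iteration counts. It rests on two assumptions not stated in this document: COCO `train2017` has 118,287 images, and Detectron's 1x schedule is 90,000 iterations at 16 images per iteration. If images without annotations are filtered out before training, the counts drop a little further.

```python
import math

# Assumptions (not stated in this document): COCO train2017 contains 118,287
# images, and Detectron's 1x schedule runs 90,000 iterations at 16 images/iter.
COCO_TRAIN2017_IMAGES = 118_287
IMAGES_PER_ITER = 16  # 8 GPUs x 2 images per GPU (FPN baselines)
DETECTRON_1X_ITERS = 90_000


def schedule_iters(epochs: int,
                   num_images: int = COCO_TRAIN2017_IMAGES,
                   images_per_iter: int = IMAGES_PER_ITER) -> int:
    """Total iterations for an epoch-based schedule.

    A partial last batch is counted as one iteration per epoch.
    """
    return epochs * math.ceil(num_images / images_per_iter)


if __name__ == "__main__":
    one_x = schedule_iters(12)  # 1x schedule: 12 epochs
    two_x = schedule_iters(24)  # 2x schedule: 24 epochs
    print(one_x, two_x, round(one_x / DETECTRON_1X_ITERS, 3))
```

Under these assumptions, 1x comes to 88,716 iterations, about 98.6% of Detectron's 90k, which is the small gap the schedule note refers to.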
Baselines
Faster R-CNN
Please refer to Faster R-CNN for details.
Mask R-CNN
Please refer to Mask R-CNN for details.
Fast R-CNN (with pre-computed proposals)
Please refer to Fast R-CNN for details.
Cascade R-CNN and Cascade Mask R-CNN
Please refer to Cascade R-CNN for details.
Group Normalization (GN)
Please refer to Group Normalization for details.
Weight Standardization
Please refer to Weight Standardization for details.
Deformable Convolution v2
Please refer to Deformable Convolutional Networks for details.
Instaboost
Please refer to Instaboost for details.
Libra R-CNN
Please refer to Libra R-CNN for details.
Guided Anchoring
Please refer to Guided Anchoring for details.
FreeAnchor
Please refer to FreeAnchor for details.
Grid R-CNN (plus)
Please refer to Grid R-CNN for details.
Mask Scoring R-CNN
Please refer to Mask Scoring R-CNN for details.
Train from Scratch
Please refer to Rethinking ImageNet Pre-training for details.
Other datasets
We also benchmark some methods on PASCAL VOC, Cityscapes and WIDER FACE.
Speed benchmark
We compare the training speed of Mask R-CNN with some other popular frameworks (the data is copied from Detectron2).
Implementation | Throughput (img/s) |
---|---|
Detectron2 | 61 |
MMDetection | 60 |
maskrcnn-benchmark | 51 |
tensorpack | 50 |
simpledet | 39 |
Detectron | 19 |
matterport/Mask_RCNN | 14 |
Comparison with Detectron2
We compare mmdetection with Detectron2 in terms of speed and performance. We use commit 185c27e (30/4/2020) of Detectron2. For a fair comparison, we install and run both frameworks on the same machine.
Hardware
- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Software environment
- Python 3.7
- PyTorch 1.4
- CUDA 10.1
- cuDNN 7.6.03
- NCCL 2.4.08
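Performance
Values are COCO box AP; for Mask R-CNN, each entry is box AP & mask AP. "-" means no result is reported.
Type | Lr schd | Detectron2 | mmdetection |
---|---|---|---|
Faster R-CNN | 1x | 37.9 | 38.0 |
Faster R-CNN | 3x | 40.2 | - |
Mask R-CNN | 1x | 38.6 & 35.2 | 38.8 & 35.4 |
Mask R-CNN | 3x | 41.0 & 37.2 | - |
RetinaNet | 1x | 36.5 | 37.0 |
RetinaNet | 3x | 37.9 | - |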
Training Speed
The training speed is measured in s/iter; lower is better.
Type | Detectron2 | mmdetection |
---|---|---|
Faster R-CNN | 0.210 | 0.216 |
Mask R-CNN | 0.261 | 0.265 |
RetinaNet | 0.200 | 0.205 |
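Seconds per iteration and images per second are two views of the same quantity once the batch size is fixed. The sketch below converts the training-speed table into throughput, assuming 16 images per iteration (8 GPUs × 2 images per GPU, as in Common settings for the FPN baselines). The `throughput` helper is illustrative only.

```python
# Assumption: every training iteration processes 16 images
# (8 GPUs x 2 images per GPU), as stated for the FPN baselines.
IMAGES_PER_ITER = 8 * 2

# Seconds per iteration copied from the training-speed table.
SEC_PER_ITER = {
    "Faster R-CNN": {"Detectron2": 0.210, "mmdetection": 0.216},
    "Mask R-CNN": {"Detectron2": 0.261, "mmdetection": 0.265},
    "RetinaNet": {"Detectron2": 0.200, "mmdetection": 0.205},
}


def throughput(sec_per_iter: float, images_per_iter: int = IMAGES_PER_ITER) -> float:
    """Convert s/iter into training throughput in images per second."""
    return images_per_iter / sec_per_iter


if __name__ == "__main__":
    for model, timings in SEC_PER_ITER.items():
        for framework, sec in timings.items():
            print(f"{model:12s} {framework:12s} {throughput(sec):5.1f} img/s")
```

Under this assumption, the Mask R-CNN rows work out to about 61 img/s (Detectron2) and 60 img/s (mmdetection), consistent with the throughput reported in the Speed benchmark table.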
Inference Speed
The inference speed is measured in fps (img/s) on a single GPU; higher is better. To be consistent with Detectron2, we report the pure inference speed (excluding data loading time). For Mask R-CNN, we exclude the time of RLE encoding in post-processing. We also include the officially reported speeds in parentheses; they are slightly higher than the results measured on our server due to hardware differences.
Type | Detectron2 | mmdetection |
---|---|---|
Faster R-CNN | 25.6 (26.3) | 22.2 |
Mask R-CNN | 22.5 (23.3) | 19.6 |
RetinaNet | 17.8 (18.2) | 20.6 |
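Because the reported fps is pure single-GPU inference speed, it can also be read as a per-image latency. The sketch below does that conversion, assuming images are processed one at a time so that latency is simply the reciprocal of fps. The parenthesized official Detectron2 numbers are left out, and the `latency_ms` helper is illustrative only.

```python
# Single-GPU inference speed in fps (img/s), copied from the inference-speed
# table; the parenthesized official Detectron2 numbers are omitted.
FPS = {
    "Faster R-CNN": {"Detectron2": 25.6, "mmdetection": 22.2},
    "Mask R-CNN": {"Detectron2": 22.5, "mmdetection": 19.6},
    "RetinaNet": {"Detectron2": 17.8, "mmdetection": 20.6},
}


def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds, assuming one image per forward pass."""
    return 1000.0 / fps


if __name__ == "__main__":
    for model, speeds in FPS.items():
        row = ", ".join(f"{fw}: {latency_ms(v):.1f} ms/img" for fw, v in speeds.items())
        print(f"{model}: {row}")
```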
Training memory
Training memory is reported in GB; lower is better.
Type | Detectron2 | mmdetection |
---|---|---|
Faster R-CNN | 3.0 | 3.8 |
Mask R-CNN | 3.4 | 3.9 |
RetinaNet | 3.9 | 3.4 |