mmdet.apis

async mmdet.apis.async_inference_detector(model, imgs)[source]

Async inference image(s) with the detector.

Parameters
  • model (nn.Module) – The loaded detector.

  • imgs (str | ndarray) – Either image files or loaded images.

Returns

Awaitable detection results.

mmdet.apis.inference_detector(model: torch.nn.modules.module.Module, imgs: Union[str, numpy.ndarray, Sequence[str], Sequence[numpy.ndarray]], test_pipeline: Optional[mmcv.transforms.wrappers.Compose] = None) → Union[mmdet.structures.det_data_sample.DetDataSample, List[mmdet.structures.det_data_sample.DetDataSample]][source]

Inference image(s) with the detector.

Parameters
  • model (nn.Module) – The loaded detector.

  • imgs (str, ndarray, Sequence[str/ndarray]) – Either image files or loaded images.

  • test_pipeline (Compose) – Test pipeline.

Returns

If imgs is a list or tuple, a list of results with the same length will be returned; otherwise the detection results are returned directly.

Return type

DetDataSample or list[DetDataSample]

mmdet.apis.init_detector(config: Union[str, pathlib.Path, mmengine.config.config.Config], checkpoint: Optional[str] = None, palette: str = 'none', device: str = 'cuda:0', cfg_options: Optional[dict] = None) → torch.nn.modules.module.Module[source]

Initialize a detector from config file.

Parameters
  • config (str, Path, or mmengine.Config) – Config file path, Path, or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.

  • palette (str) – Color palette used for visualization. If palette is stored in checkpoint, use checkpoint’s palette first, otherwise use externally passed palette. Currently, supports ‘coco’, ‘voc’, ‘citys’ and ‘random’. Defaults to none.

  • device (str) – The device that the model will be loaded onto. Defaults to cuda:0.

  • cfg_options (dict, optional) – Options to override some settings in the used config.

Returns

The constructed detector.

Return type

nn.Module
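
A minimal usage sketch of the two functions above; the config, checkpoint and image paths are placeholders, not part of the original docs:

from mmdet.apis import init_detector, inference_detector

# Placeholder paths: substitute your own config/checkpoint/images.
config_file = 'configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'

model = init_detector(config_file, checkpoint_file, device='cuda:0')

# A single image returns one DetDataSample ...
result = inference_detector(model, 'demo/demo.jpg')
# ... while a list of images returns a list of DetDataSample.
results = inference_detector(model, ['demo/a.jpg', 'demo/b.jpg'])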

mmdet.datasets

datasets

class mmdet.datasets.AspectRatioBatchSampler(sampler: torch.utils.data.sampler.Sampler, batch_size: int, drop_last: bool = False)[source]

A sampler wrapper for grouping images with similar aspect ratio (< 1 or >= 1) into the same batch.

Parameters
  • sampler (Sampler) – Base sampler.

  • batch_size (int) – Size of mini-batch.

  • drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size.
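
A hedged config sketch showing where the batch sampler usually plugs in; the surrounding dataloader fields follow the MMEngine convention and the dataset config is elided:

train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    sampler=dict(type='DefaultSampler', shuffle=True),
    # Group images with aspect ratio < 1 and >= 1 into separate batches.
    batch_sampler=dict(type='AspectRatioBatchSampler'),
    dataset=dict(type='CocoDataset'))  # dataset fields elided for brevity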

class mmdet.datasets.BaseDetDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Base dataset for detection.

Parameters
  • proposal_file (str, optional) – Proposals file path. Defaults to None.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

full_init() → None[source]

Load annotation file and set BaseDataset._fully_initialized to True.

If lazy_init=False, full_init will be called during the instantiation and self._fully_initialized will be set to True. If obj._fully_initialized=False, the class method decorated by force_full_init will call full_init automatically.

Several steps to initialize annotation:

  • load_data_list: Load annotations from annotation file.

  • load_proposals: Load proposals from proposal file, if self.proposal_file is not None.

  • filter data information: Filter annotations according to filter_cfg.

  • slice_data: Slice dataset according to self._indices.

  • serialize_data: Serialize self.data_list if self.serialize_data is True.

get_cat_ids(idx: int) → List[int][source]

Get COCO category ids by index.

Parameters

idx (int) – Index of data.

Returns

All categories in the image of specified index.

Return type

List[int]

load_proposals() → None[source]

Load proposals from proposals file.

The proposals_list should be a dict[img_path: proposals] with the same length as data_list, and each proposals should be a dict or InstanceData that usually contains the following keys:

  • bboxes (np.ndarray): Has a shape (num_instances, 4), the last dimension 4 arranged as (x1, y1, x2, y2).

  • scores (np.ndarray): Classification scores, has a shape (num_instances, ).

class mmdet.datasets.CityscapesDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Dataset for Cityscapes.

filter_data() → List[dict][source]

Filter annotations according to filter_cfg.

Returns

Filtered results.

Return type

List[dict]

class mmdet.datasets.ClassAwareSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, seed: Optional[int] = None, num_sample_class: int = 1)[source]

Sampler that restricts data loading to the label of the dataset.

A class-aware sampling strategy to effectively tackle the non-uniform class distribution. The length of the training data is consistent with the source data. Simple improvements based on Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks.

The implementation logic is referenced from https://github.com/Sense-X/TSD/blob/master/mmdet/datasets/samplers/distributed_classaware_sampler.py

Parameters
  • dataset – Dataset used for sampling.

  • seed (int, optional) – random seed used to shuffle the sampler. This number should be identical across all processes in the distributed group. Defaults to None.

  • num_sample_class (int) – The number of samples taken from each per-label list. Defaults to 1.
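
A hedged sketch of swapping ClassAwareSampler in for the default sampler of a train dataloader; field names follow the MMEngine dataloader convention and the dataset config is elided:

train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    # Sample num_sample_class images per class in turn to balance
    # a long-tailed class distribution.
    sampler=dict(type='ClassAwareSampler', num_sample_class=1),
    dataset=dict(type='CocoDataset'))  # dataset fields elided for brevity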

get_cat2imgs() → Dict[int, list][source]

Get a dict with class as key and img_ids as values.

Returns

A dict of per-label image lists: each key is a label index and the value is the list of image indices that contain that label.

Return type

dict[int, list]

set_epoch(epoch: int) → None[source]

Sets the epoch for this sampler.

When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters

epoch (int) – Epoch number.

class mmdet.datasets.CocoDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Dataset for COCO.

COCOAPI

alias of mmdet.datasets.api_wrappers.coco_api.COCO

filter_data() → List[dict][source]

Filter annotations according to filter_cfg.

Returns

Filtered results.

Return type

List[dict]

load_data_list() → List[dict][source]

Load annotations from an annotation file named as self.ann_file

Returns

A list of annotation.

Return type

List[dict]

parse_data_info(raw_data_info: dict) → Union[dict, List[dict]][source]

Parse raw annotation to target format.

Parameters

raw_data_info (dict) – Raw data information loaded from ann_file

Returns

Parsed annotation.

Return type

Union[dict, List[dict]]

class mmdet.datasets.CocoPanopticDataset(ann_file: str = '', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'ann': None, 'img': None, 'seg': None}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]

Coco dataset for Panoptic segmentation.

The annotation format is shown as follows. The ann field is optional for testing.

[
    {
        'filename': f'{image_id:012}.png',
        'image_id': 9,
        'segments_info':
        [
            {
                'id': 8345037, (segment_id in panoptic png,
                                convert from rgb)
                'category_id': 51,
                'iscrowd': 0,
                'bbox': (x1, y1, w, h),
                'area': 24315
            },
            ...
        ]
    },
    ...
]
Parameters
  • ann_file (str) – Annotation file path. Defaults to ‘’.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Defaults to None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to None.

  • data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img=None, ann=None, seg=None). The prefix seg, which is for the panoptic segmentation map, must not be None.

  • filter_cfg (dict, optional) – Config for filter data. Defaults to None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Defaults to None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.

  • pipeline (list, optional) – Processing pipeline. Defaults to [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Defaults to False.

  • lazy_init (bool, optional) – Whether to delay loading annotations until they are needed. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image if BaseDataset.prepare_data gets a None image. Defaults to 1000.
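
A minimal config sketch based on the parameters above; all paths are placeholders, and the seg prefix is set because it must not be None:

train_dataset = dict(
    type='CocoPanopticDataset',
    data_root='data/coco/',  # placeholder
    ann_file='annotations/panoptic_train2017.json',  # placeholder
    data_prefix=dict(
        img='train2017/',
        seg='annotations/panoptic_train2017/'),  # 'seg' must not be None
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=[])  # processing pipeline elided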

COCOAPI

alias of mmdet.datasets.api_wrappers.coco_api.COCOPanoptic

filter_data() → List[dict][source]

Filter out images that are too small or without ground truth.

Returns

self.data_list after filtering.

Return type

List[dict]

parse_data_info(raw_data_info: dict) → dict[source]

Parse raw annotation to target format.

Parameters

raw_data_info (dict) – Raw data information loaded from ann_file.

Returns

Parsed annotation.

Return type

dict

class mmdet.datasets.CrowdHumanDataset(data_root, ann_file, extra_ann_file=None, **kwargs)[source]

Dataset for CrowdHuman.

Parameters
  • data_root (str) – The root directory for data_prefix and ann_file.

  • ann_file (str) – Annotation file path.

  • extra_ann_file (str, optional) – The path of extra image metas for CrowdHuman. It can be created by CrowdHumanDataset automatically or by tools/misc/get_crowdhuman_id_hw.py manually. Defaults to None.

load_data_list() → List[dict][source]

Load annotations from an annotation file named as self.ann_file

Returns

A list of annotation.

Return type

List[dict]

parse_data_info(raw_data_info: dict) → Union[dict, List[dict]][source]

Parse raw annotation to target format.

Parameters

raw_data_info (dict) – Raw data information loaded from ann_file

Returns

Parsed annotation.

Return type

Union[dict, List[dict]]

class mmdet.datasets.DeepFashionDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Dataset for DeepFashion.

class mmdet.datasets.GroupMultiSourceSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]

Group Multi-Source Infinite Sampler.

According to the sampling ratio, sample data from different datasets but the same group to form batches.

Parameters
  • dataset (Sized) – The dataset.

  • batch_size (int) – Size of mini-batch.

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.

mmdet.datasets.LVISDataset

alias of mmdet.datasets.lvis.LVISV05Dataset

class mmdet.datasets.LVISV05Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

LVIS v0.5 dataset for detection.

load_data_list() → List[dict][source]

Load annotations from an annotation file named as self.ann_file

Returns

A list of annotation.

Return type

List[dict]

class mmdet.datasets.LVISV1Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

LVIS v1 dataset for detection.

load_data_list() → List[dict][source]

Load annotations from an annotation file named as self.ann_file

Returns

A list of annotation.

Return type

List[dict]

class mmdet.datasets.MultiImageMixDataset(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, dict], pipeline: Sequence[str], skip_type_keys: Optional[Sequence[str]] = None, max_refetch: int = 15, lazy_init: bool = False)[source]

A wrapper of multiple images mixed dataset.

Suitable for training on multiple images mixed data augmentation like mosaic and mixup. For the augmentation pipeline of mixed image data, the get_indexes method needs to be provided to obtain the image indexes, and you can set skip_type_keys to change the pipeline running process. At the same time, we provide the dynamic_scale parameter to dynamically change the output image size.

Parameters
  • dataset (CustomDataset) – The dataset to be mixed.

  • pipeline (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • dynamic_scale (tuple[int], optional) – The image scale can be changed dynamically. Default to None. It is deprecated.

  • skip_type_keys (list[str], optional) – Sequence of transform type strings to be skipped in the pipeline. Defaults to None.

  • max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch but results is still None, the iteration is terminated and an error is raised. Default: 15.
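
A hedged sketch of typical usage (YOLOX-style configs): the inner dataset only loads images and annotations, while mixed-image transforms such as Mosaic live in the wrapper pipeline. Transform names other than MultiImageMixDataset are assumptions here, and paths are placeholders:

train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CocoDataset',
        ann_file='annotations/instances_train2017.json',  # placeholder
        data_prefix=dict(img='train2017/'),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True)
        ]),
    pipeline=[
        # Mixed-image transforms call get_indexes on the wrapped dataset.
        dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0),
        dict(type='RandomAffine', scaling_ratio_range=(0.1, 2.0)),
        dict(type='PackDetInputs')
    ])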

full_init()[source]

Loop to full_init each dataset.

get_data_info(idx: int) → dict[source]

Get annotation by index.

Parameters

idx (int) – Global index of ConcatDataset.

Returns

The idx-th annotation of the datasets.

Return type

dict

property metainfo: dict

Get the meta information of the multi-image-mixed dataset.

Returns

The meta information of multi-image-mixed dataset.

Return type

dict

update_skip_type_keys(skip_type_keys)[source]

Update skip_type_keys. It is called by an external hook.

Parameters

skip_type_keys (list[str], optional) – Sequence of transform type strings to be skipped in the pipeline.

class mmdet.datasets.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]

Multi-Source Infinite Sampler.

According to the sampling ratio, sample data from different datasets to form batches.

Parameters
  • dataset (Sized) – The dataset.

  • batch_size (int) – Size of mini-batch.

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.

Examples

>>> dataset_type = 'ConcatDataset'
>>> sub_dataset_type = 'CocoDataset'
>>> data_root = 'data/coco/'
>>> sup_ann = '../coco_semi_annos/instances_train2017.1@10.json'
>>> unsup_ann = '../coco_semi_annos/' \
>>>             'instances_train2017.1@10-unlabeled.json'
>>> dataset = dict(type=dataset_type,
>>>     datasets=[
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=sup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=sup_pipeline),
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=unsup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=unsup_pipeline),
>>>         ])
>>>     train_dataloader = dict(
>>>         batch_size=5,
>>>         num_workers=5,
>>>         persistent_workers=True,
>>>         sampler=dict(type='MultiSourceSampler',
>>>             batch_size=5, source_ratio=[1, 4]),
>>>         batch_sampler=None,
>>>         dataset=dataset)
set_epoch(epoch: int) → None[source]

Not supported in epoch-based runner.

class mmdet.datasets.Objects365V1Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Objects365 v1 dataset for detection.

COCOAPI

alias of mmdet.datasets.api_wrappers.coco_api.COCO

load_data_list() → List[dict][source]

Load annotations from an annotation file named as self.ann_file

Returns

A list of annotation.

Return type

List[dict]

class mmdet.datasets.Objects365V2Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Objects365 v2 dataset for detection.

COCOAPI

alias of mmdet.datasets.api_wrappers.coco_api.COCO

load_data_list() → List[dict][source]

Load annotations from an annotation file named as self.ann_file

Returns

A list of annotation.

Return type

List[dict]

class mmdet.datasets.OpenImagesChallengeDataset(ann_file: str, **kwargs)[source]

Open Images Challenge dataset for detection.

Parameters

ann_file (str) – Open Images Challenge box annotation in txt format.

load_data_list() → List[dict][source]

Load annotations from an annotation file named as self.ann_file

Returns

A list of annotation.

Return type

List[dict]

class mmdet.datasets.OpenImagesDataset(label_file: str, meta_file: str, hierarchy_file: str, image_level_ann_file: Optional[str] = None, **kwargs)[source]

Open Images dataset for detection.

Parameters
  • ann_file (str) – Annotation file path.

  • label_file (str) – File path of the label description file that maps the class names in MID format to their short descriptions.

  • meta_file (str) – File path to get image metas.

  • hierarchy_file (str) – The file path of the class hierarchy.

  • image_level_ann_file (str) – Human-verified image level annotation, which is used in evaluation.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

load_data_list() → List[dict][source]

Load annotations from an annotation file named as self.ann_file

Returns

A list of annotation.

Return type

List[dict]

class mmdet.datasets.VOCDataset(**kwargs)[source]

Dataset for PASCAL VOC.

class mmdet.datasets.WIDERFaceDataset(**kwargs)[source]

Reader for the WIDER Face dataset in PASCAL VOC format.

Conversion scripts can be found in https://github.com/sovrasov/wider-face-pascal-voc-annotations

load_annotations(ann_file)[source]

Load annotation from WIDERFace XML style annotation file.

Parameters

ann_file (str) – Path of XML file.

Returns

Annotation info from XML file.

Return type

list[dict]

class mmdet.datasets.XMLDataset(img_subdir: str = 'JPEGImages', ann_subdir: str = 'Annotations', **kwargs)[source]

XML dataset for detection.

Parameters
  • img_subdir (str) – Subdir where images are stored. Default: JPEGImages.

  • ann_subdir (str) – Subdir where annotations are. Default: Annotations.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

property bbox_min_size: Optional[str]

Return the minimum size of bounding boxes in the images.

filter_data() → List[dict][source]

Filter annotations according to filter_cfg.

Returns

Filtered results.

Return type

List[dict]

load_data_list() → List[dict][source]

Load annotation from XML style ann_file.

Returns

Annotation info from XML file.

Return type

list[dict]

parse_data_info(img_info: dict) → Union[dict, List[dict]][source]

Parse raw annotation to target format.

Parameters

img_info (dict) – Raw image information, usually it includes img_id, file_name, and xml_path.

Returns

Parsed annotation.

Return type

Union[dict, List[dict]]

property sub_data_root: str

Return the sub data root.

mmdet.datasets.get_loading_pipeline(pipeline)[source]

Only keep loading image and annotations related configuration.

Parameters

pipeline (list[dict]) – Data pipeline configs.

Returns

The new pipeline list that only keeps the image and annotation loading related configuration.

Return type

list[dict]

Examples

>>> pipelines = [
...    dict(type='LoadImageFromFile'),
...    dict(type='LoadAnnotations', with_bbox=True),
...    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
...    dict(type='RandomFlip', flip_ratio=0.5),
...    dict(type='Normalize', **img_norm_cfg),
...    dict(type='Pad', size_divisor=32),
...    dict(type='DefaultFormatBundle'),
...    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
...    ]
>>> expected_pipelines = [
...    dict(type='LoadImageFromFile'),
...    dict(type='LoadAnnotations', with_bbox=True)
...    ]
>>> assert expected_pipelines == \
...        get_loading_pipeline(pipelines)

api_wrappers

class mmdet.datasets.api_wrappers.COCO(*args: Any, **kwargs: Any)[source]

This class is almost the same as the official pycocotools package.

It implements some snake case function aliases so that the COCO class has the same interface as the LVIS class.

class mmdet.datasets.api_wrappers.COCOPanoptic(*args: Any, **kwargs: Any)[source]

This wrapper is for loading the panoptic style annotation file.

The format is shown in the CocoPanopticDataset class.

Parameters

annotation_file (str, optional) – Path of annotation file. Defaults to None.

createIndex() → None[source]

Create index.

load_anns(ids: Union[List[int], int] = []) → Optional[List[dict]][source]

Load anns with the specified ids.

self.anns is a list of annotation lists instead of a list of annotations.

Parameters

ids (Union[List[int], int]) – Integer ids specifying anns.

Returns

Loaded ann objects.

Return type

anns (List[dict], optional)

samplers

class mmdet.datasets.samplers.AspectRatioBatchSampler(sampler: torch.utils.data.sampler.Sampler, batch_size: int, drop_last: bool = False)[source]

A sampler wrapper for grouping images with similar aspect ratio (< 1 or >= 1) into the same batch.

Parameters
  • sampler (Sampler) – Base sampler.

  • batch_size (int) – Size of mini-batch.

  • drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size.

class mmdet.datasets.samplers.ClassAwareSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, seed: Optional[int] = None, num_sample_class: int = 1)[source]

Sampler that restricts data loading to the label of the dataset.

A class-aware sampling strategy to effectively tackle the non-uniform class distribution. The length of the training data is consistent with the source data. Simple improvements based on Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks.

The implementation logic is referenced from https://github.com/Sense-X/TSD/blob/master/mmdet/datasets/samplers/distributed_classaware_sampler.py

Parameters
  • dataset – Dataset used for sampling.

  • seed (int, optional) – random seed used to shuffle the sampler. This number should be identical across all processes in the distributed group. Defaults to None.

  • num_sample_class (int) – The number of samples taken from each per-label list. Defaults to 1.

get_cat2imgs() → Dict[int, list][source]

Get a dict with class as key and img_ids as values.

Returns

A dict of per-label image lists: each key is a label index and the value is the list of image indices that contain that label.

Return type

dict[int, list]

set_epoch(epoch: int) → None[source]

Sets the epoch for this sampler.

When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters

epoch (int) – Epoch number.

class mmdet.datasets.samplers.GroupMultiSourceSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]

Group Multi-Source Infinite Sampler.

According to the sampling ratio, sample data from different datasets but the same group to form batches.

Parameters
  • dataset (Sized) – The dataset.

  • batch_size (int) – Size of mini-batch.

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.

class mmdet.datasets.samplers.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]

Multi-Source Infinite Sampler.

According to the sampling ratio, sample data from different datasets to form batches.

Parameters
  • dataset (Sized) – The dataset.

  • batch_size (int) – Size of mini-batch.

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.

Examples

>>> dataset_type = 'ConcatDataset'
>>> sub_dataset_type = 'CocoDataset'
>>> data_root = 'data/coco/'
>>> sup_ann = '../coco_semi_annos/instances_train2017.1@10.json'
>>> unsup_ann = '../coco_semi_annos/' \
>>>             'instances_train2017.1@10-unlabeled.json'
>>> dataset = dict(type=dataset_type,
>>>     datasets=[
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=sup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=sup_pipeline),
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=unsup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=unsup_pipeline),
>>>         ])
>>>     train_dataloader = dict(
>>>         batch_size=5,
>>>         num_workers=5,
>>>         persistent_workers=True,
>>>         sampler=dict(type='MultiSourceSampler',
>>>             batch_size=5, source_ratio=[1, 4]),
>>>         batch_sampler=None,
>>>         dataset=dataset)
set_epoch(epoch: int) → None[source]

Not supported in epoch-based runner.

transforms

class mmdet.datasets.transforms.Albu(transforms: List[dict], bbox_params: Optional[dict] = None, keymap: Optional[dict] = None, skip_img_without_anno: bool = False)[source]

Albumentation augmentation.

Adds custom transformations from the Albumentations library. Please visit https://albumentations.readthedocs.io for more information.

Required Keys:

  • img (np.uint8)

  • gt_bboxes (HorizontalBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

Modified Keys:

  • img (np.uint8)

  • gt_bboxes (HorizontalBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • img_shape (tuple)

An example of transforms is as follows:

[
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=0,
        interpolation=1,
        p=0.5),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(type='ChannelShuffle', p=0.1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.1),
]
Parameters
  • transforms (list[dict]) – A list of albu transformations

  • bbox_params (dict, optional) – Bbox_params for albumentation Compose

  • keymap (dict, optional) – Contains {‘input key’:’albumentation-style key’}

  • skip_img_without_anno (bool) – Whether to skip the image if no ann left after aug. Defaults to False.
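
A hedged sketch of an Albu step inside a training pipeline, combining the transforms list above with bbox_params and keymap; the exact bbox_params fields follow common mmdet configs and should be treated as assumptions:

albu_train_transforms = [
    dict(type='ShiftScaleRotate', shift_limit=0.0625, scale_limit=0.0,
         rotate_limit=0, interpolation=1, p=0.5),
    dict(type='ChannelShuffle', p=0.1),
]
albu_step = dict(
    type='Albu',
    transforms=albu_train_transforms,
    bbox_params=dict(
        type='BboxParams',
        format='pascal_voc',
        label_fields=['gt_bboxes_labels', 'gt_ignore_flags'],
        min_visibility=0.0,
        filter_lost_elements=True),
    # Map mmdet result keys to the names Albumentations expects.
    keymap=dict(img='image', gt_bboxes='bboxes', gt_masks='masks'),
    skip_img_without_anno=True)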

albu_builder(cfg: dict) → None[source]

Import a module from albumentations.

It inherits some of build_from_cfg() logic.

Parameters

cfg (dict) – Config dict. It should at least contain the key “type”.

Returns

The constructed object.

Return type

obj

static mapper(d: dict, keymap: dict) → dict[source]

Dictionary mapper. Renames keys according to keymap provided.

Parameters
  • d (dict) – old dict

  • keymap (dict) – {‘old_key’:’new_key’}

Returns

new dict.

Return type

dict

class mmdet.datasets.transforms.AutoAugment(policies: List[List[Union[dict, mmengine.config.config.ConfigDict]]] = [[{'type': 'Equalize', 'prob': 0.8, 'level': 1}, {'type': 'ShearY', 'prob': 0.8, 'level': 4}], [{'type': 'Color', 'prob': 0.4, 'level': 9}, {'type': 'Equalize', 'prob': 0.6, 'level': 3}], [{'type': 'Color', 'prob': 0.4, 'level': 1}, {'type': 'Rotate', 'prob': 0.6, 'level': 8}], [{'type': 'Solarize', 'prob': 0.8, 'level': 3}, {'type': 'Equalize', 'prob': 0.4, 'level': 7}], [{'type': 'Solarize', 'prob': 0.4, 'level': 2}, {'type': 'Solarize', 'prob': 0.6, 'level': 2}], [{'type': 'Color', 'prob': 0.2, 'level': 0}, {'type': 'Equalize', 'prob': 0.8, 'level': 8}], [{'type': 'Equalize', 'prob': 0.4, 'level': 8}, {'type': 'SolarizeAdd', 'prob': 0.8, 'level': 3}], [{'type': 'ShearX', 'prob': 0.2, 'level': 9}, {'type': 'Rotate', 'prob': 0.6, 'level': 8}], [{'type': 'Color', 'prob': 0.6, 'level': 1}, {'type': 'Equalize', 'prob': 1.0, 'level': 2}], [{'type': 'Invert', 'prob': 0.4, 'level': 9}, {'type': 'Rotate', 'prob': 0.6, 'level': 0}], [{'type': 'Equalize', 'prob': 1.0, 'level': 9}, {'type': 'ShearY', 'prob': 0.6, 'level': 3}], [{'type': 'Color', 'prob': 0.4, 'level': 7}, {'type': 'Equalize', 'prob': 0.6, 'level': 0}], [{'type': 'Posterize', 'prob': 0.4, 'level': 6}, {'type': 'AutoContrast', 'prob': 0.4, 'level': 7}], [{'type': 'Solarize', 'prob': 0.6, 'level': 8}, {'type': 'Color', 'prob': 0.6, 'level': 9}], [{'type': 'Solarize', 'prob': 0.2, 'level': 4}, {'type': 'Rotate', 'prob': 0.8, 'level': 9}], [{'type': 'Rotate', 'prob': 1.0, 'level': 7}, {'type': 'TranslateY', 'prob': 0.8, 'level': 9}], [{'type': 'ShearX', 'prob': 0.0, 'level': 0}, {'type': 'Solarize', 'prob': 0.8, 'level': 4}], [{'type': 'ShearY', 'prob': 0.8, 'level': 0}, {'type': 'Color', 'prob': 0.6, 'level': 4}], [{'type': 'Color', 'prob': 1.0, 'level': 0}, {'type': 'Rotate', 'prob': 0.6, 'level': 2}], [{'type': 'Equalize', 'prob': 0.8, 'level': 4}, {'type': 'Equalize', 'prob': 0.0, 'level': 8}], [{'type': 'Equalize', 'prob': 1.0, 'level': 4}, {'type': 'AutoContrast', 'prob': 0.6, 'level': 2}], [{'type': 'ShearY', 'prob': 0.4, 'level': 7}, {'type': 'SolarizeAdd', 'prob': 0.6, 'level': 7}], [{'type': 'Posterize', 'prob': 0.8, 'level': 2}, {'type': 'Solarize', 'prob': 0.6, 'level': 10}], [{'type': 'Solarize', 'prob': 0.6, 'level': 8}, {'type': 'Equalize', 'prob': 0.6, 'level': 1}], [{'type': 'Color', 'prob': 0.8, 'level': 6}, {'type': 'Rotate', 'prob': 0.4, 'level': 5}]], prob: Optional[List[float]] = None)[source]

Auto augmentation.

This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data and in Learning Data Augmentation Strategies for Object Detection.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_bboxes_labels

  • gt_masks

  • gt_ignore_flags

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • policies (List[List[Union[dict, ConfigDict]]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy and is composed of several augmentations. When AutoAugment is called, a random policy in policies will be selected to augment images. Defaults to policy_v0().

  • prob (list[float], optional) – The probabilities associated with each policy. The length should be equal to the policy number and the sum should be 1. If not given, a uniform distribution will be assumed. Defaults to None.

Examples

>>> policies = [
>>>     [
>>>         dict(type='Sharpness', prob=0.0, level=8),
>>>         dict(type='ShearX', prob=0.4, level=0,)
>>>     ],
>>>     [
>>>         dict(type='Rotate', prob=0.6, level=10),
>>>         dict(type='Color', prob=1.0, level=6)
>>>     ]
>>> ]
>>> augmentation = AutoAugment(policies)
>>> img = np.ones((100, 100, 3))
>>> gt_bboxes = np.ones((10, 4))
>>> results = dict(img=img, gt_bboxes=gt_bboxes)
>>> results = augmentation(results)
class mmdet.datasets.transforms.AutoContrast(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Auto adjust image contrast.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing AutoContrast should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – No use for AutoContrast transformation. Defaults to None.

  • min_mag (float) – No use for AutoContrast transformation. Defaults to 0.1.

  • max_mag (float) – No use for AutoContrast transformation. Defaults to 1.9.

class mmdet.datasets.transforms.Brightness(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Adjust the brightness of the image. A magnitude=0 gives a black image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Brightness transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Brightness transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for Brightness transformation. Defaults to 1.9.

class mmdet.datasets.transforms.CachedMixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, max_iters: int = 15, bbox_clip_border: bool = True, max_cached_images: int = 20, random_pop: bool = True, prob: float = 1.0)[source]

Cached mixup data augmentation.

                    mixup transform
           +------------------------------+
           | mixup image   |              |
           |      +--------|--------+     |
           |      |        |        |     |
           |---------------+        |     |
           |      |                 |     |
           |      |      image      |     |
           |      |                 |     |
           |      |                 |     |
           |      |-----------------+     |
           |             pad              |
           +------------------------------+

The cached mixup transform steps are as follows:

   1. Append the results from the last transform into the cache.
   2. Another random image is picked from the cache and embedded in
      the top left patch(after padding and resizing)
   3. The target of mixup transform is the weighted average of mixup
      image and origin image.

Required Keys:

  • img

  • gt_bboxes (np.float32) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.

  • pad_val (int) – Pad value. Defaults to 114.

  • max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

class mmdet.datasets.transforms.CachedMosaic(*args, max_cached_images: int = 40, random_pop: bool = True, **kwargs)[source]

Cached mosaic augmentation.

Cached mosaic transform will random select images from the cache and combine them into one output image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The cached mosaic transform steps are as follows:

    1. Append the results from the last transform into the cache.
    2. Choose the mosaic center as the intersections of 4 images
    3. Get the left top image according to the index, and randomly
       sample another 3 images from the result cache.
    4. Sub image will be cropped if image is larger than mosaic patch

Required Keys:

  • img

  • gt_bboxes (np.float32) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pad_val (int) – Pad value. Defaults to 114.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
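
A hedged sketch of a cache-based pipeline in the style of RTMDet configs, combining CachedMosaic with the CachedMixUp transform documented above; values are illustrative only:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0,
         max_cached_images=40, random_pop=True),
    dict(type='CachedMixUp', img_scale=(640, 640), ratio_range=(1.0, 1.0),
         max_cached_images=20, pad_val=114.0),
    dict(type='PackDetInputs')
]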

class mmdet.datasets.transforms.Color(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Adjust the color balance of the image, in a manner similar to the controls on a colour TV set. A magnitude=0 gives a black & white image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Color transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Color transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for Color transformation. Defaults to 1.9.

class mmdet.datasets.transforms.ColorTransform(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Base class for color transformations. All color transformations need to inherit from this base class. ColorTransform unifies the class attributes and class functions of color transformations (Color, Brightness, Contrast, Sharpness, Solarize, SolarizeAdd, Equalize, AutoContrast, Invert, and Posterize), and only distort color channels, without impacting the locations of the instances.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing the color transformation, which should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for color transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for color transformation. Defaults to 1.9.

transform(results: dict) → dict[source]

Transform function for images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Transformed results.

Return type

dict

class mmdet.datasets.transforms.Contrast(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Control the contrast of the image. A magnitude=0 gives a gray image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Contrast transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Contrast transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for Contrast transformation. Defaults to 1.9.

class mmdet.datasets.transforms.CopyPaste(max_num_pasted: int = 100, bbox_occluded_thr: int = 10, mask_occluded_thr: int = 300, selected: bool = True)[source]

Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. The simple copy-paste transform steps are as follows:

  1. The destination image is already resized with aspect ratio kept, cropped and padded.

  2. Randomly select a source image, which is also already resized with aspect ratio kept, cropped and padded in a similar way as the destination image.

  3. Randomly select some objects from the source image.

  4. Paste these source objects to the destination image directly, since the source and destination images have the same size.

  5. Update object masks of the destination image, since some original objects may be occluded.

  6. Generate bboxes from the updated destination masks and filter some objects which are totally occluded, and adjust bboxes which are partly occluded.

  7. Append selected source bboxes, masks, and labels.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_masks (BitmapMasks) (optional)

Modified Keys:

  • img

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

  • gt_masks (optional)

Parameters
  • max_num_pasted (int) – The maximum number of pasted objects. Defaults to 100.

  • bbox_occluded_thr (int) – The threshold of occluded bbox. Defaults to 10.

  • mask_occluded_thr (int) – The threshold of occluded mask. Defaults to 300.

  • selected (bool) – Whether select objects or not. If select is False, all objects of the source image will be pasted to the destination image. Defaults to True.
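
A hedged sketch of typical usage: CopyPaste takes its source image from a MultiImageMixDataset wrapper (see earlier on this page), so resizing/cropping/padding happens in the inner dataset pipeline and CopyPaste runs in the wrapper pipeline. Transform names in the inner pipeline and all sizes/paths are assumptions:

load_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', scale=(1024, 1024), keep_ratio=True),
    dict(type='RandomCrop', crop_size=(1024, 1024)),
    dict(type='Pad', size=(1024, 1024)),
]
train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CocoDataset',
        ann_file='annotations/instances_train2017.json',  # placeholder
        data_prefix=dict(img='train2017/'),
        pipeline=load_pipeline),
    pipeline=[
        dict(type='CopyPaste', max_num_pasted=100),
        dict(type='PackDetInputs')
    ])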

class mmdet.datasets.transforms.CutOut(n_holes: Union[int, Tuple[int, int]], cutout_shape: Optional[Union[Tuple[int, int], List[Tuple[int, int]]]] = None, cutout_ratio: Optional[Union[Tuple[float, float], List[Tuple[float, float]]]] = None, fill_in: Union[Tuple[float, float, float], Tuple[int, int, int]] = (0, 0, 0))[source]

CutOut operation.

Randomly drop some regions of image used in Cutout.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • n_holes (int or tuple[int, int]) – Number of regions to be dropped. If it is given as a list, number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].

  • cutout_shape (tuple[int, int] or list[tuple[int, int]], optional) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list. Defaults to None.

  • cutout_ratio (tuple[float, float] or list[tuple[float, float]], optional) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time. Defaults to None.

  • fill_in (tuple[float, float, float] or tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Defaults to (0, 0, 0).
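
Two hedged config sketches; cutout_shape and cutout_ratio are mutually exclusive, so choose one of them:

# Fixed hole size in pixels, 2 to 5 holes per image.
cutout_by_shape = dict(type='CutOut', n_holes=(2, 5), cutout_shape=(8, 8))
# Hole size as a ratio of the image size, with a custom fill colour.
cutout_by_ratio = dict(type='CutOut', n_holes=3, cutout_ratio=(0.05, 0.05),
                       fill_in=(114, 114, 114))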

class mmdet.datasets.transforms.Equalize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Equalize the image histogram. The bboxes, masks and segmentations are not modified.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Equalize transformation. Defaults to 1.0.

  • level (int, optional) – No use for Equalize transformation. Defaults to None.

  • min_mag (float) – No use for Equalize transformation. Defaults to 0.1.

  • max_mag (float) – No use for Equalize transformation. Defaults to 1.9.

class mmdet.datasets.transforms.Expand(mean: Sequence[Union[int, float]] = (0, 0, 0), to_rgb: bool = True, ratio_range: Sequence[Union[int, float]] = (1, 4), seg_ignore_label: Optional[int] = None, prob: float = 0.5)[source]

Random expand the image & bboxes & masks & segmentation map.

Randomly place the original image on a canvas of ratio x original image size filled with mean values. The ratio is in the range of ratio_range.

Required Keys:

  • img

  • img_shape

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Parameters
  • mean (sequence) – mean value of dataset.

  • to_rgb (bool) – whether to convert the order of mean to align with RGB.

  • ratio_range (sequence) – range of expand ratio.

  • seg_ignore_label (int) – label of ignore segmentation map.

  • prob (float) – probability of applying this transformation

class mmdet.datasets.transforms.FilterAnnotations(min_gt_bbox_wh: Tuple[int, int] = (1, 1), min_gt_mask_area: int = 1, by_box: bool = True, by_mask: bool = False, keep_empty: bool = True)[source]

Filter invalid annotations.

Required Keys:

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_masks (optional)

  • gt_ignore_flags (optional)

Parameters
  • min_gt_bbox_wh (tuple[float]) – Minimum width and height of ground truth boxes. Default: (1., 1.)

  • min_gt_mask_area (int) – Minimum foreground area of ground truth masks. Default: 1

  • by_box (bool) – Filter instances with bounding boxes not meeting the min_gt_bbox_wh threshold. Default: True

  • by_mask (bool) – Filter instances with masks not meeting min_gt_mask_area threshold. Default: False

  • keep_empty (bool) – Whether to return None when it becomes an empty bbox after filtering. Defaults to True.
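
A hedged example of a typical FilterAnnotations step, dropping boxes smaller than 1 pixel while keeping images even when every box is filtered out (keep_empty=False):

filter_step = dict(
    type='FilterAnnotations',
    min_gt_bbox_wh=(1, 1),  # drop boxes narrower or shorter than 1 px
    by_box=True,
    keep_empty=False)       # do not return None for images left without boxes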

class mmdet.datasets.transforms.FixShapeResize(width: int, height: int, pad_val: Union[int, float, dict] = {'img': 0, 'seg': 255}, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation: str = 'bilinear')[source]

Resize images & bbox & seg to the specified size.

This transform resizes the input image according to width and height. Bboxes, masks, and seg map are then resized with the same parameters.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

  • homography_matrix

Parameters
  • width (int) – width for resizing.

  • height (int) – height for resizing. Defaults to None.

  • pad_val (Number | dict[str, Number], optional) –

    Padding value used when the pad_mode is “constant”. If it is a single number, the value to pad the image is the number and to pad the semantic segmentation map is 255. If it is a dict, it should have the following keys:

    • img: The value to pad the image.

    • seg: The value to pad the semantic segmentation map.

    Defaults to dict(img=0, seg=255).

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.

  • clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.
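
A hedged config sketch resizing every sample to a fixed 640x640 output:

resize_step = dict(
    type='FixShapeResize',
    width=640,
    height=640,
    keep_ratio=False,        # stretch to exactly 640x640
    interpolation='bilinear')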

class mmdet.datasets.transforms.GeomTransform(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 1.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]

Base class for geometric transformations. All geometric transformations need to inherit from this base class. GeomTransform unifies the class attributes and class functions of geometric transformations (ShearX, ShearY, Rotate, TranslateX, and TranslateY), and records the homography matrix.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • prob (float) – The probability for performing the geometric transformation and should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for geometric transformation. Defaults to 0.0.

  • max_mag (float) – The maximum magnitude for geometric transformation. Defaults to 1.0.

  • reversal_prob (float) – The probability that reverses the geometric transformation magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.ImageToTensor(keys)[source]

Convert image to torch.Tensor by given keys.

The dimension order of the input image is (H, W, C). The pipeline will convert it to (C, H, W). If only 2 dimensions (H, W) are given, the output will be (1, H, W).

Parameters

keys (Sequence[str]) – Key of images to be converted to Tensor.

class mmdet.datasets.transforms.InstaBoost(action_candidate: tuple = ('normal', 'horizontal', 'skip'), action_prob: tuple = (1, 0, 0), scale: tuple = (0.8, 1.2), dx: int = 15, dy: int = 15, theta: tuple = (- 1, 1), color_prob: float = 0.5, hflag: bool = False, aug_ratio: float = 0.5)[source]

Data augmentation method in InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting.

Refer to https://github.com/GothicAi/Instaboost for implementation details.

Required Keys:

  • img (np.uint8)

  • instances

Modified Keys:

  • img (np.uint8)

  • instances

Parameters
  • action_candidate (tuple) – Action candidates. “normal”, “horizontal”, “vertical”, “skip” are supported. Defaults to (‘normal’, ‘horizontal’, ‘skip’).

  • action_prob (tuple) – Corresponding action probabilities. Should be the same length as action_candidate. Defaults to (1, 0, 0).

  • scale (tuple) – (min scale, max scale). Defaults to (0.8, 1.2).

  • dx (int) – The maximum x-axis shift will be (instance width) / dx. Defaults to 15.

  • dy (int) – The maximum y-axis shift will be (instance height) / dy. Defaults to 15.

  • theta (tuple) – (min rotation degree, max rotation degree). Defaults to (-1, 1).

  • color_prob (float) – Probability of images for color augmentation. Defaults to 0.5.

  • hflag (bool) – Whether to use heatmap guidance. Defaults to False.

  • aug_ratio (float) – Probability of applying this transformation. Defaults to 0.5.

transform(results) → dict[source]

The transform function.

class mmdet.datasets.transforms.Invert(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Invert images.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Invert transformation, which should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – No use for Invert transformation. Defaults to None.

  • min_mag (float) – No use for Invert transformation. Defaults to 0.1.

  • max_mag (float) – No use for Invert transformation. Defaults to 1.9.

class mmdet.datasets.transforms.LoadAnnotations(with_mask: bool = False, poly2mask: bool = True, box_type: str = 'hbox', **kwargs)[source]

Load and process the instances and seg_map annotation provided by dataset.

The annotation format is as the following:

{
    'instances':
    [
        {
        # List of 4 numbers representing the bounding box of the
        # instance, in (x1, y1, x2, y2) order.
        'bbox': [x1, y1, x2, y2],

        # Label of image classification.
        'bbox_label': 1,

        # Used in instance/panoptic segmentation. The segmentation mask
        # of the instance or the information of segments.
        # 1. If list[list[float]], it represents a list of polygons,
        # one for each connected component of the object. Each
        # list[float] is one simple polygon in the format of
        # [x1, y1, ..., xn, yn] (n≥3). The Xs and Ys are absolute
        # coordinates in unit of pixels.
        # 2. If dict, it represents the per-pixel segmentation mask in
        # COCO’s compressed RLE format. The dict should have keys
        # “size” and “counts”.  Can be loaded by pycocotools
        'mask': list[list[float]] or dict,

        }
    ]
    # Filename of semantic or panoptic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}

After this module, the annotation has been changed to the format below:

{
    # In (x1, y1, x2, y2) order, float type. N is the number of bboxes
    # in an image
    'gt_bboxes': BaseBoxes(N, 4)
     # In int type.
    'gt_bboxes_labels': np.ndarray(N, )
     # In built-in class
    'gt_masks': PolygonMasks (H, W) or BitmapMasks (H, W)
     # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}

Required Keys:

  • height

  • width

  • instances

    • bbox (optional)

    • bbox_label

    • mask (optional)

    • ignore_flag

  • seg_map_path (optional)

Added Keys:

  • gt_bboxes (BaseBoxes[torch.float32])

  • gt_bboxes_labels (np.int64)

  • gt_masks (BitmapMasks | PolygonMasks)

  • gt_seg_map (np.uint8)

  • gt_ignore_flags (bool)

Parameters
  • with_bbox (bool) – Whether to parse and load the bbox annotation. Defaults to True.

  • with_label (bool) – Whether to parse and load the label annotation. Defaults to True.

  • with_mask (bool) – Whether to parse and load the mask annotation. Default: False.

  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.

  • poly2mask (bool) – Whether to convert mask to bitmap. Default: True.

  • box_type (str) – The box type used to wrap the bboxes. If box_type is None, gt_bboxes will keep being np.ndarray. Defaults to ‘hbox’.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for :func:mmcv.imfrombytes. See :func:mmcv.imfrombytes for details. Defaults to ‘cv2’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See :class:mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
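
Example (an illustrative pipeline sketch added by the editor, not part of the upstream docstring; the surrounding transforms are assumptions based on a typical detection training pipeline):

>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations', with_bbox=True, with_mask=True,
>>>         poly2mask=True, box_type='hbox'),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=0.5),
>>>     dict(type='PackDetInputs')
>>> ]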

transform(results: dict)dict[source]

Function to load multiple types of annotations.

Parameters

results (dict) – Result dict from :obj:mmengine.BaseDataset.

Returns

The dict contains loaded bounding box, label and semantic segmentation.

Return type

dict

class mmdet.datasets.transforms.LoadEmptyAnnotations(with_bbox: bool = True, with_label: bool = True, with_mask: bool = False, with_seg: bool = False, seg_ignore_label: int = 255)[source]

Load Empty Annotations for unlabeled images.

Added Keys:

  • gt_bboxes (np.float32)

  • gt_bboxes_labels (np.int64)

  • gt_masks (BitmapMasks | PolygonMasks)

  • gt_seg_map (np.uint8)

  • gt_ignore_flags (bool)

Parameters
  • with_bbox (bool) – Whether to load the pseudo bbox annotation. Defaults to True.

  • with_label (bool) – Whether to load the pseudo label annotation. Defaults to True.

  • with_mask (bool) – Whether to load the pseudo mask annotation. Default: False.

  • with_seg (bool) – Whether to load the pseudo semantic segmentation annotation. Defaults to False.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

transform(results: dict)dict[source]

Transform function to load empty annotations.

Parameters

results (dict) – Result dict.

Returns

Updated result dict.

Return type

dict

class mmdet.datasets.transforms.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'}, ignore_empty: bool = False)[source]

Load an image from results['img'].

Similar with LoadImageFromFile, but the image has been loaded as np.ndarray in results['img']. Can be used when loading image from webcam.

Required Keys:

  • img

Modified Keys:

  • img

  • img_path

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
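
Example (an illustrative usage sketch added by the editor, not part of the upstream docstring; the frame shape is an assumption, e.g. a webcam frame already decoded as a NumPy array):

>>> import numpy as np
>>> from mmdet.datasets.transforms import LoadImageFromNDArray
>>> frame = np.zeros((480, 640, 3), dtype=np.uint8)  # e.g. a decoded webcam frame
>>> results = LoadImageFromNDArray()(dict(img=frame))
>>> results['img_shape']  # taken from the array, here (480, 640)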

transform(results: dict)dict[source]

Transform function to add image meta information.

Parameters

results (dict) – Result dict with Webcam read image in results['img'].

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmdet.datasets.transforms.LoadMultiChannelImageFromFiles(to_float32: bool = False, color_type: str = 'unchanged', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'})[source]

Load multi-channel images from a list of separate channel files.

Required Keys:

  • img_path

Modified Keys:

  • img

  • img_shape

  • ori_shape

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

  • color_type (str) – The flag argument for :func:mmcv.imfrombytes. Defaults to ‘unchanged’.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for :func:mmcv.imfrombytes. See :func:mmcv.imfrombytes for details. Defaults to ‘cv2’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

transform(results: dict)dict[source]

Transform function to load multiple images and get their meta information.

Parameters

results (dict) – Result dict from mmdet.CustomDataset.

Returns

The dict contains loaded images and meta information.

Return type

dict

class mmdet.datasets.transforms.LoadPanopticAnnotations(with_bbox: bool = True, with_label: bool = True, with_mask: bool = True, with_seg: bool = True, box_type: str = 'hbox', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'})[source]

Load multiple types of panoptic annotations.

The annotation format is as the following:

{
    'instances':
    [
        {
        # List of 4 numbers representing the bounding box of the
        # instance, in (x1, y1, x2, y2) order.
        'bbox': [x1, y1, x2, y2],

        # Label of image classification.
        'bbox_label': 1,
        },
        ...
    ]
    'segments_info':
    [
        {
        # id = cls_id + instance_id * INSTANCE_OFFSET
        'id': int,

        # Contiguous category id defined in dataset.
        'category': int

        # Thing flag.
        'is_thing': bool
        },
        ...
    ]

    # Filename of semantic or panoptic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}

After this module, the annotation has been changed to the format below:

{
    # In (x1, y1, x2, y2) order, float type. N is the number of bboxes
    # in an image
    'gt_bboxes': BaseBoxes(N, 4)
     # In int type.
    'gt_bboxes_labels': np.ndarray(N, )
     # In built-in class
    'gt_masks': PolygonMasks (H, W) or BitmapMasks (H, W)
     # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}

Required Keys:

  • height

  • width

  • instances

    • bbox

    • bbox_label

    • ignore_flag

  • segments_info

    • id

    • category

    • is_thing

  • seg_map_path

Added Keys:

  • gt_bboxes (BaseBoxes[torch.float32])

  • gt_bboxes_labels (np.int64)

  • gt_masks (BitmapMasks | PolygonMasks)

  • gt_seg_map (np.uint8)

  • gt_ignore_flags (bool)

Parameters
  • with_bbox (bool) – Whether to parse and load the bbox annotation. Defaults to True.

  • with_label (bool) – Whether to parse and load the label annotation. Defaults to True.

  • with_mask (bool) – Whether to parse and load the mask annotation. Defaults to True.

  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to True.

  • box_type (str) – The box mode used to wrap the bboxes.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for :func:mmcv.imfrombytes. See :func:mmcv.imfrombytes for details. Defaults to ‘cv2’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See :class:mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

transform(results: dict)dict[source]

Function to load multiple types of panoptic annotations.

Parameters

results (dict) – Result dict from :obj:mmdet.CustomDataset.

Returns

The dict contains loaded bounding box, label, mask and semantic segmentation annotations.

Return type

dict

class mmdet.datasets.transforms.LoadProposals(num_max_proposals: Optional[int] = None)[source]

Load proposal pipeline.

Required Keys:

  • proposals

Modified Keys:

  • proposals

Parameters

num_max_proposals (int, optional) – Maximum number of proposals to load. If not specified, all proposals will be loaded.

transform(results: dict)dict[source]

Transform function to load proposals from file.

Parameters

results (dict) – Result dict from mmdet.CustomDataset.

Returns

The dict contains loaded proposal annotations.

Return type

dict

class mmdet.datasets.transforms.MinIoURandomCrop(min_ious: Sequence[float] = (0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size: float = 0.3, bbox_clip_border: bool = True)[source]

Random crop the image & bboxes & masks & segmentation map. The cropped patches have a minimum IoU requirement with the original image & bboxes & masks & segmentation map; the IoU threshold is randomly selected from min_ious.

Required Keys:

  • img

  • img_shape

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_bboxes_labels

  • gt_masks

  • gt_ignore_flags

  • gt_seg_map

Parameters
  • min_ious (Sequence[float]) – minimum IoU threshold for all intersections with bounding boxes.

  • min_crop_size (float) – minimum crop size (i.e. h, w := a*h, a*w, where a >= min_crop_size). Defaults to 0.3.

  • bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.
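
Example (an illustrative SSD-style pipeline sketch added by the editor, not part of the upstream docstring; the surrounding transforms and values are assumptions):

>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='MinIoURandomCrop',
>>>         min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
>>>         min_crop_size=0.3),
>>>     dict(type='Resize', scale=(300, 300), keep_ratio=False),
>>>     dict(type='RandomFlip', prob=0.5),
>>>     dict(type='PackDetInputs')
>>> ]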

class mmdet.datasets.transforms.MixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, max_iters: int = 15, bbox_clip_border: bool = True)[source]

MixUp data augmentation.

                    mixup transform
           +------------------------------+
           | mixup image   |              |
           |      +--------|--------+     |
           |      |        |        |     |
           |---------------+        |     |
           |      |                 |     |
           |      |      image      |     |
           |      |                 |     |
           |      |                 |     |
           |      |-----------------+     |
           |             pad              |
           +------------------------------+

The mixup transform steps are as follows:

   1. Another random image is picked from the dataset and embedded in
      the top-left patch (after padding and resizing).
   2. The target of the mixup transform is the weighted average of the
      mixup image and the original image.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.

  • pad_val (int) – Pad value. Defaults to 114.

  • max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
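
Example (an illustrative sketch added by the editor, not part of the upstream docstring). Since MixUp consumes mix_results produced from other samples, it is typically used together with Mosaic inside a MultiImageMixDataset-style wrapper; the values below are assumptions based on a YOLOX-style setup:

>>> train_pipeline = [
>>>     dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0),
>>>     dict(type='MixUp', img_scale=(640, 640),
>>>         ratio_range=(0.8, 1.6), pad_val=114.0),
>>>     dict(type='PackDetInputs')
>>> ]
>>> # The dataset wrapper (e.g. MultiImageMixDataset) is expected to fill
>>> # in 'mix_results' before Mosaic/MixUp run.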

class mmdet.datasets.transforms.Mosaic(img_scale: Tuple[int, int] = (640, 640), center_ratio_range: Tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, prob: float = 1.0)[source]

Mosaic augmentation.

Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:

    1. Choose the mosaic center as the intersection of the 4 images.
    2. Get the top-left image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. A sub-image will be cropped if it is larger than the mosaic patch.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pad_val (int) – Pad value. Defaults to 114.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.
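
Example (an illustrative YOLOX-style sketch added by the editor, not part of the upstream docstring; like MixUp, Mosaic needs extra samples via mix_results, so it is typically paired with a MultiImageMixDataset-style wrapper, and the values below are assumptions):

>>> train_pipeline = [
>>>     dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0, prob=1.0),
>>>     dict(type='RandomAffine',
>>>         scaling_ratio_range=(0.1, 2),
>>>         border=(-320, -320)),  # half of the mosaic output size, negated
>>>     dict(type='PackDetInputs')
>>> ]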

class mmdet.datasets.transforms.MultiBranch(branch_field: List[str], **branch_pipelines: dict)[source]

Multiple branch pipeline wrapper.

Generate multiple data-augmented versions of the same image. MultiBranch needs to specify the branch names of all pipelines of the dataset, perform corresponding data augmentation for the current branch, and return None for other branches, which ensures the consistency of return format across different samples.

Parameters
  • branch_field (list) – List of branch names.

  • branch_pipelines (dict) – Dict of different pipeline configs to be composed.

Examples

>>> branch_field = ['sup', 'unsup_teacher', 'unsup_student']
>>> sup_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=0.5),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> weak_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=0.0),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> strong_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=1.0),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> unsup_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadEmptyAnnotations'),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         unsup_teacher=weak_pipeline,
>>>         unsup_student=strong_pipeline)
>>>     ]
>>> from mmcv.transforms import Compose
>>> sup_branch = Compose(sup_pipeline)
>>> unsup_branch = Compose(unsup_pipeline)
>>> print(sup_branch)
>>> Compose(
>>>     LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     Resize(scale=(1333, 800), scale_factor=None, keep_ratio=True, clip_object_border=True), backend=cv2), interpolation=bilinear) # noqa
>>>     RandomFlip(prob=0.5, direction=horizontal)
>>>     MultiBranch(branch_pipelines=['sup'])
>>> )
>>> print(unsup_branch)
>>> Compose(
>>>     LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     LoadEmptyAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, seg_ignore_label=255) # noqa
>>>     MultiBranch(branch_pipelines=['unsup_teacher', 'unsup_student'])
>>> )
transform(results: dict)dict[source]

Transform function to apply transforms sequentially.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

  • ‘inputs’ (Dict[str, obj:torch.Tensor]): The forward data of models from different branches.

  • ‘data_sample’ (Dict[str, obj:DetDataSample]): The annotation info of the sample from different branches.

Return type

dict

class mmdet.datasets.transforms.PackDetInputs(meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction'))[source]

Pack the inputs data for the detection / semantic segmentation / panoptic segmentation.

The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default they include:

  • img_id: id of the image

  • img_path: path to the image file

  • ori_shape: original shape of the image as a tuple (h, w)

  • img_shape: shape of the image input to the network as a tuple (h, w). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.

  • scale_factor: a float indicating the preprocessing scale

  • flip: a boolean indicating if image flip transform was used

  • flip_direction: the flipping direction

Parameters

meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: ('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction')
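
Example (an illustrative test-pipeline sketch added by the editor, not part of the upstream docstring; the transforms before packing are assumptions):

>>> test_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='PackDetInputs',
>>>         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
>>>                    'scale_factor'))
>>> ]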

transform(results: dict)dict[source]

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

  • ‘inputs’ (obj:torch.Tensor): The forward data of models.

  • ‘data_sample’ (obj:DetDataSample): The annotation info of the sample.

Return type

dict

class mmdet.datasets.transforms.Pad(size: Optional[Tuple[int, int]] = None, size_divisor: Optional[int] = None, pad_to_square: bool = False, pad_val: Union[int, float, dict] = {'img': 0, 'seg': 255}, padding_mode: str = 'constant')[source]

Pad the image & segmentation map.

There are three padding modes: (1) pad to a fixed size, (2) pad to the minimum size that is divisible by some number, and (3) pad to square. Pad to square and pad to the minimum size can also be used at the same time.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_masks

  • gt_seg_map

Added Keys:

  • pad_shape

  • pad_fixed_size

  • pad_size_divisor

Parameters
  • size (tuple, optional) – Fixed padding size. Expected padding shape (width, height). Defaults to None.

  • size_divisor (int, optional) – The divisor of padded size. Defaults to None.

  • pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Defaults to False.

  • pad_val (Number | dict[str, Number], optional) –

    Padding value used when padding_mode is “constant”. If it is a single number, the value to pad the image is that number and the value to pad the semantic segmentation map is 255. If it is a dict, it should have the following keys:

    • img: The value to pad the image.

    • seg: The value to pad the semantic segmentation map.

    Defaults to dict(img=0, seg=255).

  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Defaults to ‘constant’.

    • constant: pads with a constant value, this value is specified with pad_val.

    • edge: pads with the last value at the edge of the image.

    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].

    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
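
Example (illustrative config sketches added by the editor, not part of the upstream docstring; the values are assumptions):

>>> # Pad to a size divisible by 32 (common for FPN-style detectors).
>>> pad_divisor = dict(type='Pad', size_divisor=32)
>>> # Pad to a square with a fixed pixel value (as used by YOLOX-style configs).
>>> pad_square = dict(type='Pad', pad_to_square=True,
>>>     pad_val=dict(img=(114.0, 114.0, 114.0)))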

transform(results: dict)dict[source]

Call function to pad images, masks, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Updated result dict.

Return type

dict

class mmdet.datasets.transforms.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[Union[int, float]] = (0.5, 1.5), saturation_range: Sequence[Union[int, float]] = (0.5, 1.5), hue_delta: int = 18)[source]

Apply photometric distortions to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

  8. randomly swap channels

Required Keys:

  • img (np.uint8)

Modified Keys:

  • img (np.float32)

Parameters
  • brightness_delta (int) – delta of brightness.

  • contrast_range (sequence) – range of contrast.

  • saturation_range (sequence) – range of saturation.

  • hue_delta (int) – delta of hue.
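
Example (an illustrative config sketch added by the editor, not part of the upstream docstring; the values are the documented defaults):

>>> photo_metric_distortion = dict(
>>>     type='PhotoMetricDistortion',
>>>     brightness_delta=32,
>>>     contrast_range=(0.5, 1.5),
>>>     saturation_range=(0.5, 1.5),
>>>     hue_delta=18)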

transform(results: dict)dict[source]

Transform function to perform photometric distortion on images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with images distorted.

Return type

dict

class mmdet.datasets.transforms.Posterize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 4.0)[source]

Posterize images (reduce the number of bits for each color channel).

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Posterize transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Posterize transformation. Defaults to 0.0.

  • max_mag (float) – The maximum magnitude for Posterize transformation. Defaults to 4.0.

class mmdet.datasets.transforms.ProposalBroadcaster(transforms: List[Union[dict, Callable]] = [])[source]

A transform wrapper that applies the wrapped transforms to both gt_bboxes and proposals without adding any code. It performs the following steps:

  1. Scatter the broadcasting targets to a list of inputs for the wrapped transforms. The type of the list should be list[dict, dict], where the first item is the original inputs and the second is the processed results in which gt_bboxes has been rewritten by the proposals.

  2. Apply self.transforms with the same random parameters, shared via a context manager. The type of the outputs is a list[dict, dict].

  3. Gather the outputs, and update the proposals in the first item of the outputs with the gt_bboxes in the second item.

Parameters

transforms (list, optional) – Sequence of transform object or config dict to be wrapped. Defaults to [].

Note: The TransformBroadcaster in MMCV can achieve the same operation as ProposalBroadcaster, but needs to set more complex parameters.

Examples

>>> pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadProposals', num_max_proposals=2000),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(
>>>         type='ProposalBroadcaster',
>>>         transforms=[
>>>             dict(type='Resize', scale=(1333, 800),
>>>                  keep_ratio=True),
>>>             dict(type='RandomFlip', prob=0.5),
>>>         ]),
>>>     dict(type='PackDetInputs')]
transform(results: dict)dict[source]

Apply wrapped transform functions to process both gt_bboxes and proposals.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Updated result dict.

Return type

dict

class mmdet.datasets.transforms.RandAugment(aug_space: List[Union[dict, mmengine.config.config.ConfigDict]] = [[{'type': 'AutoContrast'}], [{'type': 'Equalize'}], [{'type': 'Invert'}], [{'type': 'Rotate'}], [{'type': 'Posterize'}], [{'type': 'Solarize'}], [{'type': 'SolarizeAdd'}], [{'type': 'Color'}], [{'type': 'Contrast'}], [{'type': 'Brightness'}], [{'type': 'Sharpness'}], [{'type': 'ShearX'}], [{'type': 'ShearY'}], [{'type': 'TranslateX'}], [{'type': 'TranslateY'}]], aug_num: int = 2, prob: Optional[List[float]] = None)[source]

Rand augmentation.

This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_bboxes_labels

  • gt_masks

  • gt_ignore_flags

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • aug_space (List[List[Union[dict, ConfigDict]]]) – The augmentation space of rand augmentation. Each augmentation transform in aug_space is a specific transform, and is composed by several augmentations. When RandAugment is called, a random transform in aug_space will be selected to augment images. Defaults to aug_space.

  • aug_num (int) – Number of augmentations to apply sequentially. Defaults to 2.

  • prob (list[float], optional) – The probabilities associated with each augmentation. The length should be equal to the augmentation space and the sum should be 1. If not given, a uniform distribution will be assumed. Defaults to None.

Examples

>>> import numpy as np
>>> from mmdet.datasets.transforms import RandAugment
>>> aug_space = [
>>>     [dict(type='Sharpness')],
>>>     [dict(type='ShearX')],
>>>     [dict(type='Color')],
>>> ]
>>> augmentation = RandAugment(aug_space)
>>> img = np.ones((100, 100, 3), dtype=np.uint8)
>>> gt_bboxes = np.ones((10, 4), dtype=np.float32)
>>> results = dict(img=img, gt_bboxes=gt_bboxes)
>>> results = augmentation(results)
transform(results: dict)dict[source]

Transform function to use RandAugment.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with RandAugment.

Return type

dict

class mmdet.datasets.transforms.RandomAffine(max_rotate_degree: float = 10.0, max_translate_ratio: float = 0.1, scaling_ratio_range: Tuple[float, float] = (0.5, 1.5), max_shear_degree: float = 2.0, border: Tuple[int, int] = (0, 0), border_val: Tuple[int, int, int] = (114, 114, 114), bbox_clip_border: bool = True)[source]

Random affine transform data augmentation.

This operation randomly generates an affine transform matrix including rotation, translation, shear and scaling transforms.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • max_rotate_degree (float) – Maximum degrees of rotation transform. Defaults to 10.

  • max_translate_ratio (float) – Maximum ratio of translation. Defaults to 0.1.

  • scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Defaults to (0.5, 1.5).

  • max_shear_degree (float) – Maximum degrees of shear transform. Defaults to 2.

  • border (tuple[int]) – Distance from width and height sides of input image to adjust output shape. Only used in mosaic dataset. Defaults to (0, 0).

  • border_val (tuple[int]) – Border padding values of 3 channels. Defaults to (114, 114, 114).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
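
Example (an illustrative config sketch added by the editor, not part of the upstream docstring; the values mirror a YOLOX-style setup and are assumptions):

>>> random_affine = dict(
>>>     type='RandomAffine',
>>>     max_rotate_degree=10.0,
>>>     max_translate_ratio=0.1,
>>>     scaling_ratio_range=(0.5, 1.5),
>>>     max_shear_degree=2.0,
>>>     border=(-320, -320),  # used with a (640, 640) mosaic output
>>>     border_val=(114, 114, 114))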

class mmdet.datasets.transforms.RandomCenterCropPad(crop_size: Optional[tuple] = None, ratios: Optional[tuple] = (0.9, 1.0, 1.1), border: Optional[int] = 128, mean: Optional[Sequence] = None, std: Optional[Sequence] = None, to_rgb: Optional[bool] = None, test_mode: bool = False, test_pad_mode: Optional[tuple] = ('logical_or', 127), test_pad_add_pix: int = 0, bbox_clip_border: bool = True)[source]

Random center crop and random around padding for CornerNet.

This operation generates a randomly cropped image from the original image and pads it simultaneously. Different from RandomCrop, the output shape may not strictly equal crop_size. We choose a random value from ratios, so the output shape can be larger or smaller than crop_size. The padding operation is also different from Pad: here we use around padding instead of right-bottom padding.

The relation between output image (padding image) and original image:

                output image

       +----------------------------+
       |          padded area       |
+------|----------------------------|----------+
|      |         cropped area       |          |
|      |         +---------------+  |          |
|      |         |    .   center |  |          | original image
|      |         |        range  |  |          |
|      |         +---------------+  |          |
+------|----------------------------|----------+
       |          padded area       |
       +----------------------------+

There are 5 main areas in the figure:

  • output image: output image of this operation, also called padding image in following instruction.

  • original image: input image of this operation.

  • padded area: non-intersect area of output image and original image.

  • cropped area: the overlap of output image and original image.

  • center range: a smaller area from which the random center is chosen. The center range is computed from border and the original image’s shape, so that the random center is not too close to the original image’s border.

This operation also acts differently in train and test modes; the pipelines are summarized below.

Train pipeline:

  1. Choose a random_ratio from ratios, the shape of padding image will be random_ratio * crop_size.

  2. Choose a random_center in center range.

  3. Generate the padding image with its center matching the random_center.

  4. Initialize the padding image with pixel values equal to mean.

  5. Copy the cropped area to padding image.

  6. Refine annotations.

Test pipeline:

  1. Compute output shape according to test_pad_mode.

  2. Generate the padding image with its center matching the original image center.

  3. Initialize the padding image with pixel values equal to mean.

  4. Copy the cropped area to padding image.

Required Keys:

  • img (np.float32)

  • img_shape (tuple)

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img (np.float32)

  • img_shape (tuple)

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Parameters
  • crop_size (tuple, optional) – expected size after crop; the final size will be computed according to ratio. Requires (width, height) in train mode, and None in test mode.

  • ratios (tuple, optional) – randomly select a ratio from the tuple and crop the image to (crop_size[0] * ratio) * (crop_size[1] * ratio). Only available in train mode. Defaults to (0.9, 1.0, 1.1).

  • border (int, optional) – max distance from the center selection area to the image border. Only available in train mode. Defaults to 128.

  • mean (sequence, optional) – Mean values of 3 channels.

  • std (sequence, optional) – Std values of 3 channels.

  • to_rgb (bool, optional) – Whether to convert the image from BGR to RGB.

  • test_mode (bool) – whether to involve random variables in the transform. In train mode, crop_size is fixed, and the center coords and ratio are randomly selected from predefined lists. In test mode, crop_size is the image’s original shape, and the center coords and ratio are fixed. Defaults to False.

  • test_pad_mode (tuple, optional) –

    padding method and padding shape value, only available in test mode. Default is using ‘logical_or’ with 127 as padding shape value.

    • ’logical_or’: final_shape = input_shape | padding_shape_value

    • ’size_divisor’: final_shape = int( ceil(input_shape / padding_shape_value) * padding_shape_value)

    Defaults to (‘logical_or’, 127).

  • test_pad_add_pix (int) – Extra padding pixel in test mode. Defaults to 0.

  • bbox_clip_border (bool) – Whether clip the objects outside the border of the image. Defaults to True.

class mmdet.datasets.transforms.RandomCrop(crop_size: tuple, crop_type: str = 'absolute', allow_negative_crop: bool = False, recompute_bbox: bool = False, bbox_clip_border: bool = True)[source]

Random crop the image & bboxes & masks.

The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_masks (optional)

  • gt_ignore_flags (optional)

  • gt_seg_map (optional)

Added Keys:

  • homography_matrix

Parameters
  • crop_size (tuple) – The relative ratio or absolute pixels of (width, height).

  • crop_type (str, optional) – One of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Defaults to “absolute”.

  • allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Defaults to False.

  • recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Defaults to False.

  • bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.

Note

  • If the image is smaller than the absolute crop size, return the original image.

  • The keys for bboxes, labels and masks must be aligned. That is, gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.

  • If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.
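
Example (illustrative config sketches added by the editor, not part of the upstream docstring; the crop sizes are assumptions chosen to show the different crop_type modes):

>>> # Absolute crop of 384x384 pixels.
>>> crop_abs = dict(type='RandomCrop', crop_size=(384, 384), crop_type='absolute')
>>> # Relative crop keeping fixed fractions of the image size.
>>> crop_rel = dict(type='RandomCrop', crop_size=(0.9, 0.8), crop_type='relative')
>>> # Absolute crop with the size sampled from the range [384, 600].
>>> crop_range = dict(type='RandomCrop', crop_size=(384, 600),
>>>     crop_type='absolute_range', allow_negative_crop=True)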

class mmdet.datasets.transforms.RandomErasing(n_patches: Union[int, Tuple[int, int]], ratio: Union[float, Tuple[float, float]], squared: bool = True, bbox_erased_thr: float = 0.9, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255)[source]

RandomErasing operation.

Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values.

Required Keys:

  • img

  • gt_bboxes (HorizontalBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_masks (BitmapMasks) (optional)

Modified Keys:

  • img

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

  • gt_masks (optional)

Parameters
  • n_patches (int or tuple[int, int]) – Number of regions to be dropped. If it is given as a tuple, number of patches will be randomly selected from the closed interval [n_patches[0], n_patches[1]].

  • ratio (float or tuple[float, float]) – The ratio of erased regions. It can be float to use a fixed ratio or tuple[float, float] to randomly choose ratio from the interval.

  • squared (bool) – Whether to erase square region. Defaults to True.

  • bbox_erased_thr (float) – The threshold for the maximum area proportion of the bbox to be erased. When the proportion of the area where the bbox is erased is greater than the threshold, the bbox will be removed. Defaults to 0.9.

  • img_border_value (int or float or tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

class mmdet.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None)[source]

Flip the image & bbox & mask & segmentation map. Added or updated keys: flip, flip_direction, img, gt_bboxes, and gt_seg_map. There are 3 flip modes:

  • prob is float, direction is string: the image will be flipped along direction with probability prob. E.g., with prob=0.5 and direction='horizontal', the image will be horizontally flipped with probability 0.5.

  • prob is float, direction is list of string: the image will be flipped along direction[i] with probability prob/len(direction). E.g., with prob=0.5 and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.

  • prob is list of float, direction is list of string: given len(prob) == len(direction), the image will be flipped along direction[i] with probability prob[i]. E.g., with prob=[0.3, 0.5] and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • flip

  • flip_direction

  • homography_matrix

Parameters
  • prob (float | list[float], optional) – The flipping probability. Defaults to None.

  • direction (str | list[str]) – The flipping direction. Options are ‘horizontal’, ‘vertical’ and ‘diagonal’. If input is a list, the length must equal prob. Each element in prob indicates the flip probability of the corresponding direction. Defaults to ‘horizontal’.
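
Example (illustrative config sketches added by the editor, not part of the upstream docstring, showing the three flip modes described above):

>>> # Single direction: horizontal flip with probability 0.5.
>>> flip_single = dict(type='RandomFlip', prob=0.5)
>>> # One probability shared across several directions.
>>> flip_shared = dict(type='RandomFlip', prob=0.5,
>>>     direction=['horizontal', 'vertical'])
>>> # Per-direction probabilities.
>>> flip_per_dir = dict(type='RandomFlip', prob=[0.3, 0.5],
>>>     direction=['horizontal', 'vertical'])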

class mmdet.datasets.transforms.RandomOrder(transforms: Union[Dict, Callable[[Dict], Dict], Sequence[Union[Dict, Callable[[Dict], Dict]]]])[source]

Shuffle the transform Sequence.

transform(results: Dict)Optional[Dict][source]

Transform function to apply transforms in random order.

Parameters

results (dict) – A result dict contains the results to transform.

Returns

Transformed results.

Return type

dict or None

class mmdet.datasets.transforms.RandomShift(prob: float = 0.5, max_shift_px: int = 32, filter_thr_px: int = 1)[source]

Shift the image and box given shift pixels and probability.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32])

  • gt_bboxes_labels (np.int64)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_bboxes_labels

  • gt_ignore_flags (bool) (optional)

Parameters
  • prob (float) – Probability of shifts. Defaults to 0.5.

  • max_shift_px (int) – The max pixels for shifting. Defaults to 32.

  • filter_thr_px (int) – The width and height threshold for filtering. The bbox and the rest of the targets below the width and height threshold will be filtered. Defaults to 1.

class mmdet.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear')[source]

Resize images & bbox & seg.

This transform resizes the input image according to scale or scale_factor. Bboxes, masks, and seg map are then resized with the same scale factor. If scale and scale_factor are both set, it will use scale to resize.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

  • homography_matrix

Parameters
  • scale (int or tuple) – Images scales for resizing. Defaults to None

  • scale_factor (float or tuple[float]) – Scale factors for resizing. Defaults to None.

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.

  • clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.
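
Example (illustrative config sketches added by the editor, not part of the upstream docstring; the scales are assumptions):

>>> # Resize to fit within (1333, 800) while keeping the aspect ratio.
>>> resize_keep = dict(type='Resize', scale=(1333, 800), keep_ratio=True)
>>> # Resize by a fixed factor instead of a target scale.
>>> resize_factor = dict(type='Resize', scale_factor=0.5)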

class mmdet.datasets.transforms.Rotate(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]

Rotate the images, bboxes, masks and segmentation map.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • prob (float) – The probability for perform transformation and should be in range 0 to 1. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum angle for rotation. Defaults to 0.0.

  • max_mag (float) – The maximum angle for rotation. Defaults to 30.0.

  • reversal_prob (float) – The probability that reverses the rotation magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.SegRescale(scale_factor: float = 1, backend: str = 'cv2')[source]

Rescale semantic segmentation maps.

This transform rescale the gt_seg_map according to scale_factor.

Required Keys:

  • gt_seg_map

Modified Keys:

  • gt_seg_map

Parameters
  • scale_factor (float) – The scale factor of the final output. Defaults to 1.

  • backend (str) – Image rescale backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.

transform(results: dict)dict[source]

Transform function to scale the semantic segmentation map.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with semantic segmentation map scaled.

Return type

dict

class mmdet.datasets.transforms.Sharpness(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Adjust image sharpness. A positive magnitude enhances the sharpness and a negative magnitude makes the image blurry. A magnitude of 0 gives the original image.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Sharpness transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Sharpness transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for Sharpness transformation. Defaults to 1.9.

class mmdet.datasets.transforms.ShearX(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]

Shear the images, bboxes, masks and segmentation map horizontally.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • prob (float) – The probability for performing Shear and should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum angle for the horizontal shear. Defaults to 0.0.

  • max_mag (float) – The maximum angle for the horizontal shear. Defaults to 30.0.

  • reversal_prob (float) – The probability that reverses the horizontal shear magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.ShearY(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]

Shear the images, bboxes, masks and segmentation map vertically.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • prob (float) – The probability for performing ShearY and should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum angle for the vertical shear. Defaults to 0.0.

  • max_mag (float) – The maximum angle for the vertical shear. Defaults to 30.0.

  • reversal_prob (float) – The probability that reverses the vertical shear magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.Solarize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 256.0)[source]

Solarize images (invert all pixels above a threshold value given by the magnitude).

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Solarize transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Solarize transformation. Defaults to 0.0.

  • max_mag (float) – The maximum magnitude for Solarize transformation. Defaults to 256.0.

class mmdet.datasets.transforms.SolarizeAdd(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 110.0)[source]

SolarizeAdd images. For each pixel in the image that is less than 128, add an additional amount to it decided by the magnitude.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing SolarizeAdd transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for SolarizeAdd transformation. Defaults to 0.0.

  • max_mag (float) – The maximum magnitude for SolarizeAdd transformation. Defaults to 110.0.

class mmdet.datasets.transforms.ToTensor(keys)[source]

Convert some results to torch.Tensor by given keys.

Parameters

keys (Sequence[str]) – Keys that need to be converted to Tensor.

class mmdet.datasets.transforms.TranslateX(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 0.1, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]

Translate the images, bboxes, masks and segmentation map horizontally.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • prob (float) – The probability for perform transformation and should be in range 0 to 1. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum pixel’s offset ratio for horizontal translation. Defaults to 0.0.

  • max_mag (float) – The maximum pixel’s offset ratio for horizontal translation. Defaults to 0.1.

  • reversal_prob (float) – The probability that reverses the horizontal translation magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.TranslateY(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 0.1, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]

Translate the images, bboxes, masks and segmentation map vertically.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • prob (float) – The probability for perform transformation and should be in range 0 to 1. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum pixel’s offset ratio for vertical translation. Defaults to 0.0.

  • max_mag (float) – The maximum pixel’s offset ratio for vertical translation. Defaults to 0.1.

  • reversal_prob (float) – The probability that reverses the vertical translation magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.Transpose(keys, order)[source]

Transpose some results by given keys.

Parameters
  • keys (Sequence[str]) – Keys of results to be transposed.

  • order (Sequence[int]) – Order of transpose.

class mmdet.datasets.transforms.YOLOXHSVRandomAug(hue_delta: int = 5, saturation_delta: int = 30, value_delta: int = 30)[source]

Apply HSV augmentation to the image sequentially. It is referenced from https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/data/data_augment.py#L21.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • hue_delta (int) – delta of hue. Defaults to 5.

  • saturation_delta (int) – delta of saturation. Defaults to 30.

  • value_delta (int) – delta of value. Defaults to 30.

transform(results: dict)dict[source]

The transform function. All subclass of BaseTransform should override this method.

This function takes the result dict as input and can add new items to the dict or modify existing ones. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict
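Example (a minimal doctest-style sketch; the random input image is made up and only serves to show the call pattern):

>>> import numpy as np
>>> from mmdet.datasets.transforms import YOLOXHSVRandomAug
>>> aug = YOLOXHSVRandomAug(hue_delta=5, saturation_delta=30, value_delta=30)
>>> results = dict(img=np.random.randint(0, 256, (416, 416, 3), dtype=np.uint8))
>>> results = aug(results)  # 'img' is jittered in HSV space
>>> results['img'].shape
(416, 416, 3)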

mmdet.engine

hooks

class mmdet.engine.hooks.CheckInvalidLossHook(interval: int = 50)[source]

Check invalid loss hook.

This hook will regularly check whether the loss is valid during training.

Parameters

interval (int) – Checking interval (every k iterations). Default: 50.

after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None)None[source]

Regularly check whether the loss is valid every n iterations.

Parameters
  • runner (Runner) – The runner of the training process.

  • batch_idx (int) – The index of the current batch in the train loop.

  • data_batch (dict, Optional) – Data from dataloader. Defaults to None.

  • outputs (dict, Optional) – Outputs from model. Defaults to None.
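Example (an illustrative config sketch): the hook can be enabled through custom_hooks; the interval value below is an assumption.

custom_hooks = [
    # Assert that the loss is still finite every 50 training iterations.
    dict(type='CheckInvalidLossHook', interval=50),
]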

class mmdet.engine.hooks.DetVisualizationHook(draw: bool = False, interval: int = 50, score_thr: float = 0.3, show: bool = False, wait_time: float = 0.0, test_out_dir: Optional[str] = None, file_client_args: dict = {'backend': 'disk'})[source]

Detection Visualization Hook. Used to visualize validation and testing process prediction results.

In the testing phase:

  1. If show is True, only the prediction results are visualized and no data is stored, so vis_backends needs to be excluded.

  2. If test_out_dir is specified, the prediction results are saved to test_out_dir. To avoid vis_backends also storing data, vis_backends needs to be excluded.

  3. vis_backends takes effect only if the user specifies neither show nor test_out_dir. You can set vis_backends to WandbVisBackend or TensorboardVisBackend to store the prediction results in Wandb or Tensorboard.

Parameters
  • draw (bool) – Whether to draw prediction results. If False, no drawing is done. Defaults to False.

  • interval (int) – The interval of visualization. Defaults to 50.

  • score_thr (float) – The threshold to visualize the bboxes and masks. Defaults to 0.3.

  • show (bool) – Whether to display the drawn image. Defaults to False.

  • wait_time (float) – The interval of show (s). Defaults to 0.

  • test_out_dir (str, optional) – Directory where painted images will be saved during the testing process. Defaults to None.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
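Example (an illustrative config sketch; the values shown are assumptions, and the other default hooks are omitted): the hook is usually registered under the visualization key of default_hooks.

default_hooks = dict(
    visualization=dict(
        type='DetVisualizationHook',
        draw=True,       # actually draw predictions
        interval=10,     # visualize every 10 val/test iterations
        score_thr=0.3))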

after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmdet.structures.det_data_sample.DetDataSample])None[source]

Run after every testing iterations.

Parameters
  • runner (Runner) – The runner of the testing process.

  • batch_idx (int) – The index of the current batch in the val loop.

  • data_batch (dict) – Data from dataloader.

  • outputs (Sequence[DetDataSample]) – A batch of data samples that contain annotations and predictions.

after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmdet.structures.det_data_sample.DetDataSample])None[source]

Run after every self.interval validation iterations.

Parameters
  • runner (Runner) – The runner of the validation process.

  • batch_idx (int) – The index of the current batch in the val loop.

  • data_batch (dict) – Data from dataloader.

  • outputs (Sequence[DetDataSample]]) – A batch of data samples that contain annotations and predictions.

class mmdet.engine.hooks.MeanTeacherHook(momentum: float = 0.001, interval: int = 1, skip_buffer=True)[source]

Mean Teacher Hook.

Mean Teacher is an efficient semi-supervised learning method proposed in Mean Teacher. This method requires two models with exactly the same structure, which serve as the student model and the teacher model respectively. The student model updates its parameters through gradient descent, and the teacher model updates its parameters through an exponential moving average of the student model's parameters. Compared with the student model, the teacher model is smoother and accumulates more knowledge.

Parameters
  • momentum (float) – The momentum used for updating the teacher’s parameters. The teacher’s parameters are updated with the formula: teacher = (1 - momentum) * teacher + momentum * student. Defaults to 0.001.

  • interval (int) – Update teacher’s parameter every interval iteration. Defaults to 1.

  • skip_buffer (bool) – Whether to skip the model buffers, such as batchnorm running stats (running_mean, running_var); if True, the EMA update is not applied to them. Defaults to True.
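Example (an illustrative config sketch): in semi-supervised configs the hook is typically added via custom_hooks; the values below simply restate the defaults.

custom_hooks = [
    # Update the teacher by EMA of the student after every training iteration.
    dict(type='MeanTeacherHook', momentum=0.001, interval=1),
]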

after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None)None[source]

Update teacher’s parameter every self.interval iterations.

before_train(runner: mmengine.runner.runner.Runner)None[source]

Check that the teacher model and the student model exist.

momentum_update(model: torch.nn.modules.module.Module, momentum: float)None[source]

Compute the moving average of the parameters using exponential moving average.

class mmdet.engine.hooks.MemoryProfilerHook(interval: int = 50)[source]

Memory profiler hook recording memory information including virtual memory, swap memory, and the memory of the current process.

Parameters

interval (int) – Checking interval (every k iterations). Default: 50.

after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[Sequence[mmdet.structures.det_data_sample.DetDataSample]] = None)None[source]

Regularly record memory information.

Parameters
  • runner (Runner) – The runner of the testing process.

  • batch_idx (int) – The index of the current batch in the test loop.

  • data_batch (dict, optional) – Data from dataloader. Defaults to None.

  • outputs (Sequence[DetDataSample], optional) – Outputs from model. Defaults to None.

after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None)None[source]

Regularly record memory information.

Parameters
  • runner (Runner) – The runner of the training process.

  • batch_idx (int) – The index of the current batch in the train loop.

  • data_batch (dict, optional) – Data from dataloader. Defaults to None.

  • outputs (dict, optional) – Outputs from model. Defaults to None.

after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[Sequence[mmdet.structures.det_data_sample.DetDataSample]] = None)None[source]

Regularly record memory information.

Parameters
  • runner (Runner) – The runner of the validation process.

  • batch_idx (int) – The index of the current batch in the val loop.

  • data_batch (dict, optional) – Data from dataloader. Defaults to None.

  • outputs (Sequence[DetDataSample], optional) – Outputs from model. Defaults to None.

class mmdet.engine.hooks.NumClassCheckHook[source]

Check whether the num_classes in head matches the length of classes in dataset.metainfo.

before_train_epoch(runner: mmengine.runner.runner.Runner)None[source]

Check whether the training dataset is compatible with head.

Parameters

runner (Runner) – The runner of the training or evaluation process.

before_val_epoch(runner: mmengine.runner.runner.Runner)None[source]

Check whether the dataset in val epoch is compatible with head.

Parameters

runner (Runner) – The runner of the training or evaluation process.

class mmdet.engine.hooks.PipelineSwitchHook(switch_epoch, switch_pipeline)[source]

Switch data pipeline at switch_epoch.

Parameters
  • switch_epoch (int) – The epoch at which to switch the pipeline.

  • switch_pipeline (list[dict]) – The pipeline to switch to.

before_train_epoch(runner)[source]

Switch the data pipeline at switch_epoch.
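Example (an illustrative config sketch; the epoch number and the pipeline contents are assumptions):

custom_hooks = [
    dict(
        type='PipelineSwitchHook',
        switch_epoch=280,
        switch_pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', scale=(640, 640), keep_ratio=True),
            dict(type='PackDetInputs'),
        ])
]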

class mmdet.engine.hooks.SetEpochInfoHook[source]

Set runner’s epoch information to the model.

before_train_epoch(runner)[source]

All subclasses should override this method, if they need any operations before each training epoch.

Parameters

runner (Runner) – The runner of the training process.

class mmdet.engine.hooks.SyncNormHook[source]

Synchronize Norm states before validation, currently used in YOLOX.

before_val_epoch(runner)[source]

Synchronizing norm.

class mmdet.engine.hooks.YOLOXModeSwitchHook(num_last_epochs: int = 15, skip_type_keys: Sequence[str] = ('Mosaic', 'RandomAffine', 'MixUp'))[source]

Switch the mode of YOLOX during training.

This hook turns off the mosaic and mixup data augmentation and switches to use L1 loss in bbox_head.

Parameters

num_last_epochs – The number of epochs at the end of training during which the data augmentation is turned off and the L1 loss is used. Defaults to 15.

before_train_epoch(runner)None[source]

Close mosaic and mixup augmentation and switch to the L1 loss.
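Example (an illustrative config sketch matching the default value):

custom_hooks = [
    # Disable Mosaic/MixUp and switch bbox_head to L1 loss for the last 15 epochs.
    dict(type='YOLOXModeSwitchHook', num_last_epochs=15),
]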

optimizers

class mmdet.engine.optimizers.LearningRateDecayOptimizerConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]
add_params(params: List[dict], module: torch.nn.modules.module.Module, **kwargs)None[source]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

Parameters
  • params (list[dict]) – A list of param groups, it will be modified in place.

  • module (nn.Module) – The module to be added.
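Example (an illustrative optim_wrapper sketch; the paramwise_cfg keys and values are assumptions that follow the layer-wise decay pattern typically used with transformer-style backbones, and the exact accepted keys depend on the backbone):

optim_wrapper = dict(
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=0.05),
    constructor='LearningRateDecayOptimizerConstructor',
    # Layer-wise learning-rate decay: deeper layers keep a larger fraction of lr.
    paramwise_cfg=dict(decay_rate=0.7, decay_type='layer_wise', num_layers=6))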

runner

class mmdet.engine.runner.TeacherStudentValLoop(runner, dataloader: Union[torch.utils.data.dataloader.DataLoader, Dict], evaluator: Union[mmengine.evaluator.evaluator.Evaluator, Dict, List], fp16: bool = False)[source]

Loop for validation of model teacher and student.

run()[source]

Launch validation for model teacher and student.

schedulers

class mmdet.engine.schedulers.QuadraticWarmupLR(optimizer, *args, **kwargs)[source]

Warm up the learning rate of each parameter group by quadratic formula.

Parameters
  • optimizer (Optimizer) – Wrapped optimizer.

  • begin (int) – Step at which to start updating the parameters. Defaults to 0.

  • end (int) – Step at which to stop updating the parameters. Defaults to INF.

  • last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.

  • by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • verbose (bool) – Whether to print the value for each update. Defaults to False.
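Example (an illustrative param_scheduler sketch; the warm-up length is an assumption):

param_scheduler = [
    # Quadratic learning-rate warm-up over the first 5 epochs,
    # converted to an iteration-based schedule internally.
    dict(
        type='QuadraticWarmupLR',
        by_epoch=True,
        begin=0,
        end=5,
        convert_to_iter_based=True),
]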

class mmdet.engine.schedulers.QuadraticWarmupMomentum(optimizer, *args, **kwargs)[source]

Warm up the momentum value of each parameter group by quadratic formula.

Parameters
  • optimizer (Optimizer) – Wrapped optimizer.

  • begin (int) – Step at which to start updating the parameters. Defaults to 0.

  • end (int) – Step at which to stop updating the parameters. Defaults to INF.

  • last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.

  • by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • verbose (bool) – Whether to print the value for each update. Defaults to False.

class mmdet.engine.schedulers.QuadraticWarmupParamScheduler(optimizer: torch.optim.optimizer.Optimizer, param_name: str, begin: int = 0, end: int = 1000000000, last_step: int = - 1, by_epoch: bool = True, verbose: bool = False)[source]

Warm up the parameter value of each parameter group by quadratic formula:

\[X_{t} = X_{t-1} + \frac{2t+1}{{(end-begin)}^{2}} \times X_{base}\]
Parameters
  • optimizer (Optimizer) – Wrapped optimizer.

  • param_name (str) – Name of the parameter to be adjusted, such as lr, momentum.

  • begin (int) – Step at which to start updating the parameters. Defaults to 0.

  • end (int) – Step at which to stop updating the parameters. Defaults to INF.

  • last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.

  • by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • verbose (bool) – Whether to print the value for each update. Defaults to False.

classmethod build_iter_from_epoch(*args, begin=0, end=1000000000, by_epoch=True, epoch_length=None, **kwargs)[source]

Build an iter-based instance of this scheduler from an epoch-based config.

mmdet.evaluation

functional

mmdet.evaluation.functional.average_precision(recalls, precisions, mode='area')[source]

Calculate average precision (for single or multiple scales).

Parameters
  • recalls (ndarray) – shape (num_scales, num_dets) or (num_dets, )

  • precisions (ndarray) – shape (num_scales, num_dets) or (num_dets, )

  • mode (str) – ‘area’ or ‘11points’, ‘area’ means calculating the area under precision-recall curve, ‘11points’ means calculating the average precision of recalls at [0, 0.1, …, 1]

Returns

calculated average precision

Return type

float or ndarray
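Example (a minimal doctest-style sketch with made-up recall/precision values):

>>> import numpy as np
>>> from mmdet.evaluation.functional import average_precision
>>> recalls = np.array([0.2, 0.4, 0.6, 0.8])
>>> precisions = np.array([1.0, 0.9, 0.75, 0.6])
>>> ap = average_precision(recalls, precisions, mode='area')  # area under the PR curve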

mmdet.evaluation.functional.bbox_overlaps(bboxes1, bboxes2, mode='iou', eps=1e-06, use_legacy_coordinate=False)[source]

Calculate the ious between each bbox of bboxes1 and bboxes2.

Parameters
  • bboxes1 (ndarray) – Shape (n, 4)

  • bboxes2 (ndarray) – Shape (k, 4)

  • mode (str) – IOU (intersection over union) or IOF (intersection over foreground)

  • use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, in which width and height are calculated as ‘x2 - x1 + 1’ and ‘y2 - y1 + 1’ respectively. Note that when this function is used in VOCDataset, it should be True to align with the official implementation http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCdevkit_18-May-2011.tar Default: False.

Returns

Shape (n, k)

Return type

ious (ndarray)
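Example (a minimal doctest-style sketch with made-up boxes):

>>> import numpy as np
>>> from mmdet.evaluation.functional import bbox_overlaps
>>> bboxes1 = np.array([[0., 0., 10., 10.]])
>>> bboxes2 = np.array([[0., 0., 10., 10.], [5., 5., 15., 15.]])
>>> ious = bbox_overlaps(bboxes1, bboxes2, mode='iou')
>>> ious.shape
(1, 2)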

mmdet.evaluation.functional.cityscapes_classes()list[source]

Class names of Cityscapes.

mmdet.evaluation.functional.coco_classes()list[source]

Class names of COCO.

mmdet.evaluation.functional.eval_map(det_results, annotations, scale_ranges=None, iou_thr=0.5, ioa_thr=None, dataset=None, logger=None, tpfp_fn=None, nproc=4, use_legacy_coordinate=False, use_group_of=False, eval_mode='area')[source]

Evaluate mAP of a dataset.

Parameters
  • det_results (list[list]) – [[cls1_det, cls2_det, …], …]. The outer list indicates images, and the inner list indicates per-class detected bboxes.

  • annotations (list[dict]) –

    Ground truth annotations where each item of the list indicates an image. Keys of annotations are:

    • bboxes: numpy array of shape (n, 4)

    • labels: numpy array of shape (n, )

    • bboxes_ignore (optional): numpy array of shape (k, 4)

    • labels_ignore (optional): numpy array of shape (k, )

  • scale_ranges (list[tuple] | None) – Range of scales to be evaluated, in the format [(min1, max1), (min2, max2), …]. A range of (32, 64) means the area range between (32**2, 64**2). Defaults to None.

  • iou_thr (float) – IoU threshold to be considered as matched. Defaults to 0.5.

  • ioa_thr (float | None) – IoA threshold to be considered as matched, which only used in OpenImages evaluation. Defaults to None.

  • dataset (list[str] | str | None) – Dataset name or dataset classes, there are minor differences in metrics for different datasets, e.g. “voc”, “imagenet_det”, etc. Defaults to None.

  • logger (logging.Logger | str | None) – The way to print the mAP summary. See mmengine.logging.print_log() for details. Defaults to None.

  • tpfp_fn (callable | None) – The function used to determine true/false positives. If None, tpfp_default() is used unless dataset is ‘det’ or ‘vid’ (in which case tpfp_imagenet() is used). If given as a function, that function is used to evaluate tp & fp. Defaults to None.

  • nproc (int) – Processes used for computing TP and FP. Defaults to 4.

  • use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, in which width and height are calculated as ‘x2 - x1 + 1’ and ‘y2 - y1 + 1’ respectively. Defaults to False.

  • use_group_of (bool) – Whether to use group of when calculate TP and FP, which only used in OpenImages evaluation. Defaults to False.

  • eval_mode (str) – ‘area’ or ‘11points’, ‘area’ means calculating the area under precision-recall curve, ‘11points’ means calculating the average precision of recalls at [0, 0.1, …, 1], PASCAL VOC2007 uses 11points as default evaluate mode, while others are ‘area’. Defaults to ‘area’.

Returns

(mAP, [dict, dict, …])

Return type

tuple
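Example (a minimal single-image, single-class sketch; all boxes, labels and scores are made up):

>>> import numpy as np
>>> from mmdet.evaluation.functional import eval_map
>>> # detections per image and class are (n, 5) arrays of [x1, y1, x2, y2, score]
>>> det_results = [[np.array([[10., 10., 50., 50., 0.9]])]]
>>> annotations = [dict(bboxes=np.array([[10., 10., 50., 50.]]),
...                     labels=np.array([0]))]
>>> mean_ap, per_class_results = eval_map(
...     det_results, annotations, iou_thr=0.5, nproc=1)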

mmdet.evaluation.functional.eval_recalls(gts, proposals, proposal_nums=None, iou_thrs=0.5, logger=None, use_legacy_coordinate=False)[source]

Calculate recalls.

Parameters
  • gts (list[ndarray]) – a list of arrays of shape (n, 4)

  • proposals (list[ndarray]) – a list of arrays of shape (k, 4) or (k, 5)

  • proposal_nums (int | Sequence[int]) – Top N proposals to be evaluated.

  • iou_thrs (float | Sequence[float]) – IoU thresholds. Default: 0.5.

  • logger (logging.Logger | str | None) – The way to print the recall summary. See mmengine.logging.print_log() for details. Default: None.

  • use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, in which 1 is added to both width and height, i.e. w and h are computed as ‘x2 - x1 + 1’ and ‘y2 - y1 + 1’. Default: False.

Returns

recalls of different ious and proposal nums

Return type

ndarray

mmdet.evaluation.functional.get_classes(dataset)list[source]

Get class names of a dataset.

mmdet.evaluation.functional.imagenet_det_classes()list[source]

Class names of ImageNet Det.

mmdet.evaluation.functional.imagenet_vid_classes()list[source]

Class names of ImageNet VID.

mmdet.evaluation.functional.objects365v1_classes()list[source]

Class names of Objects365 V1.

mmdet.evaluation.functional.objects365v2_classes()list[source]

Class names of Objects365 V2.

mmdet.evaluation.functional.oid_challenge_classes()list[source]

Class names of Open Images Challenge.

mmdet.evaluation.functional.oid_v6_classes()list[source]

Class names of Open Images V6.

mmdet.evaluation.functional.plot_iou_recall(recalls, iou_thrs)[source]

Plot IoU-Recalls curve.

Parameters
  • recalls (ndarray or list) – shape (k,)

  • iou_thrs (ndarray or list) – same shape as recalls

mmdet.evaluation.functional.plot_num_recall(recalls, proposal_nums)[source]

Plot Proposal_num-Recalls curve.

Parameters
  • recalls (ndarray or list) – shape (k,)

  • proposal_nums (ndarray or list) – same shape as recalls

mmdet.evaluation.functional.pq_compute_multi_core(matched_annotations_list, gt_folder, pred_folder, categories, file_client=None, nproc=32)[source]

Evaluate the metrics of panoptic segmentation with multiple processes.

Same as the function with the same name in panopticapi.

Parameters
  • matched_annotations_list (list) – The matched annotation list. Each element is a tuple of annotations of the same image with the format (gt_anns, pred_anns).

  • gt_folder (str) – The path of the ground truth images.

  • pred_folder (str) – The path of the prediction images.

  • categories (str) – The categories of the dataset.

  • file_client (object) – The file client of the dataset. If None, the backend will be set to disk.

  • nproc (int) – Number of processes for panoptic quality computing. Defaults to 32. When nproc exceeds the number of cpu cores, the number of cpu cores is used.

mmdet.evaluation.functional.pq_compute_single_core(proc_id, annotation_set, gt_folder, pred_folder, categories, file_client=None, print_log=False)[source]

The single core function to evaluate the metric of Panoptic Segmentation.

Same as the function with the same name in panopticapi. Only the function to load the images is changed to use the file client.

Parameters
  • proc_id (int) – The id of the mini process.

  • gt_folder (str) – The path of the ground truth images.

  • pred_folder (str) – The path of the prediction images.

  • categories (str) – The categories of the dataset.

  • file_client (object) – The file client of the dataset. If None, the backend will be set to disk.

  • print_log (bool) – Whether to print the log. Defaults to False.

mmdet.evaluation.functional.print_map_summary(mean_ap, results, dataset=None, scale_ranges=None, logger=None)[source]

Print mAP and results of each class.

A table will be printed to show the gts/dets/recall/AP of each class and the mAP.

Parameters
  • mean_ap (float) – Calculated from eval_map().

  • results (list[dict]) – Calculated from eval_map().

  • dataset (list[str] | str | None) – Dataset name or dataset classes.

  • scale_ranges (list[tuple] | None) – Range of scales to be evaluated.

  • logger (logging.Logger | str | None) – The way to print the mAP summary. See mmengine.logging.print_log() for details. Defaults to None.

mmdet.evaluation.functional.print_recall_summary(recalls, proposal_nums, iou_thrs, row_idxs=None, col_idxs=None, logger=None)[source]

Print recalls in a table.

Parameters
  • recalls (ndarray) – calculated from bbox_recalls

  • proposal_nums (ndarray or list) – top N proposals

  • iou_thrs (ndarray or list) – iou thresholds

  • row_idxs (ndarray) – which rows(proposal nums) to print

  • col_idxs (ndarray) – which cols(iou thresholds) to print

  • logger (logging.Logger | str | None) – The way to print the recall summary. See mmengine.logging.print_log() for details. Default: None.

mmdet.evaluation.functional.voc_classes()list[source]

Class names of PASCAL VOC.

metrics

class mmdet.evaluation.metrics.CityScapesMetric(outfile_prefix: str, seg_prefix: Optional[str] = None, format_only: bool = False, keep_results: bool = False, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]

CityScapes metric for instance segmentation.

Parameters
  • outfile_prefix (str) – The prefix of txt and png files. The txt and png files will be saved in a directory whose path is “outfile_prefix.results/”.

  • seg_prefix (str, optional) – Path to the directory which contains the cityscapes instance segmentation masks. It is required during training and validation, and may be None when running inference on the test dataset. Defaults to None.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result into a specific format and submit it to the test server. Defaults to False.

  • keep_results (bool) – Whether to keep the results. When format_only is True, keep_results must be True. Defaults to False.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.

Return type

Dict[str, float]

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

class mmdet.evaluation.metrics.CocoMetric(ann_file: Optional[str] = None, metric: Union[str, List[str]] = 'bbox', classwise: bool = False, proposal_nums: Sequence[int] = (100, 300, 1000), iou_thrs: Optional[Union[float, Sequence[float]]] = None, metric_items: Optional[Sequence[str]] = None, format_only: bool = False, outfile_prefix: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None, sort_categories: bool = False)[source]

COCO evaluation metric.

Evaluate AR, AP, and mAP for detection tasks including proposal/box detection and instance segmentation. Please refer to https://cocodataset.org/#detection-eval for more details.

Parameters
  • ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.

  • metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘bbox’, ‘segm’, ‘proposal’, and ‘proposal_fast’. Defaults to ‘bbox’.

  • classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.

  • proposal_nums (Sequence[int]) – Numbers of proposals to be evaluated. Defaults to (100, 300, 1000).

  • iou_thrs (float | List[float], optional) – IoU threshold to compute AP and AR. If not specified, IoUs from 0.5 to 0.95 will be used. Defaults to None.

  • metric_items (List[str], optional) – Metric result names to be recorded in the evaluation result. Defaults to None.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result into a specific format and submit it to the test server. Defaults to False.

  • outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

  • sort_categories (bool) – Whether sort categories in annotations. Only used for Objects365V1Dataset. Defaults to False.
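Example (an illustrative evaluator config sketch; the annotation path is an assumption):

val_evaluator = dict(
    type='CocoMetric',
    ann_file='data/coco/annotations/instances_val2017.json',
    metric=['bbox', 'segm'],   # evaluate both box AP and mask AP
    format_only=False)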

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

Return type

Dict[str, float]

fast_eval_recall(results: List[dict], proposal_nums: Sequence[int], iou_thrs: Sequence[float], logger: Optional[mmengine.logging.logger.MMLogger] = None)numpy.ndarray[source]

Evaluate proposal recall with COCO’s fast_eval_recall.

Parameters
  • results (List[dict]) – Results of the dataset.

  • proposal_nums (Sequence[int]) – Proposal numbers used for evaluation.

  • iou_thrs (Sequence[float]) – IoU thresholds used for evaluation.

  • logger (MMLogger, optional) – Logger used for logging the recall summary.

Returns

Averaged recall results.

Return type

np.ndarray

gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str)str[source]

Convert ground truth to coco format json file.

Parameters
  • gt_dicts (Sequence[dict]) – Ground truth of the dataset.

  • outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.

Returns

The filename of the json file.

Return type

str

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

results2json(results: Sequence[dict], outfile_prefix: str)dict[source]

Dump the detection results to a COCO style json file.

There are three types of results: proposals, bbox predictions, and mask predictions, and they have different data types. This method automatically recognizes the type and dumps them to json files.

Parameters
  • results (Sequence[dict]) – Testing results of the dataset.

  • outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.bbox.json”, “somepath/xxx.segm.json”, “somepath/xxx.proposal.json”.

Returns

Possible keys are “bbox”, “segm”, “proposal”, and values are corresponding filenames.

Return type

dict

xyxy2xywh(bbox: numpy.ndarray)list[source]

Convert xyxy style bounding boxes to xywh style for COCO evaluation.

Parameters

bbox (numpy.ndarray) – The bounding boxes, shape (4, ), in xyxy order.

Returns

The converted bounding boxes, in xywh order.

Return type

list[float]
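The conversion itself is simple arithmetic; a standalone sketch of the same conversion (a hypothetical helper, not the method itself) for reference:

def xyxy_to_xywh(bbox):
    # [x1, y1, x2, y2] -> [x, y, w, h] as expected by the COCO API
    x1, y1, x2, y2 = bbox
    return [float(x1), float(y1), float(x2 - x1), float(y2 - y1)]

# e.g. [10, 20, 110, 70] -> [10.0, 20.0, 100.0, 50.0]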

class mmdet.evaluation.metrics.CocoPanopticMetric(ann_file: Optional[str] = None, seg_prefix: Optional[str] = None, classwise: bool = False, format_only: bool = False, outfile_prefix: Optional[str] = None, nproc: int = 32, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]

COCO panoptic segmentation evaluation metric.

Evaluate PQ, SQ and RQ for panoptic segmentation tasks. Please refer to https://cocodataset.org/#panoptic-eval for more details.

Parameters
  • ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.

  • seg_prefix (str, optional) – Path to the directory which contains the coco panoptic segmentation masks. It should be specified when evaluating. Defaults to None.

  • classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.

  • outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. It should be specified when format_only is True. Defaults to None.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result into a specific format and submit it to the test server. Defaults to False.

  • nproc (int) – Number of processes for panoptic quality computing. Defaults to 32. When nproc exceeds the number of cpu cores, the number of cpu cores is used.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) –

The processed results of each batch. There are two cases:

  • When outfile_prefix is not provided, the elements in results are pq_stats which can be summed directly to get PQ.

  • When outfile_prefix is provided, the elements in results are tuples like (gt, pred).

Returns

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.

Return type

Dict[str, float]

gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str)Tuple[str, str][source]

Convert ground truth to coco panoptic segmentation format json file.

Parameters
  • gt_dicts (Sequence[dict]) – Ground truth of the dataset.

  • outfile_prefix (str) – The filename prefix of the json file. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.

Returns

The filename of the json file and the name of the directory which contains panoptic segmentation masks.

Return type

Tuple[str, str]

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

result2json(results: Sequence[dict], outfile_prefix: str)Tuple[str, str][source]

Dump the panoptic results to a COCO style json file and a directory.

Parameters
  • results (Sequence[dict]) – Testing results of the dataset.

  • outfile_prefix (str) – The filename prefix of the json files and the directory.

Returns

The json file and the directory which contains panoptic segmentation masks. The filename of the json is “somepath/xxx.panoptic.json” and the name of the directory is “somepath/xxx.panoptic”.

Return type

Tuple[str, str]

class mmdet.evaluation.metrics.CrowdHumanMetric(ann_file: str, metric: Union[str, List[str]] = ['AP', 'MR', 'JI'], format_only: bool = False, outfile_prefix: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None, eval_mode: int = 0, iou_thres: float = 0.5, compare_matching_method: Optional[str] = None, mr_ref: str = 'CALTECH_-2', num_ji_process: int = 10)[source]

CrowdHuman evaluation metric.

Evaluate Average Precision (AP), Miss Rate (MR) and Jaccard Index (JI) for detection tasks.

Parameters
  • ann_file (str) – Path to the annotation file.

  • metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘AP’, ‘MR’ and ‘JI’. Defaults to ‘AP’.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result into a specific format and submit it to the test server. Defaults to False.

  • outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

  • eval_mode (int) – Select the evaluation mode. Valid modes are 0 (body box only), 1 (head box only) and 2 (both). Defaults to 0.

  • iou_thres (float) – IoU threshold. Defaults to 0.5.

  • compare_matching_method (str, optional) – Matching method used to compare the detection results with the ground truth when computing ‘AP’ and ‘MR’. Valid methods are VOC and None (CALTECH). Defaults to None.

  • mr_ref (str) – Parameter selection used to calculate MR. Valid options are CALTECH_-2 and CALTECH_-4. Defaults to CALTECH_-2.

  • num_ji_process (int) – The number of processes used to evaluate JI. Defaults to 10.

compare(samples)[source]

Match the detection results with the ground_truth.

Parameters

samples (dict[Image]) – The detection result packaged by Image.

Returns

Matching result: a list of tuples (dtbox, label, imgID) sorted in descending order of dtbox.score.

Return type

score_list(list[tuple[ndarray, int, str]])

compute_ji_matching(dt_boxes, gt_boxes)[source]

Match the annotation box for each detection box.

Parameters
  • dt_boxes (ndarray) – Detection boxes.

  • gt_boxes (ndarray) – Ground_truth boxes.

Returns

Match result.

Return type

matches_(list[tuple[int, int]])

compute_ji_with_ignore(result_queue, dt_result, score_thr)[source]

Compute JI with ignore.

Parameters
  • result_queue (Queue) – The queue used to save the computed results during multiprocessing.

  • dt_result (dict[Image]) – Detection result packaged by Image.

  • score_thr (float) – The threshold of detection score.

Returns

compute result.

Return type

dict

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

Return type

eval_results(Dict[str, float])

static eval_ap(score_list, gt_num, img_num)[source]

Evaluate by average precision.

Parameters
  • score_list (list[tuple[ndarray, int, str]]) – Matching result: a list of tuples (dtbox, label, imgID) sorted in descending order of dtbox.score.

  • gt_num (int) – The number of gt boxes in the entire dataset.

  • img_num (int) – The number of images in the entire dataset.

Returns

result of average precision.

Return type

ap(float)

eval_ji(samples)[source]

Evaluate by JI using multiprocessing.

Parameters

samples (Dict[str, Image]) – The detection result packaged by Image.

Returns

result of jaccard index.

Return type

ji(float)

eval_mr(score_list, gt_num, img_num)[source]

Evaluate by Caltech-style log-average miss rate.

Parameters
  • score_list (list[tuple[ndarray, int, str]]) – Matching result: a list of tuples (dtbox, label, imgID) sorted in descending order of dtbox.score.

  • gt_num (int) – The number of gt boxes in the entire dataset.

  • img_num (int) – The number of images in the entire dataset.

Returns

result of miss rate.

Return type

mr(float)

static gather(results)[source]

Integrate test results.

get_ignores(dt_boxes, gt_boxes)[source]

Get the number of ignore bboxes.

load_eval_samples(result_file)[source]

Load data from annotations file and detection results.

Parameters

result_file (str) – The file path of the saved detection results.

Returns

The detection result packaged by Image

Return type

Dict[Image]

process(data_batch: Sequence[dict], data_samples: Sequence[dict])None[source]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

static results2json(results: Sequence[dict], outfile_prefix: str)str[source]

Dump the detection results to a json file.

class mmdet.evaluation.metrics.DumpProposals(output_dir: str = '', proposals_file: str = 'proposals.pkl', num_max_proposals: Optional[int] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]

Dump proposals pseudo metric.

Parameters
  • output_dir (str) – The root directory for proposals_file. Defaults to ‘’.

  • proposals_file (str) – Proposals file path. Defaults to ‘proposals.pkl’.

  • num_max_proposals (int, optional) – Maximum number of proposals to dump. If not specified, all proposals will be dumped.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)dict[source]

Dump the processed results.

Parameters

results (list) – The processed results of each batch.

Returns

An empty dict.

Return type

dict

process(data_batch: Sequence[dict], data_samples: Sequence[dict])None[source]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

class mmdet.evaluation.metrics.LVISMetric(ann_file: Optional[str] = None, metric: Union[str, List[str]] = 'bbox', classwise: bool = False, proposal_nums: Sequence[int] = (100, 300, 1000), iou_thrs: Optional[Union[float, Sequence[float]]] = None, metric_items: Optional[Sequence[str]] = None, format_only: bool = False, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]

LVIS evaluation metric.

Parameters
  • ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.

  • metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘bbox’, ‘segm’, ‘proposal’, and ‘proposal_fast’. Defaults to ‘bbox’.

  • classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.

  • proposal_nums (Sequence[int]) – Numbers of proposals to be evaluated. Defaults to (100, 300, 1000).

  • iou_thrs (float | List[float], optional) – IoU threshold to compute AP and AR. If not specified, IoUs from 0.5 to 0.95 will be used. Defaults to None.

  • metric_items (List[str], optional) – Metric result names to be recorded in the evaluation result. Defaults to None.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result into a specific format and submit it to the test server. Defaults to False.

  • outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

Return type

Dict[str, float]

fast_eval_recall(results: List[dict], proposal_nums: Sequence[int], iou_thrs: Sequence[float], logger: Optional[mmengine.logging.logger.MMLogger] = None)numpy.ndarray[source]

Evaluate proposal recall with LVIS’s fast_eval_recall.

Parameters
  • results (List[dict]) – Results of the dataset.

  • proposal_nums (Sequence[int]) – Proposal numbers used for evaluation.

  • iou_thrs (Sequence[float]) – IoU thresholds used for evaluation.

  • logger (MMLogger, optional) – Logger used for logging the recall summary.

Returns

Averaged recall results.

Return type

np.ndarray

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

class mmdet.evaluation.metrics.OpenImagesMetric(iou_thrs: Union[float, List[float]] = 0.5, ioa_thrs: Union[float, List[float]] = 0.5, scale_ranges: Optional[List[tuple]] = None, use_group_of: bool = True, get_supercategory: bool = True, filter_labels: bool = True, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]

OpenImages evaluation metric.

Evaluate detection mAP for OpenImages. Please refer to https://storage.googleapis.com/openimages/web/evaluation.html for more details.

Parameters
  • iou_thrs (float or List[float]) – IoU threshold. Defaults to 0.5.

  • ioa_thrs (float or List[float]) – IoA threshold. Defaults to 0.5.

  • scale_ranges (List[tuple], optional) – Scale ranges for evaluating mAP. If not specified, all bounding boxes would be included in evaluation. Defaults to None

  • use_group_of (bool) – Whether to consider groups of ground truth bboxes during evaluation. Defaults to True.

  • get_supercategory (bool) – Whether to get parent class of the current class. Default: True.

  • filter_labels (bool) – Whether to filter unannotated classes. Default: True.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)dict[source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

Return type

dict

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

class mmdet.evaluation.metrics.VOCMetric(iou_thrs: Union[float, List[float]] = 0.5, scale_ranges: Optional[List[tuple]] = None, metric: Union[str, List[str]] = 'mAP', proposal_nums: Sequence[int] = (100, 300, 1000), eval_mode: str = '11points', collect_device: str = 'cpu', prefix: Optional[str] = None)[source]

Pascal VOC evaluation metric.

Parameters
  • iou_thrs (float or List[float]) – IoU threshold. Defaults to 0.5.

  • scale_ranges (List[tuple], optional) – Scale ranges for evaluating mAP. If not specified, all bounding boxes would be included in evaluation. Defaults to None.

  • metric (str | list[str]) – Metrics to be evaluated. Options are ‘mAP’ and ‘recall’. If a list is given, only the first metric in the list is used for evaluation.

  • proposal_nums (Sequence[int]) – Proposal number used for evaluating recalls, such as recall@100, recall@1000. Default: (100, 300, 1000).

  • eval_mode (str) – ‘area’ or ‘11points’. ‘area’ means calculating the area under the precision-recall curve, while ‘11points’ means calculating the average precision of recalls at [0, 0.1, …, 1]. PASCAL VOC2007 uses ‘11points’ by default, while PASCAL VOC2012 uses ‘area’.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
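Example (an illustrative evaluator config sketch):

val_evaluator = dict(
    type='VOCMetric',
    metric='mAP',
    eval_mode='11points')   # PASCAL VOC2007-style 11-point AP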

compute_metrics(results: list)dict[source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

Return type

dict

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

mmdet.models

backbones

class mmdet.models.backbones.CSPDarknet(arch='P5', deepen_factor=1.0, widen_factor=1.0, out_indices=(2, 3, 4), frozen_stages=- 1, use_depthwise=False, arch_ovewrite=None, spp_kernal_sizes=(5, 9, 13), conv_cfg=None, norm_cfg={'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg={'type': 'Swish'}, norm_eval=False, init_cfg={'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]

CSP-Darknet backbone used in YOLOv5 and YOLOX.

Parameters
  • arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Default: P5.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Default: 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Default: False.

  • arch_ovewrite (list) – Overwrite default arch settings. Default: None.

  • spp_kernal_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Default: (5, 9, 13).

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’LeakyReLU’, negative_slope=0.1).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmdet.models import CSPDarknet
>>> import torch
>>> self = CSPDarknet(arch='P5')
>>> self.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmdet.models.backbones.CSPNeXt(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, use_depthwise: bool = False, expand_ratio: float = 0.5, arch_ovewrite: Optional[dict] = None, spp_kernel_sizes: Sequence[int] = (5, 9, 13), channel_attention: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]

CSPNeXt backbone used in RTMDet.

Parameters
  • arch (str) – Architecture of CSPNeXt, from {P5, P6}. Defaults to P5.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.

  • spp_kernel_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Defaults to (5, 9, 13).

  • channel_attention (bool) – Whether to add channel attention in each stage. Defaults to True.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, requires_grad=True).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict], optional) – Initialization config dict.

forward(x: Tuple[torch.Tensor, ...])Tuple[torch.Tensor, ...][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)None[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmdet.models.backbones.Darknet(depth=53, out_indices=(3, 4, 5), frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'negative_slope': 0.1, 'type': 'LeakyReLU'}, norm_eval=True, pretrained=None, init_cfg=None)[source]

Darknet backbone.

Parameters
  • depth (int) – Depth of Darknet. Currently only support 53.

  • out_indices (Sequence[int]) – Output from which stages.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True)

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’LeakyReLU’, negative_slope=0.1).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import Darknet
>>> import torch
>>> self = Darknet(depth=53)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

static make_conv_res_block(in_channels, out_channels, res_repeat, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'negative_slope': 0.1, 'type': 'LeakyReLU'})[source]

In Darknet backbone, ConvLayer is usually followed by ResBlock. This function will make that. The Conv layers always have 3x3 filters with stride=2. The number of the filters in Conv layer is the same as the out channels of the ResBlock.

Parameters
  • in_channels (int) – The number of input channels.

  • out_channels (int) – The number of output channels.

  • res_repeat (int) – The number of ResBlocks.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True)

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’LeakyReLU’, negative_slope=0.1).

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmdet.models.backbones.DetectoRS_ResNeXt(groups=1, base_width=4, **kwargs)[source]

ResNeXt backbone for DetectoRS.

Parameters
  • groups (int) – The number of groups in ResNeXt.

  • base_width (int) – The base width of ResNeXt.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer for DetectoRS.

class mmdet.models.backbones.DetectoRS_ResNet(sac=None, stage_with_sac=(False, False, False, False), rfp_inplanes=None, output_img=False, pretrained=None, init_cfg=None, **kwargs)[source]

ResNet backbone for DetectoRS.

Parameters
  • sac (dict, optional) – Dictionary to construct SAC (Switchable Atrous Convolution). Default: None.

  • stage_with_sac (list) – Which stage to use sac. Default: (False, False, False, False).

  • rfp_inplanes (int, optional) – The number of channels from RFP. Default: None. If specified, an additional conv layer will be added for rfp_feat. Otherwise, the structure is the same as base class.

  • output_img (bool) – If True, the input image will be inserted into the starting position of output. Default: False.
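
Example

A configuration sketch (not from the original docstring) showing how this backbone is typically specified in DetectoRS-style configs; the concrete values below are illustrative assumptions, not authoritative defaults.

>>> # Illustrative config fragment; values follow common DetectoRS setups.
>>> backbone = dict(
...     type='DetectoRS_ResNet',
...     depth=50,
...     num_stages=4,
...     out_indices=(0, 1, 2, 3),
...     conv_cfg=dict(type='ConvAWS'),
...     sac=dict(type='SAC', use_deform=True),
...     stage_with_sac=(False, True, True, True),
...     output_img=True)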

forward(x)[source]

Forward function.

init_weights()[source]

Initialize the weights.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer for DetectoRS.

rfp_forward(x, rfp_feats)[source]

Forward function for RFP.

class mmdet.models.backbones.EfficientNet(arch='b0', drop_path_rate=0.0, out_indices=(6), frozen_stages=0, conv_cfg={'type': 'Conv2dAdaptivePadding'}, norm_cfg={'eps': 0.001, 'type': 'BN'}, act_cfg={'type': 'Swish'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': 'Conv2d'}, {'type': 'Constant', 'layer': ['_BatchNorm', 'GroupNorm'], 'val': 1}])[source]

EfficientNet backbone.

Parameters
  • arch (str) – Architecture of efficientnet. Defaults to b0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (6, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Defaults to 0, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’Swish’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.
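
Example

A minimal usage sketch (not from the original docstring); the number and shapes of the output feature maps depend on arch and out_indices, so no outputs are asserted here.

>>> from mmdet.models import EfficientNet
>>> import torch
>>> # Illustrative only: default arch 'b0' with the default out_indices.
>>> self = EfficientNet(arch='b0')
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))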

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmdet.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=True, with_cp=False, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions.

Parameters
  • extra (dict) –

    Detailed configuration for each stage of HRNet. There must be 4 stages, the configuration for each stage must have 5 keys:

    • num_modules(int): The number of HRModule in this stage.

    • num_branches(int): The number of branches in the HRModule.

    • block(str): The type of convolution block.

    • num_blocks(tuple): The number of blocks in each branch.

      The length must be equal to num_branches.

    • num_channels(tuple): The number of channels in each branch.

      The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – Dictionary to construct and config conv layer.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: True.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.

  • multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmdet.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layer frozen.

class mmdet.models.backbones.HourglassNet(downsample_times: int = 5, num_stacks: int = 2, stage_channels: Sequence = (256, 256, 384, 384, 384, 512), stage_blocks: Sequence = (2, 2, 2, 2, 2, 4), feat_channel: int = 256, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'requires_grad': True, 'type': 'BN'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

HourglassNet backbone.

Stacked Hourglass Networks for Human Pose Estimation. More details can be found in the paper.

Parameters
  • downsample_times (int) – Downsample times in a HourglassModule.

  • num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.

  • stage_channels (Sequence[int]) – Feature channel of each sub-module in a HourglassModule.

  • stage_blocks (Sequence[int]) – Number of sub-modules stacked in a HourglassModule.

  • feat_channel (int) – Feature channel of conv after a HourglassModule.

  • norm_cfg – Dictionary to construct and config norm layer.

Example

>>> from mmdet.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)
forward(x: torch.Tensor)List[torch.Tensor][source]

Forward function.

init_weights()None[source]

Init module weights.

class mmdet.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(1, 2, 4, 7), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

MobileNetV2 backbone.

Parameters
  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int], optional) – Output from which stages. Default: (1, 2, 4, 7).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
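
Example

A minimal usage sketch (not from the original docstring); the output channels depend on widen_factor and out_indices, so no shapes are asserted here.

>>> from mmdet.models import MobileNetV2
>>> import torch
>>> # Illustrative only: default widen_factor=1.0 and out_indices=(1, 2, 4, 7).
>>> self = MobileNetV2()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))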

forward(x)[source]

Forward function.

make_layer(out_channels, num_blocks, stride, expand_ratio)[source]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters
  • out_channels (int) – out_channels of block.

  • num_blocks (int) – number of blocks.

  • stride (int) – stride of the first block. Default: 1

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layer frozen.

class mmdet.models.backbones.PyramidVisionTransformer(pretrain_img_size=224, in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 5, 8], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], paddings=[0, 0, 0, 0], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratios=[8, 8, 4, 4], qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=True, norm_after_stage=False, use_conv_ffn=False, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, convert_weights=True, init_cfg=None)[source]

Pyramid Vision Transformer (PVT)

Implementation of Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 64.

  • num_stages (int) – The number of stages. Default: 4.

  • num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].

  • num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 5, 8].

  • patch_sizes (Sequence[int]) – The patch_size of each patch embedding. Default: [4, 2, 2, 2].

  • strides (Sequence[int]) – The stride of each patch embedding. Default: [4, 2, 2, 2].

  • paddings (Sequence[int]) – The padding of each patch embedding. Default: [0, 0, 0, 0].

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].

  • out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).

  • mlp_ratios (Sequence[int]) – The ratio of the mlp hidden dim to the embedding dim of each transformer encode layer. Default: [8, 8, 4, 4].

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0.

  • drop_path_rate (float) – stochastic depth rate. Default 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: True.

  • use_conv_ffn (bool) – If True, use Convolutional FFN to replace FFN. Default: False.

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’).

  • pretrained (str, optional) – model pretrained path. Default: None.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
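
Example

A minimal usage sketch (not from the original docstring); output shapes depend on the stage configuration and are not asserted here.

>>> from mmdet.models import PyramidVisionTransformer
>>> import torch
>>> # Illustrative only: default PVT settings.
>>> self = PyramidVisionTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))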

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmdet.models.backbones.PyramidVisionTransformerV2(**kwargs)[source]

Implementation of PVTv2: Improved Baselines with Pyramid Vision Transformer.

class mmdet.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]

RegNet backbone.

More details can be found in the paper.

Parameters
  • arch (dict) –

    The parameter of RegNets.

    • w0 (int): initial width

    • wa (float): slope of width

    • wm (float): quantization parameter to quantize the width

    • depth (int): depth of the backbone

    • group_w (int): width of group

    • bot_mul (float): bottleneck ratio, i.e. expansion of bottleneck.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • base_channels (int) – Base channels after stem layer.

  • in_channels (int) – Number of input image channels. Default: 3.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import RegNet
>>> import torch
>>> self = RegNet(
        arch=dict(
            w0=88,
            wa=26.31,
            wm=2.25,
            group_w=48,
            depth=25,
            bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
adjust_width_group(widths, bottleneck_ratio, groups)[source]

Adjusts the compatibility of widths and groups.

Parameters
  • widths (list[int]) – Width of each stage.

  • bottleneck_ratio (float) – Bottleneck ratio.

  • groups (int) – number of groups in each stage

Returns

The adjusted widths and groups of each stage.

Return type

tuple(list)

forward(x)[source]

Forward function.

generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]

Generates per block width from RegNet parameters.

Parameters
  • initial_width ([int]) – Initial width of the backbone

  • width_slope ([float]) – Slope of the quantized linear function

  • width_parameter ([int]) – Parameter used to quantize the width.

  • depth ([int]) – Depth of the backbone.

  • divisor (int, optional) – The divisor of channels. Defaults to 8.

Returns

return a list of widths of each stage and the number of stages

Return type

list, int

get_stages_from_blocks(widths)[source]

Gets widths/stage_blocks of network at each stage.

Parameters

widths (list[int]) – Width in each stage.

Returns

width and depth of each stage

Return type

tuple(list)

static quantize_float(number, divisor)[source]

Converts a float to closest non-zero int divisible by divisor.

Parameters
  • number (int) – Original number to be quantized.

  • divisor (int) – Divisor used to quantize the number.

Returns

Quantized number that is divisible by divisor.

Return type

int

class mmdet.models.backbones.Res2Net(scales=4, base_width=26, style='pytorch', deep_stem=True, avg_down=True, pretrained=None, init_cfg=None, **kwargs)[source]

Res2Net backbone.

Parameters
  • scales (int) – Scales used in Res2Net. Default: 4

  • base_width (int) – Basic width of each scale. Default: 26

  • depth (int) – Depth of res2net, from {50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Res2net stages. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottle2neck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import Res2Net
>>> import torch
>>> self = Res2Net(depth=50, scales=4, base_width=26)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

class mmdet.models.backbones.ResNeSt(groups=1, base_width=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]

ResNeSt backbone.

Parameters
  • groups (int) – Number of groups of Bottleneck. Default: 1

  • base_width (int) – Base width of Bottleneck. Default: 4

  • radix (int) – Radix of SplitAttentionConv2d. Default: 2

  • reduction_factor (int) – Reduction factor of inter_channels in SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • kwargs (dict) – Keyword arguments for ResNet.
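
Example

A minimal usage sketch (not from the original docstring); depth is passed through kwargs to the underlying ResNet, and output shapes are not asserted here.

>>> from mmdet.models import ResNeSt
>>> import torch
>>> # Illustrative only: ResNeSt-50 with default radix/groups settings.
>>> self = ResNeSt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))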

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

class mmdet.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]

ResNeXt backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Resnet stages. Default: 4.

  • groups (int) – Group of resnext.

  • base_width (int) – Base width of resnext.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
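
Example

A minimal usage sketch (not from the original docstring); the common ResNeXt-50 32x4d setting is assumed, and output shapes are not asserted here.

>>> from mmdet.models import ResNeXt
>>> import torch
>>> # Illustrative only: ResNeXt-50 with 32 groups and base_width=4.
>>> self = ResNeXt(depth=50, groups=32, base_width=4)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))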

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

class mmdet.models.backbones.ResNet(depth, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]

ResNet backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • stem_channels (int | None) – Number of stem channels. If not specified, it will be the same as base_channels. Default: None.

  • base_channels (int) – Number of base channels of res layer. Default: 64.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Resnet stages. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x)[source]

Forward function.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

make_stage_plugins(plugins, stage_idx)[source]

Make plugins for ResNet stage_idx th stage.

Currently we support inserting context_block, empirical_attention_block and nonlocal_block into backbones like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of the Bottleneck.

An example of plugins format could be:

Examples

>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1-> conv2->conv3->yyy->zzz1->zzz2

Suppose ‘stage_idx=1’, the structure of blocks in the stage would be:

conv1-> conv2->xxx->conv3->yyy->zzz1->zzz2

If stages is missing, the plugin would be applied to all stages.

Parameters
  • plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.

  • stage_idx (int) – Index of stage to build

Returns

Plugins for current stage

Return type

list[dict]

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layer frozen.

class mmdet.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant described in Bag of Tricks.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.

class mmdet.models.backbones.SSDVGG(depth, with_last_pool=False, ceil_mode=True, out_indices=(3, 4), out_feature_indices=(22, 34), pretrained=None, init_cfg=None, input_size=None, l2_norm_scale=None)[source]

VGG Backbone network for single-shot-detection.

Parameters
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • with_last_pool (bool) – Whether to add a pooling layer at the end of the model.

  • ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.

  • out_indices (Sequence[int]) – Output from which stages.

  • out_feature_indices (Sequence[int]) – Output from which feature map.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

  • input_size (int, optional) – Deprecated argument. Width and height of input, from {300, 512}.

  • l2_norm_scale (float, optional) – Deprecated argument. L2 normalization layer init scale.

Example

>>> self = SSDVGG(input_size=300, depth=11)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 300, 300)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 19, 19)
(1, 512, 10, 10)
(1, 256, 5, 5)
(1, 256, 3, 3)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

init_weights(pretrained=None)[source]

Initialize the weights.

class mmdet.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, pretrained=None, convert_weights=False, frozen_stages=- 1, init_cfg=None)[source]

Swin Transformer. A PyTorch implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

Inspiration from https://github.com/microsoft/Swin-Transformer

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – The num of input channels. Defaults: 3.

  • embed_dims (int) – The feature dimension. Default: 96.

  • patch_size (int | tuple[int]) – Patch size. Default: 4.

  • window_size (int) – Window size. Default: 7.

  • mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).

  • num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).

  • strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • patch_norm (bool) – If add a norm layer for patch embed and patch merging. Default: True.

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer at the output of the backbone. Defaults: dict(type=’LN’).

  • with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). Default: -1 (-1 means not freezing any parameters).

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
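
Example

A minimal usage sketch (not from the original docstring); with the default settings four feature maps are returned, but no shapes are asserted here.

>>> from mmdet.models import SwinTransformer
>>> import torch
>>> # Illustrative only: default Swin-T style settings.
>>> self = SwinTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))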

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

train(mode=True)[source]

Convert the model into training mode while keeping layers frozen.

class mmdet.models.backbones.TridentResNet(depth, num_branch, test_branch_idx, trident_dilations, **kwargs)[source]

The stem layer, stage 1 and stage 2 in Trident ResNet are identical to ResNet, while in stage 3, Trident BottleBlock is utilized to replace the normal BottleBlock to yield trident outputs. Different branches share the convolution weights but use different dilations to achieve multi-scale outputs.

x -> stem -> stage1 -> stage2 -> {stage3(b0), stage3(b1), stage3(b2)} -> output

Parameters
  • depth (int) – Depth of resnet, from {50, 101, 152}.

  • num_branch (int) – Number of branches in TridentNet.

  • test_branch_idx (int) – In inference, all 3 branches will be used if test_branch_idx==-1, otherwise only branch with index test_branch_idx will be used.

  • trident_dilations (tuple[int]) – Dilations of different trident branch. len(trident_dilations) should be equal to num_branch.
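
Example

A configuration sketch (not from the original docstring) in the style of the TridentNet configs, where this backbone is used as a 3-stage, C4-style ResNet; the concrete values below are illustrative assumptions.

>>> # Illustrative config fragment; values follow common TridentNet setups.
>>> backbone = dict(
...     type='TridentResNet',
...     depth=50,
...     num_branch=3,
...     test_branch_idx=1,
...     trident_dilations=(1, 2, 3),
...     num_stages=3,
...     strides=(1, 2, 2),
...     dilations=(1, 1, 1),
...     out_indices=(2, ),
...     style='caffe')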

data_preprocessors

class mmdet.models.data_preprocessors.BatchFixedSizePad(size: Tuple[int, int], img_pad_value: int = 0, pad_mask: bool = False, mask_pad_value: int = 0, pad_seg: bool = False, seg_pad_value: int = 255)[source]

Fixed size padding for batch images.

Parameters
  • size (Tuple[int, int]) – Fixed padding size. Expected padding shape (h, w). Defaults to None.

  • img_pad_value (int) – The padded pixel value for images. Defaults to 0.

  • pad_mask (bool) – Whether to pad instance masks. Defaults to False.

  • mask_pad_value (int) – The padded pixel value for instance masks. Defaults to 0.

  • pad_seg (bool) – Whether to pad semantic segmentation maps. Defaults to False.

  • seg_pad_value (int) – The padded pixel value for semantic segmentation maps. Defaults to 255.
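
Example

A configuration sketch (not from the original docstring) showing how this module is commonly plugged into a data preprocessor as a batch-level augmentation; the size and padding values are illustrative assumptions.

>>> # Illustrative: pad every batch to a fixed 1024x1024 canvas.
>>> batch_augments = [
...     dict(
...         type='BatchFixedSizePad',
...         size=(1024, 1024),
...         img_pad_value=0,
...         pad_mask=True,
...         mask_pad_value=0)]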

forward(inputs: torch.Tensor, data_samples: Optional[List[dict]] = None)Tuple[torch.Tensor, Optional[List[dict]]][source]

Pad images, instance masks and semantic segmentation maps.

class mmdet.models.data_preprocessors.BatchResize(scale: tuple, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0)[source]

Batch resize during training. This implementation is modified from https://github.com/Purkialo/CrowdDet/blob/master/lib/data/CrowdHuman.py.

It provides the data pre-processing as follows:

  • A batch of images is padded to a uniform size and stacked into a torch.Tensor by DetDataPreprocessor.

  • BatchResize resizes all images and bboxes to the target size.

  • Images are padded so that the image size is divisible by pad_size_divisor.

Parameters
  • scale (tuple) – Images scales for resizing.

  • pad_size_divisor (int) – Image size divisible factor. Defaults to 1.

  • pad_value (Number) – The padded pixel value. Defaults to 0.

forward(inputs: torch.Tensor, data_samples: List[mmdet.structures.det_data_sample.DetDataSample])Tuple[torch.Tensor, List[mmdet.structures.det_data_sample.DetDataSample]][source]

resize a batch of images and bboxes.

get_padded_tensor(tensor: torch.Tensor, pad_value: int)torch.Tensor[source]

Pad images according to pad_size_divisor.

get_target_size(height: int, width: int)Tuple[int, int, float][source]

Get the target size of a batch of images based on data and scale.

class mmdet.models.data_preprocessors.BatchSyncRandomResize(random_size_range: Tuple[int, int], interval: int = 10, size_divisor: int = 32)[source]

Batch random resize which synchronizes the random size across ranks.

Parameters
  • random_size_range (tuple) – The multi-scale random range during multi-scale training.

  • interval (int) – The iter interval of change image size. Defaults to 10.

  • size_divisor (int) – Image size divisible factor. Defaults to 32.
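
Example

A configuration sketch (not from the original docstring) showing the typical use as a batch-level augmentation for multi-scale training (e.g. YOLOX-style configs); the size range below is an illustrative assumption.

>>> # Illustrative: synchronously re-draw the batch input size every 10 iterations.
>>> batch_augments = [
...     dict(
...         type='BatchSyncRandomResize',
...         random_size_range=(480, 800),
...         size_divisor=32,
...         interval=10)]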

forward(inputs: torch.Tensor, data_samples: List[mmdet.structures.det_data_sample.DetDataSample])Tuple[torch.Tensor, List[mmdet.structures.det_data_sample.DetDataSample]][source]

Resize a batch of images and bboxes to the shape self._input_size.

class mmdet.models.data_preprocessors.BoxInstDataPreprocessor(*arg, mask_stride: int = 4, pairwise_size: int = 3, pairwise_dilation: int = 2, pairwise_color_thresh: float = 0.3, bottom_pixels_removed: int = 10, **kwargs)[source]

Pseudo mask pre-processor for BoxInst.

Compared with mmdet.DetDataPreprocessor,

  1. It generates masks using box annotations.

  2. It computes the image color similarity in LAB color space.

Parameters
  • mask_stride (int) – The mask output stride in boxinst. Defaults to 4.

  • pairwise_size (int) – The size of neighborhood for each pixel. Defaults to 3.

  • pairwise_dilation (int) – The dilation of neighborhood for each pixel. Defaults to 2.

  • pairwise_color_thresh (float) – The thresh of image color similarity. Defaults to 0.3.

  • bottom_pixels_removed (int) – The number of pixels removed from the bottom of each image, compensating for an annotation error in the COCO dataset. Defaults to 10.

forward(data: dict, training: bool = False)dict[source]

Get pseudo mask labels using color similarity.

get_images_color_similarity(inputs: torch.Tensor, image_masks: torch.Tensor)torch.Tensor[source]

Compute the image color similarity in LAB color space.

class mmdet.models.data_preprocessors.DetDataPreprocessor(mean: Optional[Sequence[numbers.Number]] = None, std: Optional[Sequence[numbers.Number]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, pad_mask: bool = False, mask_pad_value: int = 0, pad_seg: bool = False, seg_pad_value: int = 255, bgr_to_rgb: bool = False, rgb_to_bgr: bool = False, boxtype2tensor: bool = True, batch_augments: Optional[List[dict]] = None)[source]

Image pre-processor for detection tasks.

Compared with mmengine.ImgDataPreprocessor,

  1. It supports batch augmentations.

  2. It will additionally append batch_input_shape and pad_shape to data_samples considering the object detection task.

It provides the data pre-processing as follows

  • Collate and move data to the target device.

  • Pad inputs to the maximum size of the current batch with the defined pad_value. The padded size will be divisible by the defined pad_size_divisor.

  • Stack inputs to batch_inputs.

  • Convert inputs from bgr to rgb if the shape of input is (3, H, W).

  • Normalize image with defined std and mean.

  • Do batch augmentations during training.

Parameters
  • mean (Sequence[Number], optional) – The pixel mean of R, G, B channels. Defaults to None.

  • std (Sequence[Number], optional) – The pixel standard deviation of R, G, B channels. Defaults to None.

  • pad_size_divisor (int) – The size of padded image should be divisible by pad_size_divisor. Defaults to 1.

  • pad_value (Number) – The padded pixel value. Defaults to 0.

  • pad_mask (bool) – Whether to pad instance masks. Defaults to False.

  • mask_pad_value (int) – The padded pixel value for instance masks. Defaults to 0.

  • pad_seg (bool) – Whether to pad semantic segmentation maps. Defaults to False.

  • seg_pad_value (int) – The padded pixel value for semantic segmentation maps. Defaults to 255.

  • bgr_to_rgb (bool) – whether to convert image from BGR to RGB. Defaults to False.

  • rgb_to_bgr (bool) – whether to convert image from RGB to BGR. Defaults to False.

  • boxtype2tensor (bool) – Whether to convert the BaseBoxes type of bboxes data to Tensor type. Defaults to True.

  • batch_augments (list[dict], optional) – Batch-level augmentations
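
Example

A configuration sketch (not from the original docstring) showing a common data_preprocessor setup for detectors; the mean/std values are the usual ImageNet statistics and are given here only as an illustration.

>>> # Illustrative: normalize RGB inputs and pad to a multiple of 32.
>>> data_preprocessor = dict(
...     type='DetDataPreprocessor',
...     mean=[123.675, 116.28, 103.53],
...     std=[58.395, 57.12, 57.375],
...     bgr_to_rgb=True,
...     pad_size_divisor=32)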

forward(data: dict, training: bool = False)dict[source]

Perform normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor.

Parameters
  • data (dict) – Data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation.

Returns

Data in the same format as the model input.

Return type

dict

pad_gt_masks(batch_data_samples: Sequence[mmdet.structures.det_data_sample.DetDataSample])None[source]

Pad gt_masks to shape of batch_input_shape.

pad_gt_sem_seg(batch_data_samples: Sequence[mmdet.structures.det_data_sample.DetDataSample])None[source]

Pad gt_sem_seg to shape of batch_input_shape.

class mmdet.models.data_preprocessors.MultiBranchDataPreprocessor(data_preprocessor: Union[mmengine.config.config.ConfigDict, dict])[source]

DataPreprocessor wrapper for multi-branch data.

Take semi-supervised object detection as an example: assume that the ratio of labeled data to unlabeled data in a batch is 1:2, sup indicates the branch where the labeled data is augmented, and unsup_teacher and unsup_student indicate the branches where the unlabeled data is augmented by different pipelines.

The input format of multi-branch data is shown below:

The format of multi-branch data after filtering None is shown below:

In order to reuse DetDataPreprocessor for the data from different branches, the format of multi-branch data grouped by branch is as below:

After preprocessing data from different branches, the multi-branch data needs to be reformatted as:

Parameters

data_preprocessor (ConfigDict or dict) – Config of DetDataPreprocessor to process the input data.
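
Example

A configuration sketch (not from the original docstring) showing how this wrapper is typically combined with a DetDataPreprocessor for semi-supervised training; the inner values are illustrative assumptions.

>>> # Illustrative: the wrapped preprocessor is applied to every branch.
>>> data_preprocessor = dict(
...     type='MultiBranchDataPreprocessor',
...     data_preprocessor=dict(
...         type='DetDataPreprocessor',
...         mean=[123.675, 116.28, 103.53],
...         std=[58.395, 57.12, 57.375],
...         bgr_to_rgb=True,
...         pad_size_divisor=32))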

cpu(*args, **kwargs)torch.nn.modules.module.Module[source]

Overrides this method to set the device

Returns

The model itself.

Return type

nn.Module

cuda(*args, **kwargs)torch.nn.modules.module.Module[source]

Overrides this method to set the device

Returns

The model itself.

Return type

nn.Module

forward(data: dict, training: bool = False)dict[source]

Perform normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor for multi-branch data.

Parameters
  • data (dict) – Data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation.

Returns

  • ‘inputs’ (Dict[str, torch.Tensor]): The forward data of models from different branches.

  • ‘data_sample’ (Dict[str, DetDataSample]): The annotation info of the sample from different branches.

Return type

dict

to(device: Optional[Union[int, torch.device]], *args, **kwargs)torch.nn.modules.module.Module[source]

Overrides this method to set the device

Parameters

device (int or torch.device, optional) – The desired device of the parameters and buffers in this module.

Returns

The model itself.

Return type

nn.Module

dense_heads

class mmdet.models.dense_heads.ATSSHead(num_classes: int, in_channels: int, pred_kernel_size: int = 3, stacked_convs: int = 4, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, reg_decoded_bbox: bool = True, loss_centerness: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg: Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]] = {'layer': 'Conv2d', 'override': {'bias_prob': 0.01, 'name': 'atss_cls', 'std': 0.01, 'type': 'Normal'}, 'std': 0.01, 'type': 'Normal'}, **kwargs)[source]

Detection Head of ATSS.

The ATSS head structure is similar to FCOS; however, ATSS uses anchor boxes and assigns labels by Adaptive Training Sample Selection instead of max IoU.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • pred_kernel_size (int) – Kernel size of nn.Conv2d

  • stacked_convs (int) – Number of stacking convs of the head.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='GN', num_groups=32, requires_grad=True).

  • reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Defaults to True. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.

  • loss_centerness (ConfigDict or dict) – Config of centerness loss. Defaults to dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0).

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.
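
Example

A configuration sketch (not from the original docstring) in the style of the ATSS detector configs; the anchor generator, bbox coder and loss settings below are illustrative assumptions rather than authoritative defaults.

>>> # Illustrative bbox_head config fragment for an ATSS-style detector.
>>> bbox_head = dict(
...     type='ATSSHead',
...     num_classes=80,
...     in_channels=256,
...     stacked_convs=4,
...     feat_channels=256,
...     anchor_generator=dict(
...         type='AnchorGenerator',
...         ratios=[1.0],
...         octave_base_scale=8,
...         scales_per_octave=1,
...         strides=[8, 16, 32, 64, 128]),
...     bbox_coder=dict(
...         type='DeltaXYWHBBoxCoder',
...         target_means=[0.0, 0.0, 0.0, 0.0],
...         target_stds=[0.1, 0.1, 0.2, 0.2]),
...     loss_cls=dict(
...         type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25,
...         loss_weight=1.0),
...     loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
...     loss_centerness=dict(
...         type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0))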

centerness_target(anchors: torch.Tensor, gts: torch.Tensor)torch.Tensor[source]

Calculate the centerness between anchors and gts.

Only positive centerness targets are calculated; otherwise there may be NaN.

Parameters
  • anchors (Tensor) – Anchors with shape (N, 4), “xyxy” format.

  • gts (Tensor) – Ground truth bboxes with shape (N, 4), “xyxy” format.

Returns

Centerness between anchors and gts.

Return type

Tensor

forward(x: Tuple[torch.Tensor])Tuple[List[torch.Tensor]][source]

Forward features from the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually a tuple of classification scores and bbox predictions:

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_anchors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_anchors * 4.

Return type

tuple

forward_single(x: torch.Tensor, scale: mmcv.cnn.bricks.scale.Scale)Sequence[torch.Tensor][source]

Forward feature of a single scale level.

Parameters
  • x (Tensor) – Features of a single scale level.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

Returns

  • cls_score (Tensor): Cls scores for a single scale level, the channels number is num_anchors * num_classes.

  • bbox_pred (Tensor): Box energies / deltas for a single scale level, the channels number is num_anchors * 4.

  • centerness (Tensor): Centerness for a single scale level, the channel number is (N, num_anchors * 1, H, W).

Return type

tuple

get_num_level_anchors_inside(num_level_anchors, inside_flags)[source]

Get the number of valid anchors in every level.

get_targets(anchor_list: List[List[torch.Tensor]], valid_flag_list: List[List[torch.Tensor]], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None, unmap_outputs: bool = True)tuple[source]

Get targets for ATSS head.

This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple.

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], centernesses: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level Has shape (N, num_anchors * num_classes, H, W)

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W)

  • centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

loss_by_feat_single(anchors: torch.Tensor, cls_score: torch.Tensor, bbox_pred: torch.Tensor, centerness: torch.Tensor, labels: torch.Tensor, label_weights: torch.Tensor, bbox_targets: torch.Tensor, avg_factor: float)dict[source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters
  • cls_score (Tensor) – Box scores for each scale level Has shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • labels (Tensor) – Labels of each anchors with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors)

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • avg_factor (float) – Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmdet.models.dense_heads.AnchorFreeHead(num_classes: int, in_channels: int, feat_channels: int = 256, stacked_convs: int = 4, strides: Union[Sequence[int], Sequence[Tuple[int, int]]] = (4, 8, 16, 32, 64), dcn_on_last_conv: bool = False, conv_bias: Union[bool, str] = 'auto', loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'FocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'IoULoss'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]] = {'layer': 'Conv2d', 'override': {'bias_prob': 0.01, 'name': 'conv_cls', 'std': 0.01, 'type': 'Normal'}, 'std': 0.01, 'type': 'Normal'})[source]

Anchor-free head (FCOS, Fovea, RepPoints, etc.).

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • feat_channels (int) – Number of hidden channels. Used in child classes.

  • stacked_convs (int) – Number of stacking convs of the head.

  • strides (Sequence[int] or Sequence[Tuple[int, int]]) – Downsample factor of each feature map.

  • dcn_on_last_conv (bool) – If true, use dcn in the last layer of towers. Defaults to False.

  • conv_bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Default: “auto”.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder. Defaults ‘DistancePointBBoxCoder’.

  • conv_cfg (ConfigDict or dict, Optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict, Optional) – Config dict for normalization layer. Defaults to None.

  • train_cfg (ConfigDict or dict, Optional) – Training config of anchor-free head.

  • test_cfg (ConfigDict or dict, Optional) – Testing config of anchor-free head.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.

aug_test(aug_batch_feats: List[torch.Tensor], aug_batch_img_metas: List[List[torch.Tensor]], rescale: bool = False)List[numpy.ndarray][source]

Test function with test time augmentation.

Parameters
  • aug_batch_feats (list[Tensor]) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains features for all images in the batch.

  • aug_batch_img_metas (list[list[dict]]) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. each dict has image information.

  • rescale (bool, optional) – Whether to rescale the results. Defaults to False.

Returns

bbox results of each class

Return type

list[ndarray]

forward(x: Tuple[torch.Tensor])Tuple[List[torch.Tensor], List[torch.Tensor]][source]

Forward features from the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually contain classification scores and bbox predictions.

  • cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

Return type

tuple

forward_single(x: torch.Tensor)Tuple[torch.Tensor, ...][source]

Forward features of a single scale level.

Parameters

x (Tensor) – FPN feature maps of the specified stride.

Returns

Scores for each class, bbox predictions, and features after the classification and regression conv layers; some models, such as FCOS, need these features.

Return type

tuple

abstract get_targets(points: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData])Any[source]

Compute regression, classification and centerness targets for points in multiple images.

Parameters
  • points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

abstract loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

class mmdet.models.dense_heads.AnchorHead(num_classes: int, in_channels: int, feat_channels: int = 256, anchor_generator: Union[mmengine.config.config.ConfigDict, dict] = {'ratios': [0.5, 1.0, 2.0], 'scales': [8, 16, 32], 'strides': [4, 8, 16, 32, 64], 'type': 'AnchorGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'clip_border': True, 'target_means': (0.0, 0.0, 0.0, 0.0), 'target_stds': (1.0, 1.0, 1.0, 1.0), 'type': 'DeltaXYWHBBoxCoder'}, reg_decoded_bbox: bool = False, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'layer': 'Conv2d', 'std': 0.01, 'type': 'Normal'})[source]

Anchor-based head (RPN, RetinaNet, SSD, etc.).

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • feat_channels (int) – Number of hidden channels. Used in child classes.

  • anchor_generator (dict) – Config dict for anchor generator

  • bbox_coder (dict) – Config of bounding box coder.

  • reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Default False. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.

  • loss_cls (dict) – Config of classification loss.

  • loss_bbox (dict) – Config of localization loss.

  • train_cfg (dict) – Training config of anchor head.

  • test_cfg (dict) – Testing config of anchor head.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.
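A minimal, illustrative sketch (assuming mmdet and torch are installed) of building an AnchorHead with the documented defaults and running it on dummy FPN features; the batch size, channel count and spatial sizes below are arbitrary:

# Build an AnchorHead with its default anchor generator, bbox coder and losses.
import torch
from mmdet.models.dense_heads import AnchorHead

head = AnchorHead(num_classes=80, in_channels=256)

# One dummy feature map per default stride (4, 8, 16, 32, 64).
feats = tuple(torch.rand(1, 256, size, size) for size in (64, 32, 16, 8, 4))
cls_scores, bbox_preds = head(feats)
# Each cls_score is a 4D tensor with num_base_priors * num_classes channels;
# each bbox_pred is a 4D tensor with num_base_priors * 4 channels.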

forward(x: Tuple[torch.Tensor])Tuple[List[torch.Tensor]][source]

Forward features from the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of classification scores and bbox predictions.

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

Return type

tuple

forward_single(x: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][source]

Forward feature of a single scale level.

Parameters

x (Tensor) – Features of a single scale level.

Returns

  • cls_score (Tensor): Cls scores for a single scale level, the channels number is num_base_priors * num_classes.

  • bbox_pred (Tensor): Box energies / deltas for a single scale level, the channels number is num_base_priors * 4.

Return type

tuple

get_anchors(featmap_sizes: List[tuple], batch_img_metas: List[dict], device: Union[torch.device, str] = 'cuda')Tuple[List[List[torch.Tensor]], List[List[torch.Tensor]]][source]

Get anchors according to feature map sizes.

Parameters
  • featmap_sizes (list[tuple]) – Multi-level feature map sizes.

  • batch_img_metas (list[dict]) – Image meta info.

  • device (torch.device | str) – Device for returned tensors. Defaults to cuda.

Returns

  • anchor_list (list[list[Tensor]]): Anchors of each image.

  • valid_flag_list (list[list[Tensor]]): Valid flags of each image.

Return type

tuple
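To illustrate the inputs get_anchors expects, the sketch below feeds the per-level feature-map sizes and a per-image meta dict; it assumes (this page does not state it) that pad_shape is the meta key read when computing valid flags:

# Illustrative only: anchors and valid flags for one 512x512 image.
from mmdet.models.dense_heads import AnchorHead

head = AnchorHead(num_classes=80, in_channels=256)
featmap_sizes = [(64, 64), (32, 32), (16, 16), (8, 8), (4, 4)]
batch_img_metas = [dict(img_shape=(512, 512), pad_shape=(512, 512))]

anchor_list, valid_flag_list = head.get_anchors(
    featmap_sizes, batch_img_metas, device='cpu')
# anchor_list[i][lvl]:     (num_anchors, 4) anchors of image i at level lvl
# valid_flag_list[i][lvl]: (num_anchors, ) validity flags for those anchors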

get_targets(anchor_list: List[List[torch.Tensor]], valid_flag_list: List[List[torch.Tensor]], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None, unmap_outputs: bool = True, return_sampling_results: bool = False)tuple[source]

Compute regression and classification targets for anchors in multiple images.

Parameters
  • anchor_list (list[list[Tensor]]) – Multi level anchors of each image. The outer list indicates images, and the inner list corresponds to feature levels of the image. Each element of the inner list is a tensor of shape (num_anchors, 4).

  • valid_flag_list (list[list[Tensor]]) – Multi level valid flags of each image. The outer list indicates images, and the inner list corresponds to feature levels of the image. Each element of the inner list is a tensor of shape (num_anchors, )

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

  • unmap_outputs (bool) – Whether to map outputs back to the original set of anchors. Defaults to True.

  • return_sampling_results (bool) – Whether to return the sampling results. Defaults to False.

Returns

Usually returns a tuple containing learning targets.

  • labels_list (list[Tensor]): Labels of each level.

  • label_weights_list (list[Tensor]): Label weights of each level.

  • bbox_targets_list (list[Tensor]): BBox targets of each level.

  • bbox_weights_list (list[Tensor]): BBox weights of each level.

  • avg_factor (int): Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

additional_returns: This function enables user-defined returns from self._get_targets_single. These returns are currently refined to properties at each feature map (i.e., having HxW dimension); the results will be concatenated at the end.

Return type

tuple
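get_targets drives the per-image assignment and sampling in self._get_targets_single through the head's train_cfg. The sketch below shows a typical RPN-style train_cfg; the specific thresholds and sampler settings are illustrative, not requirements of this API:

# Illustrative RPN-style train_cfg for an anchor-based head.
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.7,
        neg_iou_thr=0.3,
        min_pos_iou=0.3,
        ignore_iof_thr=-1),
    sampler=dict(
        type='RandomSampler',
        num=256,
        pos_fraction=0.5,
        neg_pos_ub=-1,
        add_gt_as_proposals=False),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)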

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level has shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict

loss_by_feat_single(cls_score: torch.Tensor, bbox_pred: torch.Tensor, anchors: torch.Tensor, labels: torch.Tensor, label_weights: torch.Tensor, bbox_targets: torch.Tensor, bbox_weights: torch.Tensor, avg_factor: int)tuple[source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters
  • cls_score (Tensor) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 4).

  • avg_factor (int) – Average factor that is used to average the loss.

Returns

loss components.

Return type

tuple

class mmdet.models.dense_heads.AutoAssignHead(*args, force_topk: bool = False, topk: int = 9, pos_loss_weight: float = 0.25, neg_loss_weight: float = 0.75, center_loss_weight: float = 0.75, **kwargs)[source]

AutoAssignHead head used in AutoAssign.

More details can be found in the paper.

Parameters
  • force_topk (bool) – Used in center prior initialization to handle extremely small gt. Default is False.

  • topk (int) – The number of points used to calculate the center prior when no point falls in gt_bbox. Only works when force_topk is True. Defaults to 9.

  • pos_loss_weight (float) – The loss weight of the positive loss. Defaults to 0.25.

  • neg_loss_weight (float) – The loss weight of the negative loss. Defaults to 0.75.

  • center_loss_weight (float) – The loss weight of the center prior loss. Defaults to 0.75.
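For orientation, a head config in the usual AutoAssign style might look like the sketch below; the channel counts, strides and GIoU loss weight mirror common settings and are not guaranteed by this page:

# Illustrative AutoAssignHead config sketch (values are common, not defaults).
bbox_head = dict(
    type='AutoAssignHead',
    num_classes=80,
    in_channels=256,
    feat_channels=256,
    stacked_convs=4,
    strides=[8, 16, 32, 64, 128],
    loss_bbox=dict(type='GIoULoss', loss_weight=5.0))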

forward_single(x: torch.Tensor, scale: mmcv.cnn.bricks.scale.Scale, stride: int)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Forward features of a single scale level.

Parameters
  • x (Tensor) – FPN feature maps of the specified stride.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

  • stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.

Returns

Scores for each class, bbox predictions, and centerness predictions of the input feature maps.

Return type

tuple[Tensor, Tensor, Tensor]

get_neg_loss_single(cls_score: torch.Tensor, objectness: torch.Tensor, gt_instances: mmengine.structures.instance_data.InstanceData, ious: torch.Tensor, inside_gt_bbox_mask: torch.Tensor)Tuple[torch.Tensor][source]

Calculate the negative loss of all points in feature map.

Parameters
  • cls_score (Tensor) – All category scores for each point on the feature map. The shape is (num_points, num_class).

  • objectness (Tensor) – Foreground probability of all points, with shape (num_points, 1).

  • gt_instances (InstanceData) – Ground truth of instance annotations. It should include bboxes and labels attributes.

  • ious (Tensor) – Float tensor with shape (num_points, num_gt). Each value represents the IoU of pred_bbox and gt_bboxes.

  • inside_gt_bbox_mask (Tensor) – Tensor of bool type, with shape of (num_points, num_gt), each value is used to mark whether this point falls within a certain gt.

Returns

  • neg_loss (Tensor): The negative loss of all points in the feature map.

Return type

tuple[Tensor]

get_pos_loss_single(cls_score: torch.Tensor, objectness: torch.Tensor, reg_loss: torch.Tensor, gt_instances: mmengine.structures.instance_data.InstanceData, center_prior_weights: torch.Tensor)Tuple[torch.Tensor][source]

Calculate the positive loss of all points in gt_bboxes.

Parameters
  • cls_score (Tensor) – All category scores for each point on the feature map. The shape is (num_points, num_class).

  • objectness (Tensor) – Foreground probability of all points, has shape (num_points, 1).

  • reg_loss (Tensor) – The regression loss of each gt_bbox and each prediction box, has shape of (num_points, num_gt).

  • gt_instances (InstanceData) – Ground truth of instance annotations. It should include bboxes and labels attributes.

  • center_prior_weights (Tensor) – Float tensor with shape of (num_points, num_gt). Each value represents the center weighting coefficient.

Returns

  • pos_loss (Tensor): The positive loss of all points in the gt_bboxes.

Return type

tuple[Tensor]

get_targets(points: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData])Tuple[List[torch.Tensor], List[torch.Tensor]][source]

Compute regression targets and each point inside or outside gt_bbox in multiple images.

Parameters
  • points (list[Tensor]) – Points of all fpn level, each has shape (num_points, 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

Returns

  • inside_gt_bbox_mask_list (list[Tensor]): Each tensor is of bool type with shape (num_points, num_gt); each value marks whether the point falls within a certain gt.

  • concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level. Each tensor has shape (num_points, num_gt, 4).

Return type

tuple(list[Tensor], list[Tensor])

init_weights()None[source]

Initialize weights of the head.

In particular, we use special initialization for the classification conv's and regression conv's bias.

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)Dict[str, torch.Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

  • objectnesses (list[Tensor]) – objectness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmdet.models.dense_heads.BoxInstBboxHead(*args, **kwargs)[source]

BoxInst box head used in https://arxiv.org/abs/2012.02310.

class mmdet.models.dense_heads.BoxInstMaskHead(*arg, pairwise_size: int = 3, pairwise_dilation: int = 2, warmup_iters: int = 10000, **kwargs)[source]

BoxInst mask head used in https://arxiv.org/abs/2012.02310.

This head outputs the mask for BoxInst.

Parameters
  • pairwise_size (int) – The size of the neighborhood for each pixel. Defaults to 3.

  • pairwise_dilation (int) – The dilation of neighborhood for each pixel. Defaults to 2.

  • warmup_iters (int) – Warmup iterations for pair-wise loss. Defaults to 10000.
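To make pairwise_size and pairwise_dilation concrete, the sketch below enumerates the dilated neighbourhood they imply for each pixel; it only illustrates the parameters and is not the head's actual affinity computation:

# With pairwise_size=3 and pairwise_dilation=2, each pixel is compared with
# its 8 neighbours at offsets of +/-2 pixels (the centre offset is excluded).
pairwise_size, pairwise_dilation = 3, 2
half = pairwise_size // 2
offsets = [(dy * pairwise_dilation, dx * pairwise_dilation)
           for dy in range(-half, half + 1)
           for dx in range(-half, half + 1)
           if (dy, dx) != (0, 0)]
print(len(offsets))  # 8 neighbours per pixel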

get_pairwise_affinity(mask_logits: torch.Tensor)torch.Tensor[source]

Compute the pairwise affinity for each pixel.

loss_by_feat(mask_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], positive_infos: List[mmengine.structures.instance_data.InstanceData], **kwargs)dict[source]

Calculate the loss based on the features extracted by the mask head.

Parameters
  • mask_preds (list[Tensor]) – List of predicted masks, each has shape (num_classes, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes, masks, and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of multiple images.

  • positive_infos (list[InstanceData]) – Information of positive samples of each image that are assigned in detection head.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmdet.models.dense_heads.CascadeRPNHead(num_classes: int, num_stages: int, stages: List[Union[dict, mmengine.config.config.ConfigDict]], train_cfg: List[Union[dict, mmengine.config.config.ConfigDict]], test_cfg: Union[mmengine.config.config.ConfigDict, dict], init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

The CascadeRPNHead will predict more accurate region proposals, which is required for two-stage detectors (such as Fast/Faster R-CNN). CascadeRPN consists of a sequence of RPNStage to progressively improve the accuracy of the detected proposals.

More details can be found in https://arxiv.org/abs/1909.06720.

Parameters
  • num_stages (int) – number of CascadeRPN stages.

  • stages (list[ConfigDict or dict]) – list of configs to build the stages.

  • train_cfg (list[ConfigDict or dict]) – list of configs at training time for each stage.

  • test_cfg (ConfigDict or dict) – config at testing time.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict]) – Initialization config dict.
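Schematically, the constructor arguments fit together as below: one StageCascadeRPNHead config and one training config per stage. The stage entries are placeholders showing structure only; a real config must specify each stage fully (anchor generator, feature adaptation, assigner, and so on):

# Structural sketch only, not a buildable reference config.
rpn_head = dict(
    type='CascadeRPNHead',
    num_classes=1,                          # RPN is class-agnostic
    num_stages=2,
    stages=[
        dict(type='StageCascadeRPNHead'),   # placeholder: stage 1 config
        dict(type='StageCascadeRPNHead'),   # placeholder: stage 2 config
    ],
    train_cfg=[dict(), dict()],             # placeholder: one entry per stage
    test_cfg=dict(nms=dict(type='nms', iou_threshold=0.7), max_per_img=1000))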

loss(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample])dict[source]

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample]) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

Returns

A dictionary of loss components.

Return type

dict

loss_and_predict(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample], proposal_cfg: Optional[mmengine.config.config.ConfigDict] = None)Tuple[dict, List[mmengine.structures.instance_data.InstanceData]][source]

Perform forward propagation of the head, then calculate loss and predictions from the features and data samples.

Parameters
  • x (tuple[Tensor]) – Features from FPN.

  • batch_data_samples (list[DetDataSample]) – Each item contains the meta information of each image and corresponding annotations.

  • proposal_cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

Returns

The return value is a tuple containing:

  • losses (dict[str, Tensor]): A dictionary of loss components.

  • predictions (list[InstanceData]): Detection results of each image after the post process.

Return type

tuple

loss_by_feat()[source]

loss_by_feat() is implemented in StageCascadeRPNHead.

predict(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample], rescale: bool = False)List[mmengine.structures.instance_data.InstanceData][source]

Perform forward propagation of the detection head and predict detection results on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample]) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

  • rescale (bool, optional) – Whether to rescale the results. Defaults to False.

Returns

Detection results of each image after the post process.

Return type

list[InstanceData]

predict_by_feat()[source]

predict_by_feat() is implemented in StageCascadeRPNHead.

class mmdet.models.dense_heads.CenterNetHead(in_channels: int, feat_channels: int, num_classes: int, loss_center_heatmap: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'GaussianFocalLoss'}, loss_wh: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.1, 'type': 'L1Loss'}, loss_offset: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'L1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Objects as Points head. CenterNetHead uses center_point to indicate the object's position. Paper link: https://arxiv.org/abs/1904.07850

Parameters
  • in_channels (int) – Number of channels in the input feature map.

  • feat_channels (int) – Number of channels in the intermediate feature map.

  • num_classes (int) – Number of categories excluding the background category.

  • loss_center_heatmap (ConfigDict or dict) – Config of center heatmap loss. Defaults to dict(type=’GaussianFocalLoss’, loss_weight=1.0)

  • loss_wh (ConfigDict or dict) – Config of wh loss. Defaults to dict(type=’L1Loss’, loss_weight=0.1).

  • loss_offset (ConfigDict or dict) – Config of offset loss. Defaults to dict(type=’L1Loss’, loss_weight=1.0).

  • train_cfg (ConfigDict or dict, optional) – Training config. Useless in CenterNet, but we keep this variable for SingleStageDetector.

  • test_cfg (ConfigDict or dict, optional) – Testing config of CenterNet.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict], optional) – Initialization config dict.
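A minimal sketch (channel sizes chosen arbitrarily) of instantiating CenterNetHead and running the forward pass documented below on a single backbone level:

# Illustrative sketch: CenterNetHead takes a single-level feature tuple
# (no FPN) and returns heatmap, wh and offset predictions per level.
import torch
from mmdet.models.dense_heads import CenterNetHead

head = CenterNetHead(in_channels=64, feat_channels=64, num_classes=80)
feats = (torch.rand(1, 64, 128, 128),)
center_heatmap_preds, wh_preds, offset_preds = head(feats)
# center_heatmap_preds[0]: (1, 80, 128, 128)
# wh_preds[0] and offset_preds[0]: (1, 2, 128, 128)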

forward(x: Tuple[torch.Tensor, ...])Tuple[List[torch.Tensor]][source]

Forward features. Notice CenterNet head does not use FPN.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

  • center_heatmap_preds (list[Tensor]): Center predict heatmaps for all levels, the channels number is num_classes.

  • wh_preds (list[Tensor]): WH predicts for all levels, the channels number is 2.

  • offset_preds (list[Tensor]): Offset predicts for all levels, the channels number is 2.

Return type

tuple

forward_single(x: torch.Tensor)Tuple[torch.Tensor, ...][source]

Forward feature of a single level.

Parameters

x (Tensor) – Feature of a single level.

Returns

  • center_heatmap_pred (Tensor): Center predict heatmap, the channels number is num_classes.

  • wh_pred (Tensor): WH predicts, the channels number is 2.

  • offset_pred (Tensor): Offset predicts, the channels number is 2.

Return type

tuple

get_targets(gt_bboxes: List[torch.Tensor], gt_labels: List[torch.Tensor], feat_shape: tuple, img_shape: tuple)Tuple[dict, int][source]

Compute regression and classification targets in multiple images.

Parameters
  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – class indices corresponding to each box.

  • feat_shape (tuple) – feature map shape with value [B, _, H, W]

  • img_shape (tuple) – image shape.

Returns

The float value is mean avg_factor, the dict has components below:

  • center_heatmap_target (Tensor): targets of center heatmap, shape (B, num_classes, H, W).

  • wh_target (Tensor): targets of wh predict, shape (B, 2, H, W).

  • offset_target (Tensor): targets of offset predict, shape (B, 2, H, W).

  • wh_offset_target_weight (Tensor): weights of wh and offset predict, shape (B, 2, H, W).

Return type

tuple[dict, float]
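As an illustration, get_targets can be called directly with made-up ground truth for one image; here a 512x512 input maps to a 128x128 feature map, matching the shapes in the parameter list above:

# Illustrative only: targets for one image with a single ground-truth box.
import torch
from mmdet.models.dense_heads import CenterNetHead

head = CenterNetHead(in_channels=64, feat_channels=64, num_classes=80)
gt_bboxes = [torch.tensor([[32., 48., 160., 200.]])]  # [tl_x, tl_y, br_x, br_y]
gt_labels = [torch.tensor([3])]

target_result, avg_factor = head.get_targets(
    gt_bboxes, gt_labels, feat_shape=(1, 64, 128, 128), img_shape=(512, 512))
# target_result['center_heatmap_target']: (1, 80, 128, 128)
# target_result['wh_target'] and ['offset_target']: (1, 2, 128, 128)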

init_weights()None[source]

Initialize weights of the head.

loss_by_feat(center_heatmap_preds: List[torch.Tensor], wh_preds: List[torch.Tensor], offset_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Compute losses of the head.

Parameters
  • center_heatmap_preds (list[Tensor]) – center predict heatmaps for all levels with shape (B, num_classes, H, W).

  • wh_preds (list[Tensor]) – wh predicts for all levels with shape (B, 2, H, W).

  • offset_preds (list[Tensor]) – offset predicts for all levels with shape (B, 2, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses, which has the components below:
  • loss_center_heatmap (Tensor): loss of center heatmap.

  • loss_wh (Tensor): loss of wh heatmap.

  • loss_offset (Tensor): loss of offset heatmap.

Return type

dict[str, Tensor]

predict_by_feat(center_heatmap_preds: List[torch.Tensor], wh_preds: List[torch.Tensor], offset_preds: List[torch.Tensor], batch_img_metas: Optional[List[dict]] = None, rescale: bool = True, with_nms: bool = False)List[mmengine.structures.instance_data.InstanceData][source]

Transform network output for a batch into bbox predictions.

Parameters
  • center_heatmap_preds (list[Tensor]) – Center predict heatmaps for all levels with shape (B, num_classes, H, W).

  • wh_preds (list[Tensor]) – WH predicts for all levels with shape (B, 2, H, W).

  • offset_preds (list[Tensor]) – Offset predicts for all levels with shape (B, 2, H, W).

  • batch_img_metas (list[dict], optional) – Batch image meta info. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to False.

Returns

Detection results of each image after the post process. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instance, )

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

Return type

list[InstanceData]

class mmdet.models.dense_heads.CenterNetUpdateHead(num_classes: int, in_channels: int, regress_ranges: Sequence[Tuple[int, int]] = ((0, 80), (64, 160), (128, 320), (256, 640), (512, 1000000000)), hm_min_radius: int = 4, hm_min_overlap: float = 0.8, more_pos_thresh: float = 0.2, more_pos_topk: int = 9, soft_weight_on_reg: bool = False, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'neg_weight': 0.75, 'pos_weight': 0.25, 'type': 'GaussianFocalLoss'}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'GIoULoss'}, norm_cfg: Optional