mmdet.apis

async mmdet.apis.async_inference_detector(model, imgs)[source]

Async inference image(s) with the detector.

Parameters
  • model (nn.Module) – The loaded detector.

  • imgs (str | ndarray) – Either image files or loaded images.

Returns

Awaitable detection results.
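
A minimal usage sketch (the config, checkpoint, and image paths are placeholders, and building the model with init_detector is an assumption for illustration):

>>> import asyncio
>>> from mmdet.apis import init_detector, async_inference_detector
>>>
>>> async def main():
>>>     model = init_detector('cfg.py', 'ckpt.pth', device='cuda:0')
>>>     return await async_inference_detector(model, 'demo.jpg')
>>>
>>> result = asyncio.run(main())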

mmdet.apis.inference_detector(model: torch.nn.modules.module.Module, imgs: Union[str, numpy.ndarray, Sequence[str], Sequence[numpy.ndarray]], test_pipeline: Optional[mmcv.transforms.wrappers.Compose] = None) → Union[mmdet.structures.det_data_sample.DetDataSample, List[mmdet.structures.det_data_sample.DetDataSample]][source]

Inference image(s) with the detector.

Parameters
  • model (nn.Module) – The loaded detector.

  • imgs (str, ndarray, Sequence[str/ndarray]) – Either image files or loaded images.

  • test_pipeline (Compose) – Test pipeline.

Returns

If imgs is a list or tuple, a list of results with the same length is returned; otherwise the detection results are returned directly.

Return type

DetDataSample or list[DetDataSample]
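
A sketch of both calling conventions (the image paths are placeholders and model is assumed to be built with init_detector):

>>> from mmdet.apis import inference_detector
>>> result = inference_detector(model, 'demo.jpg')  # a single DetDataSample
>>> results = inference_detector(model, ['a.jpg', 'b.jpg'])  # a list of two DetDataSamples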

mmdet.apis.init_detector(config: Union[str, pathlib.Path, mmengine.config.config.Config], checkpoint: Optional[str] = None, palette: str = 'none', device: str = 'cuda:0', cfg_options: Optional[dict] = None) → torch.nn.modules.module.Module[source]

Initialize a detector from config file.

Parameters
  • config (str, Path, or mmengine.Config) – Config file path, pathlib.Path, or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.

  • palette (str) – Color palette used for visualization. If a palette is stored in the checkpoint, the checkpoint's palette takes precedence; otherwise the externally passed palette is used. Currently supports 'coco', 'voc', 'citys' and 'random'. Defaults to 'none'.

  • device (str) – The device that the model will be loaded onto. Defaults to 'cuda:0'.

  • cfg_options (dict, optional) – Options to override some settings in the used config.

Returns

The constructed detector.

Return type

nn.Module
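
A minimal sketch; the config/checkpoint paths and the dotted override key are illustrative assumptions:

>>> from mmdet.apis import init_detector
>>> model = init_detector(
>>>     'configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py',
>>>     'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth',
>>>     device='cpu',
>>>     cfg_options={'model.backbone.frozen_stages': 1})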

mmdet.datasets

datasets

class mmdet.datasets.AspectRatioBatchSampler(sampler: torch.utils.data.sampler.Sampler, batch_size: int, drop_last: bool = False)[source]

A sampler wrapper for grouping images with similar aspect ratio (< 1 or >= 1) into the same batch.

Parameters
  • sampler (Sampler) – Base sampler.

  • batch_size (int) – Size of mini-batch.

  • drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size.
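
A sketch of plugging this batch sampler into a PyTorch DataLoader (assuming dataset is an already-built mmdet dataset that exposes per-image width/height):

>>> from torch.utils.data import DataLoader, RandomSampler
>>> from mmengine.dataset import pseudo_collate
>>> from mmdet.datasets import AspectRatioBatchSampler
>>> sampler = RandomSampler(dataset)  # `dataset` built elsewhere
>>> batch_sampler = AspectRatioBatchSampler(sampler, batch_size=2, drop_last=True)
>>> loader = DataLoader(dataset, batch_sampler=batch_sampler, collate_fn=pseudo_collate)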

class mmdet.datasets.BaseDetDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Base dataset for detection.

Parameters
  • proposal_file (str, optional) – Proposals file path. Defaults to None.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

full_init() → None[source]

Load annotation file and set BaseDataset._fully_initialized to True.

If lazy_init=False, full_init will be called during the instantiation and self._fully_initialized will be set to True. If obj._fully_initialized=False, the class method decorated by force_full_init will call full_init automatically.

Several steps to initialize annotation:

  • load_data_list: Load annotations from the annotation file.

  • load_proposals: Load proposals from the proposal file, if self.proposal_file is not None.

  • filter data information: Filter annotations according to filter_cfg.

  • slice_data: Slice the dataset according to self._indices.

  • serialize_data: Serialize self.data_list if self.serialize_data is True.

get_cat_ids(idx: int) → List[int][source]

Get COCO category ids by index.

Parameters

idx (int) – Index of data.

Returns

All categories in the image of the specified index.

Return type

List[int]

load_proposals() → None[source]

Load proposals from proposals file.

The proposals_list should be a dict[img_path: proposals] with the same length as data_list, and each proposals entry should be a dict or InstanceData that usually contains the following keys:

  • bboxes (np.ndarray): Has a shape (num_instances, 4), with the last dimension arranged as (x1, y1, x2, y2).

  • scores (np.ndarray): Classification scores, with a shape (num_instances, ).
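
For illustration, a single proposals entry might look like the following sketch (the values are made up):

>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> proposals = InstanceData(
>>>     bboxes=np.array([[10., 10., 100., 120.]], dtype=np.float32),  # (num_instances, 4)
>>>     scores=np.array([0.98], dtype=np.float32))  # (num_instances, )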

class mmdet.datasets.CityscapesDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Dataset for Cityscapes.

filter_data() → List[dict][source]

Filter annotations according to filter_cfg.

Returns

Filtered results.

Return type

List[dict]

class mmdet.datasets.ClassAwareSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, seed: Optional[int] = None, num_sample_class: int = 1)[source]

Sampler that restricts data loading to the label of the dataset.

A class-aware sampling strategy to effectively tackle non-uniform class distribution. The length of the training data is consistent with the source data. Simple improvements based on Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks.

The implementation is adapted from https://github.com/Sense-X/TSD/blob/master/mmdet/datasets/samplers/distributed_classaware_sampler.py

Parameters
  • dataset – Dataset used for sampling.

  • seed (int, optional) – Random seed used to shuffle the sampler. This number should be identical across all processes in the distributed group. Defaults to None.

  • num_sample_class (int) – The number of samples taken from each per-label list. Defaults to 1.
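
In a config, this sampler is typically enabled by name in the train dataloader; a sketch (the surrounding dataloader fields and dataset paths are assumptions):

train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    sampler=dict(type='ClassAwareSampler', num_sample_class=1),
    dataset=dict(
        type='CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/')))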

get_cat2imgs() → Dict[int, list][source]

Get a dict with class as key and img_ids as values.

Returns

A dict mapping each label index to the list of image indices that contain that label.

Return type

dict[int, list]

set_epoch(epoch: int) → None[source]

Sets the epoch for this sampler.

When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters

epoch (int) – Epoch number.

class mmdet.datasets.CocoDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Dataset for COCO.

COCOAPI

alias of mmdet.datasets.api_wrappers.coco_api.COCO

filter_data() → List[dict][source]

Filter annotations according to filter_cfg.

Returns

Filtered results.

Return type

List[dict]

load_data_list() → List[dict][source]

Load annotations from an annotation file named self.ann_file.

Returns

A list of annotations.

Return type

List[dict]

parse_data_info(raw_data_info: dict) → Union[dict, List[dict]][source]

Parse raw annotation to target format.

Parameters

raw_data_info (dict) – Raw data information loaded from ann_file.

Returns

Parsed annotation.

Return type

Union[dict, List[dict]]

class mmdet.datasets.CocoPanopticDataset(ann_file: str = '', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'ann': None, 'img': None, 'seg': None}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]

Coco dataset for Panoptic segmentation.

The annotation format is shown as follows. The ann field is optional for testing.

[
    {
        'filename': f'{image_id:012}.png',
        'image_id': 9,
        'segments_info':
        [
            {
                # segment_id in panoptic png, converted from rgb
                'id': 8345037,
                'category_id': 51,
                'iscrowd': 0,
                'bbox': (x1, y1, w, h),
                'area': 24315
            },
            ...
        ]
    },
    ...
]
Parameters
  • ann_file (str) – Annotation file path. Defaults to ''.

  • metainfo (dict, optional) – Meta information for the dataset, such as class information. Defaults to None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to None.

  • data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img=None, ann=None, seg=None). The prefix seg, which is for the panoptic segmentation map, must not be None.

  • filter_cfg (dict, optional) – Config for filtering data. Defaults to None.

  • indices (int or Sequence[int], optional) – Support using only the first few data in the annotation file to facilitate training/testing on a smaller dataset. Defaults to None, which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.

  • pipeline (list, optional) – Processing pipeline. Defaults to [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Defaults to False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.

  • max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to get a valid image. Defaults to 1000.
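
A config sketch for this dataset (the paths follow the common COCO panoptic layout but are assumptions):

dataset = dict(
    type='CocoPanopticDataset',
    data_root='data/coco/',
    ann_file='annotations/panoptic_train2017.json',
    data_prefix=dict(
        img='train2017/', seg='annotations/panoptic_train2017/'),
    filter_cfg=dict(filter_empty_gt=True, min_size=32))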

COCOAPI

alias of mmdet.datasets.api_wrappers.coco_api.COCOPanoptic

filter_data() → List[dict][source]

Filter images too small or without ground truth.

Returns

self.data_list after filtering.

Return type

List[dict]

parse_data_info(raw_data_info: dict) → dict[source]

Parse raw annotation to target format.

Parameters

raw_data_info (dict) – Raw data information loaded from ann_file.

Returns

Parsed annotation.

Return type

dict

class mmdet.datasets.CrowdHumanDataset(data_root, ann_file, extra_ann_file=None, **kwargs)[source]

Dataset for CrowdHuman.

Parameters
  • data_root (str) – The root directory for data_prefix and ann_file.

  • ann_file (str) – Annotation file path.

  • extra_ann_file (str, optional) – The path of extra image metas for CrowdHuman. It can be created by CrowdHumanDataset automatically or by tools/misc/get_crowdhuman_id_hw.py manually. Defaults to None.

load_data_list() → List[dict][source]

Load annotations from an annotation file named self.ann_file.

Returns

A list of annotations.

Return type

List[dict]

parse_data_info(raw_data_info: dict) → Union[dict, List[dict]][source]

Parse raw annotation to target format.

Parameters

raw_data_info (dict) – Raw data information loaded from ann_file.

Returns

Parsed annotation.

Return type

Union[dict, List[dict]]

class mmdet.datasets.DeepFashionDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

Dataset for DeepFashion.

class mmdet.datasets.GroupMultiSourceSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]

Group Multi-Source Infinite Sampler.

According to the sampling ratio, sample data from different datasets but the same group to form batches.

Parameters
  • dataset (Sized) – The dataset.

  • batch_size (int) – Size of mini-batch.

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.

mmdet.datasets.LVISDataset

alias of mmdet.datasets.lvis.LVISV05Dataset

class mmdet.datasets.LVISV05Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

LVIS v0.5 dataset for detection.

load_data_list() → List[dict][source]

Load annotations from an annotation file named self.ann_file.

Returns

A list of annotations.

Return type

List[dict]

class mmdet.datasets.LVISV1Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]

LVIS v1 dataset for detection.

load_data_list() → List[dict][source]

Load annotations from an annotation file named self.ann_file.

Returns

A list of annotations.

Return type

List[dict]

class mmdet.datasets.MultiImageMixDataset(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, dict], pipeline: Sequence[str], skip_type_keys: Optional[Sequence[str]] = None, max_refetch: int = 15, lazy_init: bool = False)[source]

A wrapper of multiple images mixed dataset.

Suitable for training on multiple-image mixed data augmentation like mosaic and mixup. For the augmentation pipeline of mixed image data, the get_indexes method needs to be provided to obtain the image indexes, and you can set skip_type_keys to change the pipeline running process. At the same time, we provide the dynamic_scale parameter to dynamically change the output image size.

Parameters
  • dataset (CustomDataset) – The dataset to be mixed.

  • pipeline (Sequence[dict]) – Sequence of transform objects or config dicts to be composed.

  • dynamic_scale (tuple[int], optional) – The image scale can be changed dynamically. Defaults to None. It is deprecated.

  • skip_type_keys (list[str], optional) – Sequence of transform type strings to be skipped in the pipeline. Defaults to None.

  • max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.
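
A sketch of wrapping a dataset for mosaic-style training (the dataset paths and the exact transforms are assumptions):

train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True)
        ]),
    pipeline=[
        dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0),
        dict(type='PackDetInputs')
    ])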

full_init()[source]

Loop to full_init each dataset.

get_data_info(idx: int) → dict[source]

Get annotation by index.

Parameters

idx (int) – Global index of ConcatDataset.

Returns

The idx-th annotation of the datasets.

Return type

dict

property metainfo: dict

Get the meta information of the multi-image-mixed dataset.

Returns

The meta information of the multi-image-mixed dataset.

Return type

dict

update_skip_type_keys(skip_type_keys)[source]

Update skip_type_keys. It is called by an external hook.

Parameters

skip_type_keys (list[str], optional) – Sequence of transform type strings to be skipped in the pipeline.

class mmdet.datasets.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]

Multi-Source Infinite Sampler.

According to the sampling ratio, sample data from different datasets to form batches.

Parameters
  • dataset (Sized) – The dataset.

  • batch_size (int) – Size of mini-batch.

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.

Examples

>>> dataset_type = 'ConcatDataset'
>>> sub_dataset_type = 'CocoDataset'
>>> data_root = 'data/coco/'
>>> sup_ann = '../coco_semi_annos/instances_train2017.1@10.json'
>>> unsup_ann = '../coco_semi_annos/' \
>>>             'instances_train2017.1@10-unlabeled.json'
>>> dataset = dict(type=dataset_type,
>>>     datasets=[
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=sup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=sup_pipeline),
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=unsup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=unsup_pipeline),
>>>         ])
>>> train_dataloader = dict(
>>>     batch_size=5,
>>>     num_workers=5,
>>>     persistent_workers=True,
>>>     sampler=dict(type='MultiSourceSampler',
>>>         batch_size=5, source_ratio=[1, 4]),
>>>     batch_sampler=None,
>>>     dataset=dataset)
set_epoch(epoch: int) → None[source]

Not supported in epoch-based runner.

class mmdet.datasets.OpenImagesChallengeDataset(ann_file: str, **kwargs)[source]

Open Images Challenge dataset for detection.

Parameters

ann_file (str) – Open Images Challenge box annotation in txt format.

load_data_list() → List[dict][source]

Load annotations from an annotation file named self.ann_file.

Returns

A list of annotations.

Return type

List[dict]

class mmdet.datasets.OpenImagesDataset(label_file: str, meta_file: str, hierarchy_file: str, image_level_ann_file: Optional[str] = None, **kwargs)[source]

Open Images dataset for detection.

Parameters
  • ann_file (str) – Annotation file path.

  • label_file (str) – File path of the label description file that maps the class names in MID format to their short descriptions.

  • meta_file (str) – File path to get image metas.

  • hierarchy_file (str) – The file path of the class hierarchy.

  • image_level_ann_file (str) – Human-verified image-level annotation, which is used in evaluation.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

load_data_list() → List[dict][source]

Load annotations from an annotation file named self.ann_file.

Returns

A list of annotations.

Return type

List[dict]

class mmdet.datasets.VOCDataset(**kwargs)[source]

Dataset for PASCAL VOC.

class mmdet.datasets.WIDERFaceDataset(**kwargs)[source]

Reader for the WIDER Face dataset in PASCAL VOC format.

Conversion scripts can be found at https://github.com/sovrasov/wider-face-pascal-voc-annotations

load_annotations(ann_file)[source]

Load annotation from WIDERFace XML style annotation file.

Parameters

ann_file (str) – Path of XML file.

Returns

Annotation info from the XML file.

Return type

list[dict]

class mmdet.datasets.XMLDataset(img_subdir: str = 'JPEGImages', ann_subdir: str = 'Annotations', **kwargs)[source]

XML dataset for detection.

Parameters
  • img_subdir (str) – Subdir where images are stored. Defaults to JPEGImages.

  • ann_subdir (str) – Subdir where annotations are. Defaults to Annotations.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

property bbox_min_size: Optional[str]

Return the minimum size of bounding boxes in the images.

filter_data() → List[dict][source]

Filter annotations according to filter_cfg.

Returns

Filtered results.

Return type

List[dict]

load_data_list() → List[dict][source]

Load annotations from an XML-style ann_file.

Returns

Annotation info from the XML file.

Return type

list[dict]

parse_data_info(img_info: dict) → Union[dict, List[dict]][source]

Parse raw annotation to target format.

Parameters

img_info (dict) – Raw image information; usually it includes img_id, file_name, and xml_path.

Returns

Parsed annotation.

Return type

Union[dict, List[dict]]

property sub_data_root: str

Return the sub data root.

mmdet.datasets.get_loading_pipeline(pipeline)[source]

Only keep the image- and annotation-loading related configuration.

Parameters

pipeline (list[dict]) – Data pipeline configs.

Returns

The new pipeline list that only keeps the image- and annotation-loading related configuration.

Return type

list[dict]

Examples

>>> pipelines = [
...    dict(type='LoadImageFromFile'),
...    dict(type='LoadAnnotations', with_bbox=True),
...    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
...    dict(type='RandomFlip', flip_ratio=0.5),
...    dict(type='Normalize', **img_norm_cfg),
...    dict(type='Pad', size_divisor=32),
...    dict(type='DefaultFormatBundle'),
...    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
...    ]
>>> expected_pipelines = [
...    dict(type='LoadImageFromFile'),
...    dict(type='LoadAnnotations', with_bbox=True)
...    ]
>>> assert expected_pipelines == get_loading_pipeline(pipelines)

api_wrappers

class mmdet.datasets.api_wrappers.COCO(*args: Any, **kwargs: Any)[source]

This class is almost the same as the official pycocotools package.

It implements some snake_case function aliases so that the COCO class has the same interface as the LVIS class.

class mmdet.datasets.api_wrappers.COCOPanoptic(*args: Any, **kwargs: Any)[source]

This wrapper is for loading the panoptic style annotation file.

The format is shown in the CocoPanopticDataset class.

Parameters

annotation_file (str, optional) – Path of the annotation file. Defaults to None.

createIndex() → None[source]

Create index.

load_anns(ids: Union[List[int], int] = []) → Optional[List[dict]][source]

Load anns with the specified ids.

self.anns is a list of annotation lists instead of a list of annotations.

Parameters

ids (Union[List[int], int]) – Integer ids specifying anns.

Returns

Loaded ann objects.

Return type

anns (List[dict], optional)
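
A small usage sketch (the annotation path is a placeholder):

>>> from mmdet.datasets.api_wrappers import COCOPanoptic
>>> coco = COCOPanoptic('annotations/panoptic_val2017.json')
>>> anns = coco.load_anns(coco.get_ann_ids(img_ids=[9]))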

samplers

class mmdet.datasets.samplers.AspectRatioBatchSampler(sampler: torch.utils.data.sampler.Sampler, batch_size: int, drop_last: bool = False)[source]

A sampler wrapper for grouping images with similar aspect ratio (< 1 or >= 1) into the same batch.

Parameters
  • sampler (Sampler) – Base sampler.

  • batch_size (int) – Size of mini-batch.

  • drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size.

class mmdet.datasets.samplers.ClassAwareSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, seed: Optional[int] = None, num_sample_class: int = 1)[source]

Sampler that restricts data loading to the label of the dataset.

A class-aware sampling strategy to effectively tackle non-uniform class distribution. The length of the training data is consistent with the source data. Simple improvements based on Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks.

The implementation is adapted from https://github.com/Sense-X/TSD/blob/master/mmdet/datasets/samplers/distributed_classaware_sampler.py

Parameters
  • dataset – Dataset used for sampling.

  • seed (int, optional) – Random seed used to shuffle the sampler. This number should be identical across all processes in the distributed group. Defaults to None.

  • num_sample_class (int) – The number of samples taken from each per-label list. Defaults to 1.

get_cat2imgs() → Dict[int, list][source]

Get a dict with class as key and img_ids as values.

Returns

A dict mapping each label index to the list of image indices that contain that label.

Return type

dict[int, list]

set_epoch(epoch: int) → None[source]

Sets the epoch for this sampler.

When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters

epoch (int) – Epoch number.

class mmdet.datasets.samplers.GroupMultiSourceSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]

Group Multi-Source Infinite Sampler.

According to the sampling ratio, sample data from different datasets but the same group to form batches.

Parameters
  • dataset (Sized) – The dataset.

  • batch_size (int) – Size of mini-batch.

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.

class mmdet.datasets.samplers.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]

Multi-Source Infinite Sampler.

According to the sampling ratio, sample data from different datasets to form batches.

Parameters
  • dataset (Sized) – The dataset.

  • batch_size (int) – Size of mini-batch.

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.

  • shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.

Examples

>>> dataset_type = 'ConcatDataset'
>>> sub_dataset_type = 'CocoDataset'
>>> data_root = 'data/coco/'
>>> sup_ann = '../coco_semi_annos/instances_train2017.1@10.json'
>>> unsup_ann = '../coco_semi_annos/' \
>>>             'instances_train2017.1@10-unlabeled.json'
>>> dataset = dict(type=dataset_type,
>>>     datasets=[
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=sup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=sup_pipeline),
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=unsup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=unsup_pipeline),
>>>         ])
>>> train_dataloader = dict(
>>>     batch_size=5,
>>>     num_workers=5,
>>>     persistent_workers=True,
>>>     sampler=dict(type='MultiSourceSampler',
>>>         batch_size=5, source_ratio=[1, 4]),
>>>     batch_sampler=None,
>>>     dataset=dataset)
set_epoch(epoch: int) → None[source]

Not supported in epoch-based runner.

transforms

class mmdet.datasets.transforms.Albu(transforms: List[dict], bbox_params: Optional[dict] = None, keymap: Optional[dict] = None, skip_img_without_anno: bool = False)[source]

Albumentation augmentation.

Adds custom transformations from the Albumentations library. Please visit https://albumentations.readthedocs.io for more information.

Required Keys:

  • img (np.uint8)

  • gt_bboxes (HorizontalBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

Modified Keys:

  • img (np.uint8)

  • gt_bboxes (HorizontalBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • img_shape (tuple)

An example of transforms is as follows:

[
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=0,
        interpolation=1,
        p=0.5),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(type='ChannelShuffle', p=0.1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.1),
]
Parameters
  • transforms (list[dict]) – A list of albu transformations.

  • bbox_params (dict, optional) – bbox_params for the albumentations Compose.

  • keymap (dict, optional) – Contains {'input key': 'albumentation-style key'}.

  • skip_img_without_anno (bool) – Whether to skip the image if no annotations are left after augmentation. Defaults to False.
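
A pipeline-entry sketch of how Albu is commonly wired up for detection (the fields follow typical mmdet configs but are assumptions here):

dict(
    type='Albu',
    transforms=[dict(type='RandomBrightnessContrast', p=0.2)],
    bbox_params=dict(
        type='BboxParams',
        format='pascal_voc',
        label_fields=['gt_bboxes_labels', 'gt_ignore_flags']),
    keymap=dict(img='image', gt_bboxes='bboxes'))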

albu_builder(cfg: dict) → None[source]

Import a module from albumentations.

It inherits some of build_from_cfg() logic.

Parameters

cfg (dict) – Config dict. It should at least contain the key "type".

Returns

The constructed object.

Return type

obj

static mapper(d: dict, keymap: dict) → dict[source]

Dictionary mapper. Renames keys according to keymap provided.

Parameters
  • d (dict) – old dict

  • keymap (dict) – {'old_key': 'new_key'}

Returns

new dict.

Return type

dict

class mmdet.datasets.transforms.AutoAugment(policies: List[List[Union[dict, mmengine.config.config.ConfigDict]]] = [[{'type': 'Equalize', 'prob': 0.8, 'level': 1}, {'type': 'ShearY', 'prob': 0.8, 'level': 4}], [{'type': 'Color', 'prob': 0.4, 'level': 9}, {'type': 'Equalize', 'prob': 0.6, 'level': 3}], [{'type': 'Color', 'prob': 0.4, 'level': 1}, {'type': 'Rotate', 'prob': 0.6, 'level': 8}], [{'type': 'Solarize', 'prob': 0.8, 'level': 3}, {'type': 'Equalize', 'prob': 0.4, 'level': 7}], [{'type': 'Solarize', 'prob': 0.4, 'level': 2}, {'type': 'Solarize', 'prob': 0.6, 'level': 2}], [{'type': 'Color', 'prob': 0.2, 'level': 0}, {'type': 'Equalize', 'prob': 0.8, 'level': 8}], [{'type': 'Equalize', 'prob': 0.4, 'level': 8}, {'type': 'SolarizeAdd', 'prob': 0.8, 'level': 3}], [{'type': 'ShearX', 'prob': 0.2, 'level': 9}, {'type': 'Rotate', 'prob': 0.6, 'level': 8}], [{'type': 'Color', 'prob': 0.6, 'level': 1}, {'type': 'Equalize', 'prob': 1.0, 'level': 2}], [{'type': 'Invert', 'prob': 0.4, 'level': 9}, {'type': 'Rotate', 'prob': 0.6, 'level': 0}], [{'type': 'Equalize', 'prob': 1.0, 'level': 9}, {'type': 'ShearY', 'prob': 0.6, 'level': 3}], [{'type': 'Color', 'prob': 0.4, 'level': 7}, {'type': 'Equalize', 'prob': 0.6, 'level': 0}], [{'type': 'Posterize', 'prob': 0.4, 'level': 6}, {'type': 'AutoContrast', 'prob': 0.4, 'level': 7}], [{'type': 'Solarize', 'prob': 0.6, 'level': 8}, {'type': 'Color', 'prob': 0.6, 'level': 9}], [{'type': 'Solarize', 'prob': 0.2, 'level': 4}, {'type': 'Rotate', 'prob': 0.8, 'level': 9}], [{'type': 'Rotate', 'prob': 1.0, 'level': 7}, {'type': 'TranslateY', 'prob': 0.8, 'level': 9}], [{'type': 'ShearX', 'prob': 0.0, 'level': 0}, {'type': 'Solarize', 'prob': 0.8, 'level': 4}], [{'type': 'ShearY', 'prob': 0.8, 'level': 0}, {'type': 'Color', 'prob': 0.6, 'level': 4}], [{'type': 'Color', 'prob': 1.0, 'level': 0}, {'type': 'Rotate', 'prob': 0.6, 'level': 2}], [{'type': 'Equalize', 'prob': 0.8, 'level': 4}, {'type': 'Equalize', 'prob': 0.0, 'level': 8}], [{'type': 'Equalize', 'prob': 1.0, 'level': 4}, {'type': 'AutoContrast', 'prob': 0.6, 'level': 2}], [{'type': 'ShearY', 'prob': 0.4, 'level': 7}, {'type': 'SolarizeAdd', 'prob': 0.6, 'level': 7}], [{'type': 'Posterize', 'prob': 0.8, 'level': 2}, {'type': 'Solarize', 'prob': 0.6, 'level': 10}], [{'type': 'Solarize', 'prob': 0.6, 'level': 8}, {'type': 'Equalize', 'prob': 0.6, 'level': 1}], [{'type': 'Color', 'prob': 0.8, 'level': 6}, {'type': 'Rotate', 'prob': 0.4, 'level': 5}]], prob: Optional[List[float]] = None)[source]

Auto augmentation.

This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data and in Learning Data Augmentation Strategies for Object Detection.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_bboxes_labels

  • gt_masks

  • gt_ignore_flags

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • policies (List[List[Union[dict, ConfigDict]]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy and is composed of several augmentations. When AutoAugment is called, a random policy in policies will be selected to augment images. Defaults to policy_v0().

  • prob (list[float], optional) – The probabilities associated with each policy. The length should be equal to the policy number and the sum should be 1. If not given, a uniform distribution will be assumed. Defaults to None.

Examples

>>> policies = [
>>>     [
>>>         dict(type='Sharpness', prob=0.0, level=8),
>>>         dict(type='ShearX', prob=0.4, level=0,)
>>>     ],
>>>     [
>>>         dict(type='Rotate', prob=0.6, level=10),
>>>         dict(type='Color', prob=1.0, level=6)
>>>     ]
>>> ]
>>> augmentation = AutoAugment(policies)
>>> img = np.ones((100, 100, 3))
>>> gt_bboxes = np.ones((10, 4))
>>> results = dict(img=img, gt_bboxes=gt_bboxes)
>>> results = augmentation(results)
class mmdet.datasets.transforms.AutoContrast(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Auto adjust image contrast.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing AutoContrast, which should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – Not used for the AutoContrast transformation. Defaults to None.

  • min_mag (float) – Not used for the AutoContrast transformation. Defaults to 0.1.

  • max_mag (float) – Not used for the AutoContrast transformation. Defaults to 1.9.

class mmdet.datasets.transforms.Brightness(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Adjust the brightness of the image. A magnitude=0 gives a black image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing the Brightness transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0, _MAX_LEVEL]. If level is None, it will be generated from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for the Brightness transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for the Brightness transformation. Defaults to 1.9.

class mmdet.datasets.transforms.CachedMixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, max_iters: int = 15, bbox_clip_border: bool = True, max_cached_images: int = 20, random_pop: bool = True, prob: float = 1.0)[source]

Cached mixup data augmentation.

                    mixup transform
           +------------------------------+
           | mixup image   |              |
           |      +--------|--------+     |
           |      |        |        |     |
           |---------------+        |     |
           |      |                 |     |
           |      |      image      |     |
           |      |                 |     |
           |      |                 |     |
           |      |-----------------+     |
           |             pad              |
           +------------------------------+

The cached mixup transform steps are as follows:

   1. Append the results from the last transform into the cache.
   2. Another random image is picked from the cache and embedded in
      the top left patch (after padding and resizing).
   3. The target of the mixup transform is the weighted average of the
      mixup image and the original image.

Required Keys:

  • img

  • gt_bboxes (np.float32) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.

  • pad_val (int) – Pad value. Defaults to 114.

  • max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

class mmdet.datasets.transforms.CachedMosaic(*args, max_cached_images: int = 40, random_pop: bool = True, **kwargs)[source]

Cached mosaic augmentation.

The cached mosaic transform randomly selects images from the cache and combines them into one output image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The cached mosaic transform steps are as follows:

    1. Append the results from the last transform into the cache.
    2. Choose the mosaic center as the intersection of the 4 images.
    3. Get the top-left image according to the index, and randomly
       sample another 3 images from the result cache.
    4. A sub-image will be cropped if it is larger than the mosaic patch.

Required Keys:

  • img

  • gt_bboxes (np.float32) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pad_val (int) – Pad value. Defaults to 114.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
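
A training-pipeline sketch combining the two cached transforms, loosely following RTMDet-style configs (all values are illustrative):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0),
    dict(type='CachedMixUp', img_scale=(640, 640), ratio_range=(1.0, 1.0),
         max_cached_images=20, pad_val=114.0),
    dict(type='PackDetInputs')
]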

class mmdet.datasets.transforms.Color(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Adjust the color balance of the image, in a manner similar to the controls on a colour TV set. A magnitude=0 gives a black & white image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Color transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Color transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for Color transformation. Defaults to 1.9.

class mmdet.datasets.transforms.ColorTransform(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Base class for color transformations. All color transformations need to inherit from this base class. ColorTransform unifies the class attributes and class functions of color transformations (Color, Brightness, Contrast, Sharpness, Solarize, SolarizeAdd, Equalize, AutoContrast, Invert, and Posterize), and only distort color channels, without impacting the locations of the instances.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing the geometric transformation and should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for color transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for color transformation. Defaults to 1.9.

transform(results: dict) → dict[source]

Transform function for images.

Parameters

results (dict) – Result dict from the loading pipeline.

Returns

Transformed results.

Return type

dict

class mmdet.datasets.transforms.Contrast(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Control the contrast of the image. A magnitude=0 gives a gray image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing Contrast transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Contrast transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for Contrast transformation. Defaults to 1.9.

class mmdet.datasets.transforms.CopyPaste(max_num_pasted: int = 100, bbox_occluded_thr: int = 10, mask_occluded_thr: int = 300, selected: bool = True)[source]

Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. The simple copy-paste transform steps are as follows:

  1. The destination image is already resized with aspect ratio kept, cropped and padded.

  2. Randomly select a source image, which is also already resized with aspect ratio kept, cropped and padded in a similar way as the destination image.

  3. Randomly select some objects from the source image.

  4. Paste these source objects onto the destination image directly, since the source and destination images have the same size.

  5. Update object masks of the destination image, since some original objects may be occluded.

  6. Generate bboxes from the updated destination masks, filter out objects which are totally occluded, and adjust bboxes which are partly occluded.

  7. Append selected source bboxes, masks, and labels.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_masks (BitmapMasks) (optional)

Modified Keys:

  • img

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

  • gt_masks (optional)

Parameters
  • max_num_pasted (int) – The maximum number of pasted objects. Defaults to 100.

  • bbox_occluded_thr (int) – The threshold of occluded bbox. Defaults to 10.

  • mask_occluded_thr (int) – The threshold of occluded mask. Defaults to 300.

  • selected (bool) – Whether to select objects or not. If selected is False, all objects of the source image will be pasted onto the destination image. Defaults to True.
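
A sketch of wiring CopyPaste through MultiImageMixDataset (the inner pipeline is assumed to already resize, crop and pad as described in step 1; all fields are illustrative):

train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CocoDataset',
        data_root='data/coco/',
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True)
        ]),
    pipeline=[
        dict(type='CopyPaste', max_num_pasted=100),
        dict(type='PackDetInputs')
    ])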

class mmdet.datasets.transforms.CutOut(n_holes: Union[int, Tuple[int, int]], cutout_shape: Optional[Union[Tuple[int, int], List[Tuple[int, int]]]] = None, cutout_ratio: Optional[Union[Tuple[float, float], List[Tuple[float, float]]]] = None, fill_in: Union[Tuple[float, float, float], Tuple[int, int, int]] = (0, 0, 0))[source]

CutOut operation.

Randomly drop some regions of the image, as used in Cutout.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • n_holes (int or tuple[int, int]) – Number of regions to be dropped. If it is given as a tuple, the number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].

  • cutout_shape (tuple[int, int] or list[tuple[int, int]], optional) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose a shape from the list. Defaults to None.

  • cutout_ratio (tuple[float, float] or list[tuple[float, float]], optional) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose a ratio from the list. Please note that cutout_shape and cutout_ratio cannot both be given at the same time. Defaults to None.

  • fill_in (tuple[float, float, float] or tuple[int, int, int]) – The value of the pixel to fill in the dropped regions. Defaults to (0, 0, 0).
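
A pipeline-entry sketch (the hole counts and shapes are illustrative; give either cutout_shape or cutout_ratio, not both):

dict(type='CutOut', n_holes=(2, 5), cutout_shape=[(4, 4), (8, 8)], fill_in=(114, 114, 114))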

class mmdet.datasets.transforms.Equalize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Equalize the image histogram. The bboxes, masks and segmentations are not modified.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing the Equalize transformation. Defaults to 1.0.

  • level (int, optional) – Not used for the Equalize transformation. Defaults to None.

  • min_mag (float) – Not used for the Equalize transformation. Defaults to 0.1.

  • max_mag (float) – Not used for the Equalize transformation. Defaults to 1.9.

class mmdet.datasets.transforms.Expand(mean: Sequence[Union[int, float]] = (0, 0, 0), to_rgb: bool = True, ratio_range: Sequence[Union[int, float]] = (1, 4), seg_ignore_label: Optional[int] = None, prob: float = 0.5)[source]

Randomly expand the image & bboxes & masks & segmentation map.

Randomly place the original image on a canvas of ratio x original image size filled with mean values. The ratio is in the range of ratio_range.

Required Keys:

  • img

  • img_shape

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Parameters
  • mean (sequence) – Mean value of the dataset.

  • to_rgb (bool) – Whether to convert the order of mean to align with RGB.

  • ratio_range (sequence) – Range of the expand ratio.

  • seg_ignore_label (int) – Label of the ignored segmentation map.

  • prob (float) – Probability of applying this transformation.
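
A pipeline-entry sketch (the mean values below are the common ImageNet statistics seen in mmdet configs, used here purely as an illustration):

dict(type='Expand', mean=[123.675, 116.28, 103.53], to_rgb=True, ratio_range=(1, 4))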

class mmdet.datasets.transforms.FilterAnnotations(min_gt_bbox_wh: Tuple[int, int] = (1, 1), min_gt_mask_area: int = 1, by_box: bool = True, by_mask: bool = False, keep_empty: bool = True)[source]

Filter invalid annotations.

Required Keys:

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_masks (optional)

  • gt_ignore_flags (optional)

Parameters
  • min_gt_bbox_wh (tuple[float]) – Minimum width and height of ground-truth boxes. Defaults to (1., 1.).

  • min_gt_mask_area (int) – Minimum foreground area of ground-truth masks. Defaults to 1.

  • by_box (bool) – Filter instances with bounding boxes not meeting the min_gt_bbox_wh threshold. Defaults to True.

  • by_mask (bool) – Filter instances with masks not meeting the min_gt_mask_area threshold. Defaults to False.

  • keep_empty (bool) – Whether to return None when it becomes an empty bbox after filtering. Defaults to True.
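
A pipeline-entry sketch (the thresholds are illustrative):

dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), by_box=True, keep_empty=False)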

class mmdet.datasets.transforms.FixShapeResize(width: int, height: int, pad_val: Union[int, float, dict] = {'img': 0, 'seg': 255}, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation: str = 'bilinear')[source]

Resize images & bbox & seg to the specified size.

This transform resizes the input image according to width and height. Bboxes, masks, and seg map are then resized with the same parameters.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

  • homography_matrix

Parameters
  • width (int) – Width for resizing.

  • height (int) – Height for resizing.

  • pad_val (Number | dict[str, Number], optional) –

    Padding value used if the pad_mode is "constant". If it is a single number, the value to pad the image is the number and to pad the semantic segmentation map is 255. If it is a dict, it should have the following keys:

    • img: The value to pad the image.

    • seg: The value to pad the semantic segmentation map.

    Defaults to dict(img=0, seg=255).

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.

  • clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images, so we do not need to clip the gt bboxes in these cases. Defaults to True.

  • backend (str) – Image resize backend, choices are 'cv2' and 'pillow'. These two backends generate slightly different results. Defaults to 'cv2'.

  • interpolation (str) – Interpolation method; accepted values are "nearest", "bilinear", "bicubic", "area", "lanczos" for the 'cv2' backend, and "nearest", "bilinear" for the 'pillow' backend. Defaults to 'bilinear'.

class mmdet.datasets.transforms.GeomTransform(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 1.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]

Base class for geometric transformations. All geometric transformations need to inherit from this base class. GeomTransform unifies the class attributes and class functions of geometric transformations (ShearX, ShearY, Rotate, TranslateX, and TranslateY), and records the homography matrix.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

Parameters
  • prob (float) – The probability for performing the geometric transformation and should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for geometric transformation. Defaults to 0.0.

  • max_mag (float) – The maximum magnitude for geometric transformation. Defaults to 1.0.

  • reversal_prob (float) – The probability that reverses the geometric transformation magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for the image border. If float, the same fill value will be used for all three channels of the image. If tuple, it should have 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.ImageToTensor(keys)[source]

Convert image to torch.Tensor by given keys.

The dimension order of the input image is (H, W, C). The pipeline will convert it to (C, H, W). If only 2 dimensions (H, W) are given, the output would be (1, H, W).

Parameters

keys (Sequence[str]) – Key of images to be converted to Tensor.
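
A small sketch of the resulting layout:

>>> import numpy as np
>>> from mmdet.datasets.transforms import ImageToTensor
>>> to_tensor = ImageToTensor(keys=['img'])
>>> out = to_tensor(dict(img=np.zeros((480, 640, 3), dtype=np.uint8)))
>>> tuple(out['img'].shape)
(3, 480, 640)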

class mmdet.datasets.transforms.InstaBoost(action_candidate: tuple = ('normal', 'horizontal', 'skip'), action_prob: tuple = (1, 0, 0), scale: tuple = (0.8, 1.2), dx: int = 15, dy: int = 15, theta: tuple = (-1, 1), color_prob: float = 0.5, hflag: bool = False, aug_ratio: float = 0.5)[source]

Data augmentation method in InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting.

Refer to https://github.com/GothicAi/Instaboost for implementation details.

Required Keys:

  • img (np.uint8)

  • instances

Modified Keys:

  • img (np.uint8)

  • instances

Parameters
  • action_candidate (tuple) – Action candidates. “normal”, “horizontal”, “vertical”, “skip” are supported. Defaults to (‘normal’, ‘horizontal’, ‘skip’).

  • action_prob (tuple) – Corresponding action probabilities. Should be the same length as action_candidate. Defaults to (1, 0, 0).

  • scale (tuple) – (min scale, max scale). Defaults to (0.8, 1.2).

  • dx (int) – The maximum x-axis shift will be (instance width) / dx. Defaults to 15.

  • dy (int) – The maximum y-axis shift will be (instance height) / dy. Defaults to 15.

  • theta (tuple) – (min rotation degree, max rotation degree). Defaults to (-1, 1).

  • color_prob (float) – Probability of images for color augmentation. Defaults to 0.5.

  • hflag (bool) – Whether to use heatmap guidance. Defaults to False.

  • aug_ratio (float) – Probability of applying this transformation. Defaults to 0.5.
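
A pipeline-entry sketch (InstaBoost additionally requires the instaboostfast package; the values below mirror the defaults above):

dict(type='InstaBoost', action_candidate=('normal', 'horizontal', 'skip'),
     action_prob=(1, 0, 0), aug_ratio=0.5)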

transform(results) → dict[source]

The transform function.

class mmdet.datasets.transforms.Invert(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]

Invert images.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • prob (float) – The probability for performing invert, which therefore should be in range [0, 1]. Defaults to 1.0.

  • level (int, optional) – Not used for the Invert transformation. Defaults to None.

  • min_mag (float) – Not used for the Invert transformation. Defaults to 0.1.

  • max_mag (float) – Not used for the Invert transformation. Defaults to 1.9.

class mmdet.datasets.transforms.LoadAnnotations(with_mask: bool = False, poly2mask: bool = True, box_type: str = 'hbox', **kwargs)[source]

Load and process the instances and seg_map annotations provided by the dataset.

The annotation format is as follows:

{
    'instances':
    [
        {
        # List of 4 numbers representing the bounding box of the
        # instance, in (x1, y1, x2, y2) order.
        'bbox': [x1, y1, x2, y2],

        # Label of image classification.
        'bbox_label': 1,

        # Used in instance/panoptic segmentation. The segmentation mask
        # of the instance or the information of segments.
        # 1. If list[list[float]], it represents a list of polygons,
        # one for each connected component of the object. Each
        # list[float] is one simple polygon in the format of
        # [x1, y1, ..., xn, yn] (n≥3). The Xs and Ys are absolute
        # coordinates in unit of pixels.
        # 2. If dict, it represents the per-pixel segmentation mask in
        # COCO’s compressed RLE format. The dict should have keys
        # “size” and “counts”.  Can be loaded by pycocotools
        'mask': list[list[float]] or dict,

        }
    ]
    # Filename of semantic or panoptic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}

After this module, the annotation has been changed to the format below:

{
    # In (x1, y1, x2, y2) order, float type. N is the number of bboxes
    # in an image
    'gt_bboxes': BaseBoxes(N, 4)
     # In int type.
    'gt_bboxes_labels': np.ndarray(N, )
     # In built-in class
    'gt_masks': PolygonMasks (H, W) or BitmapMasks (H, W)
     # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}

Required Keys:

  • height

  • width

  • instances

    • bbox (optional)

    • bbox_label

    • mask (optional)

    • ignore_flag

  • seg_map_path (optional)

Added Keys:

  • gt_bboxes (BaseBoxes[torch.float32])

  • gt_bboxes_labels (np.int64)

  • gt_masks (BitmapMasks | PolygonMasks)

  • gt_seg_map (np.uint8)

  • gt_ignore_flags (bool)

Parameters
  • with_bbox (bool) – Whether to parse and load the bbox annotation. Defaults to True.

  • with_label (bool) – Whether to parse and load the label annotation. Defaults to True.

  • with_mask (bool) – Whether to parse and load the mask annotation. Default: False.

  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.

  • poly2mask (bool) – Whether to convert mask to bitmap. Default: True.

  • box_type (str) – The box type used to wrap the bboxes. If box_type is None, gt_bboxes will keep being np.ndarray. Defaults to ‘hbox’.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for :func:mmcv.imfrombytes. See :func:mmcv.imfrombytes for details. Defaults to 'cv2'.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See :class:mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

transform(results: dict)dict[源代码]

Function to load multiple types of annotations.

参数

results (dict) – Result dict from :obj:mmengine.BaseDataset.

返回

The dict contains loaded bounding box, label and semantic segmentation annotations.

返回类型

dict
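For orientation, here is a minimal sketch of a loading pipeline built around LoadAnnotations; the surrounding transforms and their settings are illustrative assumptions, not prescribed values.

# Illustrative sketch of a detection loading pipeline; values are assumptions.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackDetInputs'),
]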

class mmdet.datasets.transforms.LoadEmptyAnnotations(with_bbox: bool = True, with_label: bool = True, with_mask: bool = False, with_seg: bool = False, seg_ignore_label: int = 255)[源代码]

Load Empty Annotations for unlabeled images.

Added Keys:

  • gt_bboxes (np.float32)

  • gt_bboxes_labels (np.int64)

  • gt_masks (BitmapMasks | PolygonMasks)

  • gt_seg_map (np.uint8)

  • gt_ignore_flags (bool)

参数
  • with_bbox (bool) – Whether to load the pseudo bbox annotation. Defaults to True.

  • with_label (bool) – Whether to load the pseudo label annotation. Defaults to True.

  • with_mask (bool) – Whether to load the pseudo mask annotation. Defaults to False.

  • with_seg (bool) – Whether to load the pseudo semantic segmentation annotation. Defaults to False.

  • seg_ignore_label (int) – The fill value used for segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

transform(results: dict)dict[源代码]

Transform function to load empty annotations.

参数

results (dict) – Result dict.

返回

Updated result dict.

返回类型

dict

class mmdet.datasets.transforms.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'}, ignore_empty: bool = False)[源代码]

Load an image from results['img'].

Similar to LoadImageFromFile, but the image has been loaded as an np.ndarray in results['img']. Can be used when loading an image from a webcam.

Required Keys:

  • img

Modified Keys:

  • img

  • img_path

  • img_shape

  • ori_shape

参数

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

transform(results: dict)dict[源代码]

Transform function to add image meta information.

参数

results (dict) – Result dict with Webcam read image in results['img'].

返回

The dict contains loaded image and meta information.

返回类型

dict
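A hypothetical usage sketch for LoadImageFromNDArray with an already-decoded frame; the webcam capture code is an assumption for illustration only.

# Hypothetical sketch: feeding an already-decoded frame through the transform.
import cv2
from mmdet.datasets.transforms import LoadImageFromNDArray

cap = cv2.VideoCapture(0)   # open the default webcam (assumed device 0)
ok, frame = cap.read()      # frame is a BGR np.ndarray
if ok:
    results = dict(img=frame)              # the transform reads results['img']
    results = LoadImageFromNDArray()(results)
    print(results['img_shape'], results['ori_shape'])
cap.release()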

class mmdet.datasets.transforms.LoadMultiChannelImageFromFiles(to_float32: bool = False, color_type: str = 'unchanged', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'})[源代码]

Load multi-channel images from a list of separate channel files.

Required Keys:

  • img_path

Modified Keys:

  • img

  • img_shape

  • ori_shape

参数
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

  • color_type (str) – The flag argument for :func:mmcv.imfrombytes. Defaults to ‘unchanged’.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for :func:mmcv.imfrombytes. See :func:mmcv.imfrombytes for details. Defaults to ‘cv2’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

transform(results: dict)dict[源代码]

Transform function to load multiple images and get their meta information.

参数

results (dict) – Result dict from mmdet.CustomDataset.

返回

The dict contains loaded images and meta information.

返回类型

dict

class mmdet.datasets.transforms.LoadPanopticAnnotations(with_bbox: bool = True, with_label: bool = True, with_mask: bool = True, with_seg: bool = True, box_type: str = 'hbox', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'})[源代码]

Load multiple types of panoptic annotations.

The annotation format is as follows:

{
    'instances':
    [
        {
        # List of 4 numbers representing the bounding box of the
        # instance, in (x1, y1, x2, y2) order.
        'bbox': [x1, y1, x2, y2],

        # Category label of the instance.
        'bbox_label': 1,
        },
        ...
    ]
    'segments_info':
    [
        {
        # id = cls_id + instance_id * INSTANCE_OFFSET
        'id': int,

        # Contiguous category id defined in dataset.
        'category': int

        # Thing flag.
        'is_thing': bool
        },
        ...
    ]

    # Filename of semantic or panoptic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}

After this module, the annotation has been changed to the format below:

{
    # In (x1, y1, x2, y2) order, float type. N is the number of bboxes
    # in an image
    'gt_bboxes': BaseBoxes(N, 4)
     # In int type.
    'gt_bboxes_labels': np.ndarray(N, )
     # In built-in class
    'gt_masks': PolygonMasks (H, W) or BitmapMasks (H, W)
     # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}

Required Keys:

  • height

  • width

  • instances

    • bbox

    • bbox_label

    • ignore_flag

  • segments_info

    • id

    • category

    • is_thing

  • seg_map_path

Added Keys:

  • gt_bboxes (BaseBoxes[torch.float32])

  • gt_bboxes_labels (np.int64)

  • gt_masks (BitmapMasks | PolygonMasks)

  • gt_seg_map (np.uint8)

  • gt_ignore_flags (bool)

参数
  • with_bbox (bool) – Whether to parse and load the bbox annotation. Defaults to True.

  • with_label (bool) – Whether to parse and load the label annotation. Defaults to True.

  • with_mask (bool) – Whether to parse and load the mask annotation. Defaults to True.

  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to True.

  • box_type (str) – The box type used to wrap the bboxes. Defaults to ‘hbox’.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for :func:mmcv.imfrombytes. See :func:mmcv.imfrombytes for details. Defaults to ‘cv2’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See :class:mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

transform(results: dict)dict[源代码]

Function to load multiple types of panoptic annotations.

参数

results (dict) – Result dict from :obj:mmdet.CustomDataset.

返回

The dict contains loaded bounding box, label, mask and semantic segmentation annotations.

返回类型

dict

class mmdet.datasets.transforms.LoadProposals(num_max_proposals: Optional[int] = None)[源代码]

Load proposal pipeline.

Required Keys:

  • proposals

Modified Keys:

  • proposals

参数

num_max_proposals (int, optional) – Maximum number of proposals to load. If not specified, all proposals will be loaded.

transform(results: dict)dict[源代码]

Transform function to load proposals from file.

参数

results (dict) – Result dict from mmdet.CustomDataset.

返回

The dict contains loaded proposal annotations.

返回类型

dict

class mmdet.datasets.transforms.MinIoURandomCrop(min_ious: Sequence[float] = (0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size: float = 0.3, bbox_clip_border: bool = True)[源代码]

Random crop the image & bboxes & masks & segmentation map. The cropped patches have a minimum IoU requirement with the original bboxes, and the IoU threshold is randomly selected from min_ious.

Required Keys:

  • img

  • img_shape

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_bboxes_labels

  • gt_masks

  • gt_ignore_flags

  • gt_seg_map

参数
  • min_ious (Sequence[float]) – minimum IoU threshold for all intersections with bounding boxes.

  • min_crop_size (float) – Minimum crop size (i.e. h, w := a*h, a*w, where a >= min_crop_size). Defaults to 0.3.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
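A sketch of MinIoURandomCrop in a pipeline, spelling out the documented default thresholds; the surrounding transforms are assumptions.

# Illustrative sketch; min_ious/min_crop_size mirror the documented defaults.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='MinIoURandomCrop',
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
        min_crop_size=0.3),
    dict(type='PackDetInputs'),
]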

class mmdet.datasets.transforms.MixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, max_iters: int = 15, bbox_clip_border: bool = True)[源代码]

MixUp data augmentation.

                    mixup transform
           +------------------------------+
           | mixup image   |              |
           |      +--------|--------+     |
           |      |        |        |     |
           |---------------+        |     |
           |      |                 |     |
           |      |      image      |     |
           |      |                 |     |
           |      |                 |     |
           |      |-----------------+     |
           |             pad              |
           +------------------------------+

The mixup transform steps are as follows:

   1. Another random image is picked from the dataset and embedded in
      the top-left patch (after padding and resizing)
   2. The target of the mixup transform is the weighted average of the
      mixup image and the original image.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

参数
  • img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.

  • pad_val (int) – Pad value. Defaults to 114.

  • max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

class mmdet.datasets.transforms.Mosaic(img_scale: Tuple[int, int] = (640, 640), center_ratio_range: Tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, prob: float = 1.0)[源代码]

Mosaic augmentation.

Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:

    1. Choose the mosaic center as the intersection of the 4 images
    2. Get the left top image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. The sub-image will be cropped if it is larger than the mosaic patch

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

参数
  • img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pad_val (int) – Pad value. Defaults to 114.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.
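As a sketch, a YOLOX-style fragment combining Mosaic and MixUp; the values are assumptions, and both transforms rely on mix_results being present in the input dict (e.g. supplied by a multi-image dataset wrapper).

# Illustrative fragment; values are assumptions, not a prescribed recipe.
train_pipeline = [
    dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0),
    dict(type='RandomAffine', scaling_ratio_range=(0.1, 2.0)),
    dict(type='MixUp', img_scale=(640, 640), ratio_range=(0.8, 1.6)),
    dict(type='PackDetInputs'),
]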

class mmdet.datasets.transforms.MultiBranch(branch_field: List[str], **branch_pipelines: dict)[源代码]

Multiple branch pipeline wrapper.

Generate multiple data-augmented versions of the same image. MultiBranch needs to specify the branch names of all pipelines of the dataset, perform corresponding data augmentation for the current branch, and return None for other branches, which ensures the consistency of return format across different samples.

参数
  • branch_field (list) – List of branch names.

  • branch_pipelines (dict) – Dict of different pipeline configs to be composed.

实际案例

>>> branch_field = ['sup', 'unsup_teacher', 'unsup_student']
>>> sup_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=0.5),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> weak_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=0.0),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> strong_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=1.0),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> unsup_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=file_client_args),
>>>     dict(type='LoadEmptyAnnotations'),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         unsup_teacher=weak_pipeline,
>>>         unsup_student=strong_pipeline)
>>>     ]
>>> from mmcv.transforms import Compose
>>> sup_branch = Compose(sup_pipeline)
>>> unsup_branch = Compose(unsup_pipeline)
>>> print(sup_branch)
>>> Compose(
>>>     LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     Resize(scale=(1333, 800), scale_factor=None, keep_ratio=True, clip_object_border=True, backend=cv2, interpolation=bilinear) # noqa
>>>     RandomFlip(prob=0.5, direction=horizontal)
>>>     MultiBranch(branch_pipelines=['sup'])
>>> )
>>> print(unsup_branch)
>>> Compose(
>>>     LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     LoadEmptyAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, seg_ignore_label=255) # noqa
>>>     MultiBranch(branch_pipelines=['unsup_teacher', 'unsup_student'])
>>> )
transform(results: dict)dict[源代码]

Transform function to apply transforms sequentially.

参数

results (dict) – Result dict from loading pipeline.

返回

  • ‘inputs’ (Dict[str, obj:torch.Tensor]): The forward data of models from different branches.

  • ‘data_sample’ (Dict[str, obj:DetDataSample]): The annotation info of the sample from different branches.

返回类型

dict

class mmdet.datasets.transforms.PackDetInputs(meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction'))[源代码]

Pack the input data for detection / semantic segmentation / panoptic segmentation.

The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default this includes:

  • img_id: id of the image

  • img_path: path to the image file

  • ori_shape: original shape of the image as a tuple (h, w)

  • img_shape: shape of the image input to the network as a tuple (h, w). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.

  • scale_factor: a float indicating the preprocessing scale

  • flip: a boolean indicating if image flip transform was used

  • flip_direction: the flipping direction

参数

meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: ('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction')

transform(results: dict)dict[源代码]

Method to pack the input data.

参数

results (dict) – Result dict from the data pipeline.

返回

  • ‘inputs’ (obj:torch.Tensor): The forward data of models.

  • ‘data_sample’ (obj:DetDataSample): The annotation info of the sample.

返回类型

dict
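As an illustration, meta_keys can be narrowed when only part of the meta information is needed downstream; the selection below is an assumption.

# Illustrative sketch: packing only a subset of meta information.
pack_cfg = dict(
    type='PackDetInputs',
    meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
               'scale_factor'))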

class mmdet.datasets.transforms.Pad(size: Optional[Tuple[int, int]] = None, size_divisor: Optional[int] = None, pad_to_square: bool = False, pad_val: Union[int, float, dict] = {'img': 0, 'seg': 255}, padding_mode: str = 'constant')[源代码]

Pad the image & segmentation map.

There are three padding modes: (1) pad to a fixed size, (2) pad to the minimum size that is divisible by some number, and (3) pad to square. Padding to square and padding to the minimum size divisible by a number can be used at the same time.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_masks

  • gt_seg_map

Added Keys:

  • pad_shape

  • pad_fixed_size

  • pad_size_divisor

参数
  • size (tuple, optional) – Fixed padding size. Expected padding shape (width, height). Defaults to None.

  • size_divisor (int, optional) – The divisor of padded size. Defaults to None.

  • pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Defaults to False.

  • pad_val (Number | dict[str, Number], optional) –

    The padding value used when padding_mode is “constant”. If it is a single number, the value to pad the image is the number and to pad the semantic segmentation map is 255. If it is a dict, it should have the following keys:

    • img: The value to pad the image.

    • seg: The value to pad the semantic segmentation map.

    Defaults to dict(img=0, seg=255).

  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Defaults to ‘constant’.

    • constant: pads with a constant value, this value is specified with pad_val.

    • edge: pads with the last value at the edge of the image.

    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].

    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]

transform(results: dict)dict[源代码]

Call function to pad images, masks, semantic segmentation maps.

参数

results (dict) – Result dict from loading pipeline.

返回

Updated result dict.

返回类型

dict
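Sketches of the three documented padding modes; the concrete sizes and values are assumptions.

# Illustrative configs for the three padding modes (values assumed).
pad_fixed = dict(type='Pad', size=(1333, 800))     # (1) pad to a fixed size
pad_divisor = dict(type='Pad', size_divisor=32)    # (2) pad to a multiple of 32
pad_square = dict(type='Pad', pad_to_square=True,  # (3) pad to square
                  pad_val=dict(img=114, seg=255))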

class mmdet.datasets.transforms.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[Union[int, float]] = (0.5, 1.5), saturation_range: Sequence[Union[int, float]] = (0.5, 1.5), hue_delta: int = 18)[源代码]

Apply photometric distortion to an image sequentially; every transformation is applied with a probability of 0.5. The random contrast is applied either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

  8. randomly swap channels

Required Keys:

  • img (np.uint8)

Modified Keys:

  • img (np.float32)

参数
  • brightness_delta (int) – delta of brightness.

  • contrast_range (sequence) – range of contrast.

  • saturation_range (sequence) – range of saturation.

  • hue_delta (int) – delta of hue.

transform(results: dict)dict[源代码]

Transform function to perform photometric distortion on images.

参数

results (dict) – Result dict from loading pipeline.

返回

Result dict with images distorted.

返回类型

dict
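A config sketch that simply spells out the documented defaults of PhotoMetricDistortion.

# Illustrative config mirroring the documented defaults.
photo_distortion = dict(
    type='PhotoMetricDistortion',
    brightness_delta=32,
    contrast_range=(0.5, 1.5),
    saturation_range=(0.5, 1.5),
    hue_delta=18)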

class mmdet.datasets.transforms.Posterize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 4.0)[源代码]

Posterize images (reduce the number of bits for each color channel).

Required Keys:

  • img

Modified Keys:

  • img

参数
  • prob (float) – The probability for performing Posterize transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Posterize transformation. Defaults to 0.0.

  • max_mag (float) – The maximum magnitude for Posterize transformation. Defaults to 4.0.

class mmdet.datasets.transforms.ProposalBroadcaster(transforms: List[Union[dict, Callable]] = [])[源代码]

A transform wrapper that applies the wrapped transforms to both gt_bboxes and proposals without adding any extra code. It will do the following steps:

  1. Scatter the broadcasting targets to a list of inputs for the wrapped transforms. The type of the list should be list[dict, dict], where the first dict holds the original inputs and the second holds the processing results whose gt_bboxes have been rewritten by the proposals.

  2. Apply self.transforms with the same random parameters, which are shared through a context manager. The type of the outputs is a list[dict, dict].

  3. Gather the outputs, and update the proposals in the first item of the outputs with the gt_bboxes in the second.

参数

transforms (list, optional) – Sequence of transform object or config dict to be wrapped. Defaults to [].

Note: TransformBroadcaster in MMCV can achieve the same operation as ProposalBroadcaster, but requires setting more complex parameters.

实际案例

>>> pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadProposals', num_max_proposals=2000),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(
>>>         type='ProposalBroadcaster',
>>>         transforms=[
>>>             dict(type='Resize', scale=(1333, 800),
>>>                  keep_ratio=True),
>>>             dict(type='RandomFlip', prob=0.5),
>>>         ]),
>>>     dict(type='PackDetInputs')]
transform(results: dict)dict[源代码]

Apply wrapped transform functions to process both gt_bboxes and proposals.

参数

results (dict) – Result dict from loading pipeline.

返回

Updated result dict.

返回类型

dict

class mmdet.datasets.transforms.RandAugment(aug_space: List[Union[dict, mmengine.config.config.ConfigDict]] = [[{'type': 'AutoContrast'}], [{'type': 'Equalize'}], [{'type': 'Invert'}], [{'type': 'Rotate'}], [{'type': 'Posterize'}], [{'type': 'Solarize'}], [{'type': 'SolarizeAdd'}], [{'type': 'Color'}], [{'type': 'Contrast'}], [{'type': 'Brightness'}], [{'type': 'Sharpness'}], [{'type': 'ShearX'}], [{'type': 'ShearY'}], [{'type': 'TranslateX'}], [{'type': 'TranslateY'}]], aug_num: int = 2, prob: Optional[List[float]] = None)[源代码]

Rand augmentation.

This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_bboxes_labels

  • gt_masks

  • gt_ignore_flags

  • gt_seg_map

Added Keys:

  • homography_matrix

参数
  • aug_space (List[List[Union[dict, ConfigDict]]]) – The augmentation space of rand augmentation. Each augmentation transform in aug_space is a specific transform, and is composed of several augmentations. When RandAugment is called, a random transform in aug_space will be selected to augment images. Defaults to the default augmentation space listed in the signature.

  • aug_num (int) – Number of augmentations to apply sequentially. Defaults to 2.

  • prob (list[float], optional) – The probabilities associated with each augmentation. The length should be equal to the augmentation space and the sum should be 1. If not given, a uniform distribution will be assumed. Defaults to None.

实际案例

>>> aug_space = [
>>>     dict(type='Sharpness'),
>>>     dict(type='ShearX'),
>>>     dict(type='Color'),
>>>     ]
>>> augmentation = RandAugment(aug_space)
>>> img = np.ones((100, 100, 3), dtype=np.uint8)
>>> gt_bboxes = np.ones((10, 4))
>>> results = dict(img=img, gt_bboxes=gt_bboxes)
>>> results = augmentation(results)
transform(results: dict)dict[源代码]

Transform function to use RandAugment.

参数

results (dict) – Result dict from loading pipeline.

返回

Result dict with RandAugment.

返回类型

dict

class mmdet.datasets.transforms.RandomAffine(max_rotate_degree: float = 10.0, max_translate_ratio: float = 0.1, scaling_ratio_range: Tuple[float, float] = (0.5, 1.5), max_shear_degree: float = 2.0, border: Tuple[int, int] = (0, 0), border_val: Tuple[int, int, int] = (114, 114, 114), bbox_clip_border: bool = True)[源代码]

Random affine transform data augmentation.

This operation randomly generates affine transform matrix which including rotation, translation, shear and scaling transforms.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

参数
  • max_rotate_degree (float) – Maximum degrees of rotation transform. Defaults to 10.

  • max_translate_ratio (float) – Maximum ratio of translation. Defaults to 0.1.

  • scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Defaults to (0.5, 1.5).

  • max_shear_degree (float) – Maximum degrees of shear transform. Defaults to 2.

  • border (tuple[int]) – Distance from width and height sides of input image to adjust output shape. Only used in mosaic dataset. Defaults to (0, 0).

  • border_val (tuple[int]) – Border padding values of 3 channels. Defaults to (114, 114, 114).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

class mmdet.datasets.transforms.RandomCenterCropPad(crop_size: Optional[tuple] = None, ratios: Optional[tuple] = (0.9, 1.0, 1.1), border: Optional[int] = 128, mean: Optional[Sequence] = None, std: Optional[Sequence] = None, to_rgb: Optional[bool] = None, test_mode: bool = False, test_pad_mode: Optional[tuple] = ('logical_or', 127), test_pad_add_pix: int = 0, bbox_clip_border: bool = True)[源代码]

Random center crop and random around padding for CornerNet.

This operation generates randomly cropped image from the original image and pads it simultaneously. Different from RandomCrop, the output shape may not equal to crop_size strictly. We choose a random value from ratios and the output shape could be larger or smaller than crop_size. The padding operation is also different from Pad, here we use around padding instead of right-bottom padding.

The relation between output image (padding image) and original image:

                output image

       +----------------------------+
       |          padded area       |
+------|----------------------------|----------+
|      |         cropped area       |          |
|      |         +---------------+  |          |
|      |         |    .   center |  |          | original image
|      |         |        range  |  |          |
|      |         +---------------+  |          |
+------|----------------------------|----------+
       |          padded area       |
       +----------------------------+

There are 5 main areas in the figure:

  • output image: output image of this operation, also called padding image in following instruction.

  • original image: input image of this operation.

  • padded area: non-intersect area of output image and original image.

  • cropped area: the overlap of output image and original image.

  • center range: a smaller area where the random center is chosen from. The center range is computed from border and the original image’s shape, to avoid the random center being too close to the original image’s border.

This operation also acts differently in train and test modes; the summary pipelines are listed below.

Train pipeline:

  1. Choose a random_ratio from ratios, the shape of padding image will be random_ratio * crop_size.

  2. Choose a random_center in center range.

  3. Generate padding image with center matches the random_center.

  4. Initialize the padding image with pixel values equal to mean.

  5. Copy the cropped area to padding image.

  6. Refine annotations.

Test pipeline:

  1. Compute output shape according to test_pad_mode.

  2. Generate padding image with center matches the original image center.

  3. Initialize the padding image with pixel values equal to mean.

  4. Copy the cropped area to padding image.

Required Keys:

  • img (np.float32)

  • img_shape (tuple)

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img (np.float32)

  • img_shape (tuple)

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

参数
  • crop_size (tuple, optional) – Expected size after crop; the final size will be computed according to ratio. Requires (width, height) in train mode, and None in test mode.

  • ratios (tuple, optional) – random select a ratio from tuple and crop image to (crop_size[0] * ratio) * (crop_size[1] * ratio). Only available in train mode. Defaults to (0.9, 1.0, 1.1).

  • border (int, optional) – max distance from center select area to image border. Only available in train mode. Defaults to 128.

  • mean (sequence, optional) – Mean values of 3 channels.

  • std (sequence, optional) – Std values of 3 channels.

  • to_rgb (bool, optional) – Whether to convert the image from BGR to RGB.

  • test_mode (bool) – Whether to involve random variables in the transform. In train mode, crop_size is fixed, and center coords and ratio are randomly selected from predefined lists. In test mode, crop_size is the image’s original shape, and center coords and ratio are fixed. Defaults to False.

  • test_pad_mode (tuple, optional) –

    padding method and padding shape value, only available in test mode. Default is using ‘logical_or’ with 127 as padding shape value.

    • ’logical_or’: final_shape = input_shape | padding_shape_value

    • ’size_divisor’: final_shape = int( ceil(input_shape / padding_shape_value) * padding_shape_value)

    Defaults to (‘logical_or’, 127).

  • test_pad_add_pix (int) – Extra padding pixel in test mode. Defaults to 0.

  • bbox_clip_border (bool) – Whether to clip the objects outside the border of the image. Defaults to True.

class mmdet.datasets.transforms.RandomCrop(crop_size: tuple, crop_type: str = 'absolute', allow_negative_crop: bool = False, recompute_bbox: bool = False, bbox_clip_border: bool = True)[源代码]

Random crop the image & bboxes & masks.

The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_masks (optional)

  • gt_ignore_flags (optional)

  • gt_seg_map (optional)

Added Keys:

  • homography_matrix

参数
  • crop_size (tuple) – The relative ratio or absolute pixels of (width, height).

  • crop_type (str, optional) – One of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Defaults to “absolute”.

  • allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Defaults to False.

  • recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Defaults to False.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.

注解

  • If the image is smaller than the absolute crop size, return the original image.

  • The keys for bboxes, labels and masks must be aligned. That is, gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.

  • If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.
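Sketches of the documented crop_type modes; the sizes are assumptions.

# Illustrative configs for the crop_type modes (sizes assumed).
crop_abs = dict(type='RandomCrop', crop_size=(600, 600), crop_type='absolute')
crop_rel = dict(type='RandomCrop', crop_size=(0.8, 0.8), crop_type='relative')
crop_rng = dict(type='RandomCrop', crop_size=(384, 600),
                crop_type='absolute_range')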

class mmdet.datasets.transforms.RandomErasing(n_patches: Union[int, Tuple[int, int]], ratio: Union[float, Tuple[float, float]], squared: bool = True, bbox_erased_thr: float = 0.9, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255)[源代码]

RandomErasing operation.

Random Erasing randomly selects a rectangle region in an image and erases its pixels with random values.

Required Keys:

  • img

  • gt_bboxes (HorizontalBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_masks (BitmapMasks) (optional)

Modified Keys:

  • img

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

  • gt_masks (optional)

参数
  • n_patches (int or tuple[int, int]) – Number of regions to be dropped. If it is given as a tuple, number of patches will be randomly selected from the closed interval [n_patches[0], n_patches[1]].

  • ratio (float or tuple[float, float]) – The ratio of erased regions. It can be float to use a fixed ratio or tuple[float, float] to randomly choose ratio from the interval.

  • squared (bool) – Whether to erase square region. Defaults to True.

  • bbox_erased_thr (float) – The threshold for the maximum area proportion of the bbox to be erased. When the proportion of the area where the bbox is erased is greater than the threshold, the bbox will be removed. Defaults to 0.9.

  • img_border_value (int or float or tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

class mmdet.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None)[源代码]

Flip the image & bbox & mask & segmentation map. Added or Updated keys: flip, flip_direction, img, gt_bboxes, and gt_seg_map. There are 3 flip modes:

  • prob is float, direction is string: the image will be flipped in the given direction with probability of prob. E.g., prob=0.5, direction='horizontal', then the image will be horizontally flipped with probability of 0.5.

  • prob is float, direction is list of string: the image will be flipped in direction[i] with probability of prob/len(direction). E.g., prob=0.5, direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability of 0.25, and vertically with probability of 0.25.

  • prob is list of float, direction is list of string: given len(prob) == len(direction), the image will be flipped in direction[i] with probability of prob[i]. E.g., prob=[0.3, 0.5], direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability of 0.3, and vertically with probability of 0.5.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • flip

  • flip_direction

  • homography_matrix

参数
  • prob (float | list[float], optional) – The flipping probability. Defaults to None.

  • direction (str | list[str]) – The flipping direction. Options are ‘horizontal’, ‘vertical’ and ‘diagonal’. If input is a list, the length must equal prob. Each element in prob indicates the flip probability of the corresponding direction. Defaults to ‘horizontal’.
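Sketches of the three flip modes described above; the probabilities are assumptions.

# Illustrative configs for the three documented flip modes.
flip_single = dict(type='RandomFlip', prob=0.5)  # horizontal, p=0.5
flip_split = dict(type='RandomFlip', prob=0.5,
                  direction=['horizontal', 'vertical'])  # p=0.25 each
flip_listed = dict(type='RandomFlip', prob=[0.3, 0.5],
                   direction=['horizontal', 'vertical'])  # p=0.3 and p=0.5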

class mmdet.datasets.transforms.RandomOrder(transforms: Union[Dict, Callable[[Dict], Dict], Sequence[Union[Dict, Callable[[Dict], Dict]]]])[源代码]

Shuffle the transform Sequence.

transform(results: Dict)Optional[Dict][源代码]

Transform function to apply transforms in random order.

参数

results (dict) – A result dict contains the results to transform.

返回

Transformed results.

返回类型

dict or None

class mmdet.datasets.transforms.RandomShift(prob: float = 0.5, max_shift_px: int = 32, filter_thr_px: int = 1)[源代码]

Shift the image and box given shift pixels and probability.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32])

  • gt_bboxes_labels (np.int64)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_bboxes_labels

  • gt_ignore_flags (bool) (optional)

参数
  • prob (float) – Probability of shifts. Defaults to 0.5.

  • max_shift_px (int) – The max pixels for shifting. Defaults to 32.

  • filter_thr_px (int) – The width and height threshold for filtering. The bbox and the rest of the targets below the width and height threshold will be filtered. Defaults to 1.

class mmdet.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear')[源代码]

Resize images & bbox & seg.

This transform resizes the input image according to scale or scale_factor. Bboxes, masks, and seg map are then resized with the same scale factor. If scale and scale_factor are both set, it will use scale to resize.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

  • homography_matrix

参数
  • scale (int or tuple) – Image scales for resizing. Defaults to None.

  • scale_factor (float or tuple[float]) – Scale factors for resizing. Defaults to None.

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.

  • clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.
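Sketches of the two documented ways to drive Resize; the values are assumptions.

# Illustrative configs: resize by target scale vs. by scale factor.
resize_by_scale = dict(type='Resize', scale=(1333, 800), keep_ratio=True)
resize_by_factor = dict(type='Resize', scale_factor=0.5)  # half-size output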

class mmdet.datasets.transforms.Rotate(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[源代码]

Rotate the images, bboxes, masks and segmentation map.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

参数
  • prob (float) – The probability of performing the transformation, which should be in the range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum angle for rotation. Defaults to 0.0.

  • max_mag (float) – The maximum angle for rotation. Defaults to 30.0.

  • reversal_prob (float) – The probability that reverses the rotation magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.SegRescale(scale_factor: float = 1, backend: str = 'cv2')[源代码]

Rescale semantic segmentation maps.

This transform rescales the gt_seg_map according to scale_factor.

Required Keys:

  • gt_seg_map

Modified Keys:

  • gt_seg_map

参数
  • scale_factor (float) – The scale factor of the final output. Defaults to 1.

  • backend (str) – Image rescale backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.

transform(results: dict)dict[源代码]

Transform function to scale the semantic segmentation map.

参数

results (dict) – Result dict from loading pipeline.

返回

Result dict with semantic segmentation map scaled.

返回类型

dict

class mmdet.datasets.transforms.Sharpness(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[源代码]

Adjust image sharpness. A positive magnitude enhances the sharpness, while a negative magnitude blurs the image. A magnitude of 0 gives the original image.

Required Keys:

  • img

Modified Keys:

  • img

参数
  • prob (float) – The probability for performing Sharpness transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Sharpness transformation. Defaults to 0.1.

  • max_mag (float) – The maximum magnitude for Sharpness transformation. Defaults to 1.9.

class mmdet.datasets.transforms.ShearX(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[源代码]

Shear the images, bboxes, masks and segmentation map horizontally.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

参数
  • prob (float) – The probability of performing ShearX, which should be in the range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum angle for the horizontal shear. Defaults to 0.0.

  • max_mag (float) – The maximum angle for the horizontal shear. Defaults to 30.0.

  • reversal_prob (float) – The probability that reverses the horizontal shear magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.ShearY(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[源代码]

Shear the images, bboxes, masks and segmentation map vertically.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

参数
  • prob (float) – The probability of performing ShearY, which should be in the range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum angle for the vertical shear. Defaults to 0.0.

  • max_mag (float) – The maximum angle for the vertical shear. Defaults to 30.0.

  • reversal_prob (float) – The probability that reverses the vertical shear magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.Solarize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 256.0)[源代码]

Solarize images (invert all pixels above a threshold magnitude).

Required Keys:

  • img

Modified Keys:

  • img

参数
  • prob (float) – The probability for performing Solarize transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for Solarize transformation. Defaults to 0.0.

  • max_mag (float) – The maximum magnitude for Solarize transformation. Defaults to 256.0.

class mmdet.datasets.transforms.SolarizeAdd(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 110.0)[源代码]

SolarizeAdd images. For each pixel in the image that is less than 128, add an additional amount to it decided by the magnitude.

Required Keys:

  • img

Modified Keys:

  • img

参数
  • prob (float) – The probability for performing SolarizeAdd transformation. Defaults to 1.0.

  • level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum magnitude for SolarizeAdd transformation. Defaults to 0.0.

  • max_mag (float) – The maximum magnitude for SolarizeAdd transformation. Defaults to 110.0.

class mmdet.datasets.transforms.ToTensor(keys)[源代码]

Convert some results to torch.Tensor by given keys.

参数

keys (Sequence[str]) – Keys that need to be converted to Tensor.

class mmdet.datasets.transforms.TranslateX(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 0.1, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[源代码]

Translate the images, bboxes, masks and segmentation map horizontally.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

参数
  • prob (float) – The probability of performing the transformation, which should be in the range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum pixel’s offset ratio for horizontal translation. Defaults to 0.0.

  • max_mag (float) – The maximum pixel’s offset ratio for horizontal translation. Defaults to 0.1.

  • reversal_prob (float) – The probability that reverses the horizontal translation magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.TranslateY(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 0.1, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[源代码]

Translate the images, bboxes, masks and segmentation map vertically.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_masks (BitmapMasks | PolygonMasks) (optional)

  • gt_seg_map (np.uint8) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_masks

  • gt_seg_map

Added Keys:

  • homography_matrix

参数
  • prob (float) – The probability of performing the transformation, which should be in the range [0, 1]. Defaults to 1.0.

  • level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.

  • min_mag (float) – The minimum pixel’s offset ratio for vertical translation. Defaults to 0.0.

  • max_mag (float) – The maximum pixel’s offset ratio for vertical translation. Defaults to 0.1.

  • reversal_prob (float) – The probability that reverses the vertical translation magnitude. Should be in range [0,1]. Defaults to 0.5.

  • img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.

  • mask_border_value (int) – The fill value used for masks. Defaults to 0.

  • seg_ignore_label (int) – The fill value used for segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.

class mmdet.datasets.transforms.Transpose(keys, order)[源代码]

Transpose some results by given keys.

参数
  • keys (Sequence[str]) – Keys of results to be transposed.

  • order (Sequence[int]) – Order of transpose.

class mmdet.datasets.transforms.YOLOXHSVRandomAug(hue_delta: int = 5, saturation_delta: int = 30, value_delta: int = 30)[源代码]

Apply HSV augmentation to image sequentially. It is referenced from https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/data/data_augment.py#L21.

Required Keys:

  • img

Modified Keys:

  • img

参数
  • hue_delta (int) – delta of hue. Defaults to 5.

  • saturation_delta (int) – delta of saturation. Defaults to 30.

  • value_delta (int) – delta of value. Defaults to 30.

transform(results: dict)dict[源代码]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.

参数

results (dict) – The result dict.

返回

The result dict.

返回类型

dict

mmdet.engine

hooks

class mmdet.engine.hooks.CheckInvalidLossHook(interval: int = 50)[源代码]

Check invalid loss hook.

This hook will regularly check whether the loss is valid during training.

参数

interval (int) – Checking interval (every k iterations). Default: 50.
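A config sketch registering the hook; the interval value mirrors the documented default.

# Illustrative registration via a config's custom_hooks list.
custom_hooks = [
    dict(type='CheckInvalidLossHook', interval=50),
]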

after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None)None[源代码]

Regularly check whether the loss is valid every n iterations.

参数
  • runner (Runner) – The runner of the training process.

  • batch_idx (int) – The index of the current batch in the train loop.

  • data_batch (dict, Optional) – Data from dataloader. Defaults to None.

  • outputs (dict, Optional) – Outputs from model. Defaults to None.

class mmdet.engine.hooks.DetVisualizationHook(draw: bool = False, interval: int = 50, score_thr: float = 0.3, show: bool = False, wait_time: float = 0.0, test_out_dir: Optional[str] = None, file_client_args: dict = {'backend': 'disk'})[源代码]

Detection Visualization Hook. Used to visualize validation and testing process prediction results.

In the testing phase:

  1. If show is True, it means that only the prediction results are visualized without storing data, so vis_backends needs to be excluded.

  2. If test_out_dir is specified, it means that the prediction results need to be saved to test_out_dir. In order to avoid vis_backends also storing data, vis_backends needs to be excluded.

  3. vis_backends takes effect if the user does not specify show and test_out_dir. You can set vis_backends to WandbVisBackend or TensorboardVisBackend to store the prediction result in Wandb or Tensorboard.

参数
  • draw (bool) – whether to draw prediction results. If it is False, it means that no drawing will be done. Defaults to False.

  • interval (int) – The interval of visualization. Defaults to 50.

  • score_thr (float) – The threshold to visualize the bboxes and masks. Defaults to 0.3.

  • show (bool) – Whether to display the drawn image. Default to False.

  • wait_time (float) – The interval of show (s). Defaults to 0.

  • test_out_dir (str, optional) – directory where painted images will be saved in testing process.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmdet.structures.det_data_sample.DetDataSample])None[源代码]

Run after every testing iteration.

参数
  • runner (Runner) – The runner of the testing process.

  • batch_idx (int) – The index of the current batch in the val loop.

  • data_batch (dict) – Data from dataloader.

  • outputs (Sequence[DetDataSample]) – A batch of data samples that contain annotations and predictions.

after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmdet.structures.det_data_sample.DetDataSample])None[源代码]

Run after every self.interval validation iterations.

参数
  • runner (Runner) – The runner of the validation process.

  • batch_idx (int) – The index of the current batch in the val loop.

  • data_batch (dict) – Data from dataloader.

  • outputs (Sequence[DetDataSample]) – A batch of data samples that contain annotations and predictions.
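
For illustration, a config sketch that enables drawing and saves painted images during testing; the output directory name is an assumption:

>>> default_hooks = dict(
...     visualization=dict(
...         type='DetVisualizationHook',
...         draw=True,
...         test_out_dir='vis_results'))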

class mmdet.engine.hooks.MeanTeacherHook(momentum: float = 0.001, interval: int = 1, skip_buffer=True)[源代码]

Mean Teacher Hook.

Mean Teacher is an efficient semi-supervised learning method proposed in Mean Teacher. This method requires two models with exactly the same structure, serving as the student model and the teacher model respectively. The student model updates its parameters through gradient descent, and the teacher model updates its parameters through an exponential moving average of the student model’s parameters. Compared with the student model, the teacher model is smoother and accumulates more knowledge.

参数
  • momentum (float) – The momentum used for updating the teacher’s parameters. The teacher’s parameters are updated with the formula: teacher = (1 - momentum) * teacher + momentum * student. Defaults to 0.001.

  • interval (int) – Update the teacher’s parameters every interval iterations. Defaults to 1.

  • skip_buffer (bool) – Whether to skip the model buffers, such as batch norm running stats (running_mean, running_var), so that the EMA operation is not performed on them. Defaults to True.

after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None)None[源代码]

Update the teacher’s parameters every self.interval iterations.

before_train(runner: mmengine.runner.runner.Runner)None[源代码]

Check that the teacher model and the student model exist.

momentum_update(model: torch.nn.modules.module.Module, momentum: float)None[源代码]

Compute the moving average of the parameters using exponential moving average.
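
A minimal, self-contained sketch of the EMA rule described above (illustrative only, not the hook’s actual implementation):

>>> import torch.nn as nn
>>> student, teacher = nn.Linear(4, 4), nn.Linear(4, 4)
>>> momentum = 0.001
>>> for t_p, s_p in zip(teacher.parameters(), student.parameters()):
...     # teacher = (1 - momentum) * teacher + momentum * student
...     t_p.data.mul_(1 - momentum).add_(s_p.data, alpha=momentum)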

class mmdet.engine.hooks.MemoryProfilerHook(interval: int = 50)[源代码]

Memory profiler hook recording memory information including virtual memory, swap memory, and the memory of the current process.

参数

interval (int) – Checking interval (every k iterations). Default: 50.

after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[Sequence[mmdet.structures.det_data_sample.DetDataSample]] = None)None[源代码]

Regularly record memory information.

参数
  • runner (Runner) – The runner of the testing process.

  • batch_idx (int) – The index of the current batch in the test loop.

  • data_batch (dict, optional) – Data from dataloader. Defaults to None.

  • outputs (Sequence[DetDataSample], optional) – Outputs from model. Defaults to None.

after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None)None[源代码]

Regularly record memory information.

参数
  • runner (Runner) – The runner of the training process.

  • batch_idx (int) – The index of the current batch in the train loop.

  • data_batch (dict, optional) – Data from dataloader. Defaults to None.

  • outputs (dict, optional) – Outputs from model. Defaults to None.

after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[Sequence[mmdet.structures.det_data_sample.DetDataSample]] = None)None[源代码]

Regularly record memory information.

参数
  • runner (Runner) – The runner of the validation process.

  • batch_idx (int) – The index of the current batch in the val loop.

  • data_batch (dict, optional) – Data from dataloader. Defaults to None.

  • outputs (Sequence[DetDataSample], optional) – Outputs from model. Defaults to None.

class mmdet.engine.hooks.NumClassCheckHook[源代码]

Check whether the num_classes in head matches the length of classes in dataset.metainfo.

before_train_epoch(runner: mmengine.runner.runner.Runner)None[源代码]

Check whether the training dataset is compatible with head.

参数

runner (Runner) – The runner of the training or evaluation process.

before_val_epoch(runner: mmengine.runner.runner.Runner)None[源代码]

Check whether the dataset in val epoch is compatible with head.

参数

runner (Runner) – The runner of the training or evaluation process.

class mmdet.engine.hooks.PipelineSwitchHook(switch_epoch, switch_pipeline)[源代码]

Switch data pipeline at switch_epoch.

参数
  • switch_epoch (int) – switch pipeline at this epoch.

  • switch_pipeline (list[dict]) – the pipeline to switch to.

before_train_epoch(runner)[源代码]

Switch the data pipeline.
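
A config sketch; the epoch value and the previously defined pipeline list train_pipeline_stage2 are assumptions:

>>> custom_hooks = [
...     dict(type='PipelineSwitchHook',
...          switch_epoch=280,
...          switch_pipeline=train_pipeline_stage2),
... ]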

class mmdet.engine.hooks.SetEpochInfoHook[源代码]

Set runner’s epoch information to the model.

before_train_epoch(runner)[源代码]

All subclasses should override this method, if they need any operations before each training epoch.

参数

runner (Runner) – The runner of the training process.

class mmdet.engine.hooks.SyncNormHook[源代码]

Synchronize Norm states before validation, currently used in YOLOX.

before_val_epoch(runner)[源代码]

Synchronize the norm states.

class mmdet.engine.hooks.YOLOXModeSwitchHook(num_last_epochs: int = 15, skip_type_keys: Sequence[str] = ('Mosaic', 'RandomAffine', 'MixUp'))[源代码]

Switch the mode of YOLOX during training.

This hook turns off the mosaic and mixup data augmentation and switches to use L1 loss in bbox_head.

参数

num_last_epochs – The number of final training epochs during which the data augmentation is closed and L1 loss is used. Defaults to 15.

before_train_epoch(runner)None[源代码]

Close mosaic and mixup augmentation and switch to L1 loss.
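
A config sketch mirroring the default number of closing epochs:

>>> custom_hooks = [
...     dict(type='YOLOXModeSwitchHook', num_last_epochs=15),
... ]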

optimizers

class mmdet.engine.optimizers.LearningRateDecayOptimizerConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[源代码]
add_params(params: List[dict], module: torch.nn.modules.module.Module, **kwargs)None[源代码]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

参数
  • params (list[dict]) – A list of param groups, it will be modified in place.

  • module (nn.Module) – The module to be added.

runner

class mmdet.engine.runner.TeacherStudentValLoop(runner, dataloader: Union[torch.utils.data.dataloader.DataLoader, Dict], evaluator: Union[mmengine.evaluator.evaluator.Evaluator, Dict, List], fp16: bool = False)[源代码]

Loop for validation of model teacher and student.

run()[源代码]

Launch validation for model teacher and student.
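
In a semi-supervised config this loop is typically selected through val_cfg; a minimal sketch:

>>> val_cfg = dict(type='TeacherStudentValLoop')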

schedulers

class mmdet.engine.schedulers.QuadraticWarmupLR(optimizer, *args, **kwargs)[源代码]

Warm up the learning rate of each parameter group by quadratic formula.

参数
  • optimizer (Optimizer) – Wrapped optimizer.

  • begin (int) – Step at which to start updating the parameters. Defaults to 0.

  • end (int) – Step at which to stop updating the parameters. Defaults to INF.

  • last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.

  • by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • verbose (bool) – Whether to print the value for each update. Defaults to False.
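
A param_scheduler sketch warming up the learning rate over the first epochs; the epoch count and the convert_to_iter_based flag follow the common YOLOX-style usage and are assumptions here:

>>> param_scheduler = [
...     dict(type='QuadraticWarmupLR',
...          by_epoch=True,
...          begin=0,
...          end=5,
...          convert_to_iter_based=True),
... ]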

class mmdet.engine.schedulers.QuadraticWarmupMomentum(optimizer, *args, **kwargs)[源代码]

Warm up the momentum value of each parameter group by quadratic formula.

参数
  • optimizer (Optimizer) – Wrapped optimizer.

  • begin (int) – Step at which to start updating the parameters. Defaults to 0.

  • end (int) – Step at which to stop updating the parameters. Defaults to INF.

  • last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.

  • by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • verbose (bool) – Whether to print the value for each update. Defaults to False.

class mmdet.engine.schedulers.QuadraticWarmupParamScheduler(optimizer: torch.optim.optimizer.Optimizer, param_name: str, begin: int = 0, end: int = 1000000000, last_step: int = - 1, by_epoch: bool = True, verbose: bool = False)[源代码]

Warm up the parameter value of each parameter group by quadratic formula:

\[X_{t} = X_{t-1} + \frac{2t+1}{{(end-begin)}^{2}} \times X_{base}\]
参数
  • optimizer (Optimizer) – Wrapped optimizer.

  • param_name (str) – Name of the parameter to be adjusted, such as lr, momentum.

  • begin (int) – Step at which to start updating the parameters. Defaults to 0.

  • end (int) – Step at which to stop updating the parameters. Defaults to INF.

  • last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.

  • by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.

  • verbose (bool) – Whether to print the value for each update. Defaults to False.

classmethod build_iter_from_epoch(*args, begin=0, end=1000000000, by_epoch=True, epoch_length=None, **kwargs)[源代码]

Build an iter-based instance of this scheduler from an epoch-based config.

mmdet.evaluation

functional

mmdet.evaluation.functional.average_precision(recalls, precisions, mode='area')[源代码]

Calculate average precision (for single or multiple scales).

参数
  • recalls (ndarray) – shape (num_scales, num_dets) or (num_dets, )

  • precisions (ndarray) – shape (num_scales, num_dets) or (num_dets, )

  • mode (str) – ‘area’ or ‘11points’. ‘area’ means calculating the area under the precision-recall curve; ‘11points’ means averaging the precision at recalls of [0, 0.1, …, 1].

返回

calculated average precision

返回类型

float or ndarray
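
A small usage sketch with a single scale (the values are illustrative):

>>> import numpy as np
>>> from mmdet.evaluation.functional import average_precision
>>> recalls = np.array([0.1, 0.4, 0.8])
>>> precisions = np.array([0.9, 0.6, 0.3])
>>> ap = average_precision(recalls, precisions, mode='area')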

mmdet.evaluation.functional.bbox_overlaps(bboxes1, bboxes2, mode='iou', eps=1e-06, use_legacy_coordinate=False)[源代码]

Calculate the ious between each bbox of bboxes1 and bboxes2.

参数
  • bboxes1 (ndarray) – Shape (n, 4)

  • bboxes2 (ndarray) – Shape (k, 4)

  • mode (str) – IOU (intersection over union) or IOF (intersection over foreground)

  • use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, in which width and height are calculated as ‘x2 - x1 + 1’ and ‘y2 - y1 + 1’ respectively. Note that when this function is used in VOCDataset, it should be True to align with the official implementation http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCdevkit_18-May-2011.tar Default: False.

返回

Shape (n, k)

返回类型

ious (ndarray)
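
A usage sketch with two small box arrays (values are illustrative):

>>> import numpy as np
>>> from mmdet.evaluation.functional import bbox_overlaps
>>> bboxes1 = np.array([[0., 0., 10., 10.]])
>>> bboxes2 = np.array([[5., 5., 15., 15.], [20., 20., 30., 30.]])
>>> bbox_overlaps(bboxes1, bboxes2).shape
(1, 2)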

mmdet.evaluation.functional.eval_map(det_results, annotations, scale_ranges=None, iou_thr=0.5, ioa_thr=None, dataset=None, logger=None, tpfp_fn=None, nproc=4, use_legacy_coordinate=False, use_group_of=False, eval_mode='area')[源代码]

Evaluate mAP of a dataset.

参数
  • det_results (list[list]) – [[cls1_det, cls2_det, …], …]. The outer list indicates images, and the inner list indicates per-class detected bboxes.

  • annotations (list[dict]) –

    Ground truth annotations where each item of the list indicates an image. Keys of annotations are:

    • bboxes: numpy array of shape (n, 4)

    • labels: numpy array of shape (n, )

    • bboxes_ignore (optional): numpy array of shape (k, 4)

    • labels_ignore (optional): numpy array of shape (k, )

  • scale_ranges (list[tuple] | None) – Range of scales to be evaluated, in the format [(min1, max1), (min2, max2), …]. A range of (32, 64) means the area range between (32**2, 64**2). Defaults to None.

  • iou_thr (float) – IoU threshold to be considered as matched. Defaults to 0.5.

  • ioa_thr (float | None) – IoA threshold to be considered as matched, which is only used in OpenImages evaluation. Defaults to None.

  • dataset (list[str] | str | None) – Dataset name or dataset classes, there are minor differences in metrics for different datasets, e.g. “voc”, “imagenet_det”, etc. Defaults to None.

  • logger (logging.Logger | str | None) – The way to print the mAP summary. See mmengine.logging.print_log() for details. Defaults to None.

  • tpfp_fn (callable | None) – The function used to determine true/false positives. If None, tpfp_default() is used as default unless dataset is ‘det’ or ‘vid’ (tpfp_imagenet() in this case). If given as a function, this function is used to evaluate tp & fp. Defaults to None.

  • nproc (int) – Processes used for computing TP and FP. Defaults to 4.

  • use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, in which width and height are calculated as ‘x2 - x1 + 1’ and ‘y2 - y1 + 1’ respectively. Defaults to False.

  • use_group_of (bool) – Whether to use ‘group of’ boxes when calculating TP and FP, which is only used in OpenImages evaluation. Defaults to False.

  • eval_mode (str) – ‘area’ or ‘11points’. ‘area’ means calculating the area under the precision-recall curve; ‘11points’ means averaging the precision at recalls of [0, 0.1, …, 1]. PASCAL VOC2007 uses ‘11points’ as its default evaluation mode, while other datasets use ‘area’. Defaults to ‘area’.

返回

(mAP, [dict, dict, …])

返回类型

tuple
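
A hedged sketch for a single image with one class; note the detections are (k, 5) arrays whose last column is the score:

>>> import numpy as np
>>> from mmdet.evaluation.functional import eval_map
>>> det_results = [[np.array([[0., 0., 10., 10., 0.9]])]]  # [image][class]
>>> annotations = [dict(bboxes=np.array([[0., 0., 10., 10.]]),
...                     labels=np.array([0]))]
>>> mean_ap, cls_results = eval_map(det_results, annotations, iou_thr=0.5)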

mmdet.evaluation.functional.eval_recalls(gts, proposals, proposal_nums=None, iou_thrs=0.5, logger=None, use_legacy_coordinate=False)[源代码]

Calculate recalls.

参数
  • gts (list[ndarray]) – a list of arrays of shape (n, 4)

  • proposals (list[ndarray]) – a list of arrays of shape (k, 4) or (k, 5)

  • proposal_nums (int | Sequence[int]) – Top N proposals to be evaluated.

  • iou_thrs (float | Sequence[float]) – IoU thresholds. Default: 0.5.

  • logger (logging.Logger | str | None) – The way to print the recall summary. See mmengine.logging.print_log() for details. Default: None.

  • use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, in which 1 is added to both width and height, i.e. w and h are computed as ‘x2 - x1 + 1’ and ‘y2 - y1 + 1’. Default: False.

返回

recalls of different ious and proposal nums

返回类型

ndarray

mmdet.evaluation.functional.get_classes(dataset)[源代码]

Get class names of a dataset.
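
For example, the COCO class names can be fetched by dataset name:

>>> from mmdet.evaluation.functional import get_classes
>>> len(get_classes('coco'))
80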

mmdet.evaluation.functional.plot_iou_recall(recalls, iou_thrs)[源代码]

Plot IoU-Recalls curve.

参数
  • recalls (ndarray or list) – shape (k,)

  • iou_thrs (ndarray or list) – same shape as recalls

mmdet.evaluation.functional.plot_num_recall(recalls, proposal_nums)[源代码]

Plot Proposal_num-Recalls curve.

参数
  • recalls (ndarray or list) – shape (k,)

  • proposal_nums (ndarray or list) – same shape as recalls

mmdet.evaluation.functional.pq_compute_multi_core(matched_annotations_list, gt_folder, pred_folder, categories, file_client=None, nproc=32)[源代码]

Evaluate the metrics of Panoptic Segmentation with multiple processes.

Same as the function with the same name in panopticapi.

参数
  • matched_annotations_list (list) – The matched annotation list. Each element is a tuple of annotations of the same image with the format (gt_anns, pred_anns).

  • gt_folder (str) – The path of the ground truth images.

  • pred_folder (str) – The path of the prediction images.

  • categories (str) – The categories of the dataset.

  • file_client (object) – The file client of the dataset. If None, the backend will be set to disk.

  • nproc (int) – Number of processes for panoptic quality computing. Defaults to 32. When nproc exceeds the number of cpu cores, the number of cpu cores is used.

mmdet.evaluation.functional.pq_compute_single_core(proc_id, annotation_set, gt_folder, pred_folder, categories, file_client=None, print_log=False)[源代码]

The single core function to evaluate the metric of Panoptic Segmentation.

Same as the function with the same name in panopticapi. Only the function to load the images is changed to use the file client.

参数
  • proc_id (int) – The id of the mini process.

  • gt_folder (str) – The path of the ground truth images.

  • pred_folder (str) – The path of the prediction images.

  • categories (str) – The categories of the dataset.

  • file_client (object) – The file client of the dataset. If None, the backend will be set to disk.

  • print_log (bool) – Whether to print the log. Defaults to False.

mmdet.evaluation.functional.print_map_summary(mean_ap, results, dataset=None, scale_ranges=None, logger=None)[源代码]

Print mAP and results of each class.

A table will be printed to show the gts/dets/recall/AP of each class and the mAP.

参数
  • mean_ap (float) – Calculated from eval_map().

  • results (list[dict]) – Calculated from eval_map().

  • dataset (list[str] | str | None) – Dataset name or dataset classes.

  • scale_ranges (list[tuple] | None) – Range of scales to be evaluated.

  • logger (logging.Logger | str | None) – The way to print the mAP summary. See mmengine.logging.print_log() for details. Defaults to None.

mmdet.evaluation.functional.print_recall_summary(recalls, proposal_nums, iou_thrs, row_idxs=None, col_idxs=None, logger=None)[源代码]

Print recalls in a table.

参数
  • recalls (ndarray) – calculated from bbox_recalls

  • proposal_nums (ndarray or list) – top N proposals

  • iou_thrs (ndarray or list) – iou thresholds

  • row_idxs (ndarray) – which rows(proposal nums) to print

  • col_idxs (ndarray) – which cols(iou thresholds) to print

  • logger (logging.Logger | str | None) – The way to print the recall summary. See mmengine.logging.print_log() for details. Default: None.

metrics

class mmdet.evaluation.metrics.CityScapesMetric(outfile_prefix: str, seg_prefix: Optional[str] = None, format_only: bool = False, keep_results: bool = False, collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]

CityScapes metric for instance segmentation.

参数
  • outfile_prefix (str) – The prefix of txt and png files. The txt and png files will be saved in a directory whose path is “outfile_prefix.results/”.

  • seg_prefix (str, optional) – Path to the directory which contains the cityscapes instance segmentation masks. It is necessary for training and validation. It can be None when running inference on the test dataset. Defaults to None.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.

  • keep_results (bool) – Whether to keep the results. When format_only is True, keep_results must be True. Defaults to False.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][源代码]

Compute the metrics from processed results.

参数

results (list) – The processed results of each batch.

返回

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.

返回类型

Dict[str, float]

process(data_batch: dict, data_samples: Sequence[dict])None[源代码]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

参数
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

class mmdet.evaluation.metrics.CocoMetric(ann_file: Optional[str] = None, metric: Union[str, List[str]] = 'bbox', classwise: bool = False, proposal_nums: Sequence[int] = (100, 300, 1000), iou_thrs: Optional[Union[float, Sequence[float]]] = None, metric_items: Optional[Sequence[str]] = None, format_only: bool = False, outfile_prefix: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]

COCO evaluation metric.

Evaluate AR, AP, and mAP for detection tasks including proposal/box detection and instance segmentation. Please refer to https://cocodataset.org/#detection-eval for more details.

参数
  • ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.

  • metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘bbox’, ‘segm’, ‘proposal’, and ‘proposal_fast’. Defaults to ‘bbox’.

  • classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.

  • proposal_nums (Sequence[int]) – Numbers of proposals to be evaluated. Defaults to (100, 300, 1000).

  • iou_thrs (float | List[float], optional) – IoU threshold to compute AP and AR. If not specified, IoUs from 0.5 to 0.95 will be used. Defaults to None.

  • metric_items (List[str], optional) – Metric result names to be recorded in the evaluation result. Defaults to None.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.

  • outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
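
A typical evaluator config sketch (the annotation path is an assumption):

>>> val_evaluator = dict(
...     type='CocoMetric',
...     ann_file='data/coco/annotations/instances_val2017.json',
...     metric='bbox')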

compute_metrics(results: list)Dict[str, float][源代码]

Compute the metrics from processed results.

参数

results (list) – The processed results of each batch.

返回

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

返回类型

Dict[str, float]

fast_eval_recall(results: List[dict], proposal_nums: Sequence[int], iou_thrs: Sequence[float], logger: Optional[mmengine.logging.logger.MMLogger] = None)numpy.ndarray[源代码]

Evaluate proposal recall with COCO’s fast_eval_recall.

参数
  • results (List[dict]) – Results of the dataset.

  • proposal_nums (Sequence[int]) – Proposal numbers used for evaluation.

  • iou_thrs (Sequence[float]) – IoU thresholds used for evaluation.

  • logger (MMLogger, optional) – Logger used for logging the recall summary.

返回

Averaged recall results.

返回类型

np.ndarray

gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str)str[源代码]

Convert ground truth to coco format json file.

参数
  • gt_dicts (Sequence[dict]) – Ground truth of the dataset.

  • outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.

返回

The filename of the json file.

返回类型

str

process(data_batch: dict, data_samples: Sequence[dict])None[源代码]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

参数
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

results2json(results: Sequence[dict], outfile_prefix: str)dict[源代码]

Dump the detection results to a COCO style json file.

There are 3 types of results: proposals, bbox predictions and mask predictions, each with a different data type. This method will automatically recognize the type and dump the results to json files.

参数
  • results (Sequence[dict]) – Testing results of the dataset.

  • outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.bbox.json”, “somepath/xxx.segm.json”, “somepath/xxx.proposal.json”.

返回

Possible keys are “bbox”, “segm”, “proposal”, and values are corresponding filenames.

返回类型

dict

xyxy2xywh(bbox: numpy.ndarray)list[源代码]

Convert xyxy style bounding boxes to xywh style for COCO evaluation.

参数

bbox (numpy.ndarray) – The bounding boxes, shape (4, ), in xyxy order.

返回

The converted bounding boxes, in xywh order.

返回类型

list[float]
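
A hedged sketch of the conversion; constructing the metric without an ann_file is assumed to be acceptable, since ground truth can be converted on the fly:

>>> import numpy as np
>>> from mmdet.evaluation.metrics import CocoMetric
>>> metric = CocoMetric()
>>> metric.xyxy2xywh(np.array([10., 20., 30., 60.]))
[10.0, 20.0, 20.0, 40.0]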

class mmdet.evaluation.metrics.CocoPanopticMetric(ann_file: Optional[str] = None, seg_prefix: Optional[str] = None, classwise: bool = False, format_only: bool = False, outfile_prefix: Optional[str] = None, nproc: int = 32, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]

COCO panoptic segmentation evaluation metric.

Evaluate PQ, SQ and RQ for panoptic segmentation tasks. Please refer to https://cocodataset.org/#panoptic-eval for more details.

参数
  • ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.

  • seg_prefix (str, optional) – Path to the directory which contains the coco panoptic segmentation masks. It should be specified when evaluating. Defaults to None.

  • classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.

  • outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. It should be specified when format_only is True. Defaults to None.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.

  • nproc (int) – Number of processes for panoptic quality computing. Defaults to 32. When nproc exceeds the number of cpu cores, the number of cpu cores is used.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][源代码]

Compute the metrics from processed results.

参数

results (list) –

The processed results of each batch. There are two cases:

  • When outfile_prefix is not provided, the elements in results are pq_stats which can be summed directly to get PQ.

  • When outfile_prefix is provided, the elements in results are tuples like (gt, pred).

返回

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.

返回类型

Dict[str, float]

gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str)Tuple[str, str][源代码]

Convert ground truth to coco panoptic segmentation format json file.

参数
  • gt_dicts (Sequence[dict]) – Ground truth of the dataset.

  • outfile_prefix (str) – The filename prefix of the json file. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.

返回

The filename of the json file and the name of the directory which contains panoptic segmentation masks.

返回类型

Tuple[str, str]

process(data_batch: dict, data_samples: Sequence[dict])None[源代码]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

参数
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

result2json(results: Sequence[dict], outfile_prefix: str)Tuple[str, str][源代码]

Dump the panoptic results to a COCO style json file and a directory.

参数
  • results (Sequence[dict]) – Testing results of the dataset.

  • outfile_prefix (str) – The filename prefix of the json files and the directory.

返回

The json file and the directory which contains panoptic segmentation masks. The filename of the json is “somepath/xxx.panoptic.json” and the name of the directory is “somepath/xxx.panoptic”.

返回类型

Tuple[str, str]

class mmdet.evaluation.metrics.CrowdHumanMetric(ann_file: str, metric: Union[str, List[str]] = ['AP', 'MR', 'JI'], format_only: bool = False, outfile_prefix: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None, eval_mode: int = 0, iou_thres: float = 0.5, compare_matching_method: Optional[str] = None, mr_ref: str = 'CALTECH_-2', num_ji_process: int = 10)[源代码]

CrowdHuman evaluation metric.

Evaluate Average Precision (AP), Miss Rate (MR) and Jaccard Index (JI) for detection tasks.

参数
  • ann_file (str) – Path to the annotation file.

  • metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘AP’, ‘MR’ and ‘JI’. Defaults to ['AP', 'MR', 'JI'].

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.

  • outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

  • eval_mode (int) – Select the evaluation mode. Valid modes include 0 (body box only), 1 (head box only) and 2 (both). Defaults to 0.

  • iou_thres (float) – IoU threshold. Defaults to 0.5.

  • compare_matching_method (str, optional) – Matching method used to compare the detection results with the ground truth when computing ‘AP’ and ‘MR’. Valid methods include ‘VOC’ and None (CALTECH). Defaults to None.

  • mr_ref (str) – Parameter selection used to calculate MR. Valid refs include ‘CALTECH_-2’ and ‘CALTECH_-4’. Defaults to ‘CALTECH_-2’.

  • num_ji_process (int) – The number of processes used to evaluate JI. Defaults to 10.

compare(samples)[源代码]

Match the detection results with the ground_truth.

参数

samples (dict[Image]) – The detection result packaged by Image.

返回

Matching result: a list of tuples (dtbox, label, imgID) sorted in descending order of dtbox.score.

返回类型

score_list(list[tuple[ndarray, int, str]])

compute_ji_matching(dt_boxes, gt_boxes)[源代码]

Match the annotation box for each detection box.

参数
  • dt_boxes (ndarray) – Detection boxes.

  • gt_boxes (ndarray) – Ground_truth boxes.

返回

Match result.

返回类型

matches_(list[tuple[int, int]])

compute_ji_with_ignore(result_queue, dt_result, score_thr)[源代码]

Compute JI with ignore.

参数
  • result_queue (Queue) – The queue used to save the computed results during multiprocessing.

  • dt_result (dict[Image]) – Detection result packaged by Image.

  • score_thr (float) – The threshold of detection score.

返回

compute result.

返回类型

dict

compute_metrics(results: list)Dict[str, float][源代码]

Compute the metrics from processed results.

参数

results (list) – The processed results of each batch.

返回

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

返回类型

eval_results(Dict[str, float])

static eval_ap(score_list, gt_num, img_num)[源代码]

Evaluate by average precision.

参数
  • score_list (list[tuple[ndarray, int, str]]) – Matching result: a list of tuples (dtbox, label, imgID) sorted in descending order of dtbox.score.

  • gt_num (int) – The number of gt boxes in the entire dataset.

  • img_num (int) – The number of images in the entire dataset.

返回

result of average precision.

返回类型

ap(float)

eval_ji(samples)[源代码]

Evaluate by JI using multi_process.

参数

samples (Dict[str, Image]) – The detection result packaged by Image.

返回

result of jaccard index.

返回类型

ji(float)

eval_mr(score_list, gt_num, img_num)[源代码]

Evaluate by Caltech-style log-average miss rate.

参数
  • score_list (list[tuple[ndarray, int, str]]) – Matching result: a list of tuples (dtbox, label, imgID) sorted in descending order of dtbox.score.

  • gt_num (int) – The number of gt boxes in the entire dataset.

  • img_num (int) – The number of images in the entire dataset.

返回

result of miss rate.

返回类型

mr(float)

static gather(results)[源代码]

Integrate test results.

get_ignores(dt_boxes, gt_boxes)[源代码]

Get the number of ignore bboxes.

load_eval_samples(result_file)[源代码]

Load data from annotations file and detection results.

参数

result_file (str) – The file path of the saved detection results.

返回

The detection result packaged by Image

返回类型

Dict[Image]

process(data_batch: Sequence[dict], data_samples: Sequence[dict])None[源代码]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

参数
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

static results2json(results: Sequence[dict], outfile_prefix: str)str[源代码]

Dump the detection results to a json file.

class mmdet.evaluation.metrics.DumpProposals(output_dir: str = '', proposals_file: str = 'proposals.pkl', num_max_proposals: Optional[int] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]

Dump proposals pseudo metric.

参数
  • output_dir (str) – The root directory for proposals_file. Defaults to ‘’.

  • proposals_file (str) – Proposals file path. Defaults to ‘proposals.pkl’.

  • num_max_proposals (int, optional) – Maximum number of proposals to dump. If not specified, all proposals will be dumped.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)dict[源代码]

Dump the processed results.

参数

results (list) – The processed results of each batch.

返回

An empty dict.

返回类型

dict

process(data_batch: Sequence[dict], data_samples: Sequence[dict])None[源代码]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

参数
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

class mmdet.evaluation.metrics.LVISMetric(ann_file: Optional[str] = None, metric: Union[str, List[str]] = 'bbox', classwise: bool = False, proposal_nums: Sequence[int] = (100, 300, 1000), iou_thrs: Optional[Union[float, Sequence[float]]] = None, metric_items: Optional[Sequence[str]] = None, format_only: bool = False, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]

LVIS evaluation metric.

参数
  • ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.

  • metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘bbox’, ‘segm’, ‘proposal’, and ‘proposal_fast’. Defaults to ‘bbox’.

  • classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.

  • proposal_nums (Sequence[int]) – Numbers of proposals to be evaluated. Defaults to (100, 300, 1000).

  • iou_thrs (float | List[float], optional) – IoU threshold to compute AP and AR. If not specified, IoUs from 0.5 to 0.95 will be used. Defaults to None.

  • metric_items (List[str], optional) – Metric result names to be recorded in the evaluation result. Defaults to None.

  • format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.

  • outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][源代码]

Compute the metrics from processed results.

参数

results (list) – The processed results of each batch.

返回

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

返回类型

Dict[str, float]

fast_eval_recall(results: List[dict], proposal_nums: Sequence[int], iou_thrs: Sequence[float], logger: Optional[mmengine.logging.logger.MMLogger] = None)numpy.ndarray[源代码]

Evaluate proposal recall with LVIS’s fast_eval_recall.

参数
  • results (List[dict]) – Results of the dataset.

  • proposal_nums (Sequence[int]) – Proposal numbers used for evaluation.

  • iou_thrs (Sequence[float]) – IoU thresholds used for evaluation.

  • logger (MMLogger, optional) – Logger used for logging the recall summary.

返回

Averaged recall results.

返回类型

np.ndarray

process(data_batch: dict, data_samples: Sequence[dict])None[源代码]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

参数
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

class mmdet.evaluation.metrics.OpenImagesMetric(iou_thrs: Union[float, List[float]] = 0.5, ioa_thrs: Union[float, List[float]] = 0.5, scale_ranges: Optional[List[tuple]] = None, use_group_of: bool = True, get_supercategory: bool = True, filter_labels: bool = True, collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]

OpenImages evaluation metric.

Evaluate detection mAP for OpenImages. Please refer to https://storage.googleapis.com/openimages/web/evaluation.html for more details.

参数
  • iou_thrs (float or List[float]) – IoU threshold. Defaults to 0.5.

  • ioa_thrs (float or List[float]) – IoA threshold. Defaults to 0.5.

  • scale_ranges (List[tuple], optional) – Scale ranges for evaluating mAP. If not specified, all bounding boxes would be included in evaluation. Defaults to None.

  • use_group_of (bool) – Whether to consider groups of ground truth bboxes during evaluation. Defaults to True.

  • get_supercategory (bool) – Whether to get the parent class of the current class. Default: True.

  • filter_labels (bool) – Whether to filter unannotated classes. Default: True.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)dict[源代码]

Compute the metrics from processed results.

参数

results (list) – The processed results of each batch.

返回

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

返回类型

dict

process(data_batch: dict, data_samples: Sequence[dict])None[源代码]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

参数
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

class mmdet.evaluation.metrics.VOCMetric(iou_thrs: Union[float, List[float]] = 0.5, scale_ranges: Optional[List[tuple]] = None, metric: Union[str, List[str]] = 'mAP', proposal_nums: Sequence[int] = (100, 300, 1000), eval_mode: str = '11points', collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]

Pascal VOC evaluation metric.

参数
  • iou_thrs (float or List[float]) – IoU threshold. Defaults to 0.5.

  • scale_ranges (List[tuple], optional) – Scale ranges for evaluating mAP. If not specified, all bounding boxes would be included in evaluation. Defaults to None.

  • metric (str | list[str]) – Metrics to be evaluated. Options are ‘mAP’ and ‘recall’. If a list is given, only the first setting in the list will be used to evaluate the metric.

  • proposal_nums (Sequence[int]) – Proposal number used for evaluating recalls, such as recall@100, recall@1000. Default: (100, 300, 1000).

  • eval_mode (str) – ‘area’ or ‘11points’. ‘area’ means calculating the area under the precision-recall curve; ‘11points’ means averaging the precision at recalls of [0, 0.1, …, 1]. PASCAL VOC2007 uses ‘11points’ by default, while PASCAL VOC2012 uses ‘area’.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
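
An evaluator config sketch for VOC2007-style evaluation:

>>> val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points')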

compute_metrics(results: list)dict[源代码]

Compute the metrics from processed results.

参数

results (list) – The processed results of each batch.

返回

The computed metrics. The keys are the names of the metrics, and the values are corresponding results.

返回类型

dict

process(data_batch: dict, data_samples: Sequence[dict])None[源代码]

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

参数
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.

mmdet.models

backbones

class mmdet.models.backbones.CSPDarknet(arch='P5', deepen_factor=1.0, widen_factor=1.0, out_indices=(2, 3, 4), frozen_stages=- 1, use_depthwise=False, arch_ovewrite=None, spp_kernal_sizes=(5, 9, 13), conv_cfg=None, norm_cfg={'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg={'type': 'Swish'}, norm_eval=False, init_cfg={'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[源代码]

CSP-Darknet backbone used in YOLOv5 and YOLOX.

参数
  • arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Default: P5.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Default: 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Default: False.

  • arch_ovewrite (list) – Overwrite default arch settings. Default: None.

  • spp_kernal_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Default: (5, 9, 13).

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, eps=0.001, momentum=0.03).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’Swish’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

示例

>>> from mmdet.models import CSPDarknet
>>> import torch
>>> self = CSPDarknet(arch='P5')
>>> self.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[源代码]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

参数

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

返回

self

返回类型

Module

class mmdet.models.backbones.CSPNeXt(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, use_depthwise: bool = False, expand_ratio: float = 0.5, arch_ovewrite: Optional[dict] = None, spp_kernel_sizes: Sequence[int] = (5, 9, 13), channel_attention: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[源代码]

CSPNeXt backbone used in RTMDet.

参数
  • arch (str) – Architecture of CSPNeXt, from {P5, P6}. Defaults to P5.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.

  • spp_kernel_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Defaults to (5, 9, 13).

  • channel_attention (bool) – Whether to add channel attention in each stage. Defaults to True.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, eps=0.001, momentum=0.03).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict], optional) – Initialization config dict.

forward(x: Tuple[torch.Tensor, ...])Tuple[torch.Tensor, ...][源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)None[源代码]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

参数

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

返回

self

返回类型

Module

class mmdet.models.backbones.Darknet(depth=53, out_indices=(3, 4, 5), frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'negative_slope': 0.1, 'type': 'LeakyReLU'}, norm_eval=True, pretrained=None, init_cfg=None)[源代码]

Darknet backbone.

参数
  • depth (int) – Depth of Darknet. Currently only support 53.

  • out_indices (Sequence[int]) – Output from which stages.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True)

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’LeakyReLU’, negative_slope=0.1).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

示例

>>> from mmdet.models import Darknet
>>> import torch
>>> self = Darknet(depth=53)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

static make_conv_res_block(in_channels, out_channels, res_repeat, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'negative_slope': 0.1, 'type': 'LeakyReLU'})[源代码]

In the Darknet backbone, a ConvLayer is usually followed by a ResBlock. This function builds that pattern. The Conv layers always have 3x3 filters with stride=2, and the number of filters in the Conv layer equals the out channels of the ResBlock.

参数
  • in_channels (int) – The number of input channels.

  • out_channels (int) – The number of output channels.

  • res_repeat (int) – The number of ResBlocks.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True)

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’LeakyReLU’, negative_slope=0.1).

train(mode=True)[源代码]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

参数

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

返回

self

返回类型

Module

class mmdet.models.backbones.DetectoRS_ResNeXt(groups=1, base_width=4, **kwargs)[源代码]

ResNeXt backbone for DetectoRS.

参数
  • groups (int) – The number of groups in ResNeXt.

  • base_width (int) – The base width of ResNeXt.

make_res_layer(**kwargs)[源代码]

Pack all blocks in a stage into a ResLayer for DetectoRS.

class mmdet.models.backbones.DetectoRS_ResNet(sac=None, stage_with_sac=(False, False, False, False), rfp_inplanes=None, output_img=False, pretrained=None, init_cfg=None, **kwargs)[源代码]

ResNet backbone for DetectoRS.

参数
  • sac (dict, optional) – Dictionary to construct SAC (Switchable Atrous Convolution). Default: None.

  • stage_with_sac (list) – Which stage to use sac. Default: (False, False, False, False).

  • rfp_inplanes (int, optional) – The number of channels from RFP. Default: None. If specified, an additional conv layer will be added for rfp_feat. Otherwise, the structure is the same as base class.

  • output_img (bool) – If True, the input image will be inserted into the starting position of output. Default: False.

forward(x)[源代码]

Forward function.

init_weights()[源代码]

Initialize the weights.

make_res_layer(**kwargs)[源代码]

Pack all blocks in a stage into a ResLayer for DetectoRS.

rfp_forward(x, rfp_feats)[源代码]

Forward function for RFP.

class mmdet.models.backbones.EfficientNet(arch='b0', drop_path_rate=0.0, out_indices=(6, ), frozen_stages=0, conv_cfg={'type': 'Conv2dAdaptivePadding'}, norm_cfg={'eps': 0.001, 'type': 'BN'}, act_cfg={'type': 'Swish'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': 'Conv2d'}, {'type': 'Constant', 'layer': ['_BatchNorm', 'GroupNorm'], 'val': 1}])[源代码]

EfficientNet backbone.

参数
  • arch (str) – Architecture of efficientnet. Defaults to b0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (6, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Defaults to 0, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’Swish’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[源代码]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

参数

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

返回

self

返回类型

Module

class mmdet.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=True, with_cp=False, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[源代码]

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions (arXiv:1904.04514).

参数
  • extra (dict) –

Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:

    • num_modules(int): The number of HRModule in this stage.

    • num_branches(int): The number of branches in the HRModule.

    • block(str): The type of convolution block.

    • num_blocks(tuple): The number of blocks in each branch.

      The length must be equal to num_branches.

    • num_channels(tuple): The number of channels in each branch.

      The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – Dictionary to construct and config conv layer.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: True.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.

  • multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmdet.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
forward(x)[源代码]

Forward function.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[源代码]

Convert the model into training mode while keeping the normalization layers frozen.

class mmdet.models.backbones.HourglassNet(downsample_times: int = 5, num_stacks: int = 2, stage_channels: Sequence = (256, 256, 384, 384, 384, 512), stage_blocks: Sequence = (2, 2, 2, 2, 2, 4), feat_channel: int = 256, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'requires_grad': True, 'type': 'BN'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

HourglassNet backbone.

Stacked Hourglass Networks for Human Pose Estimation. More details can be found in the paper.

Parameters
  • downsample_times (int) – Downsample times in a HourglassModule.

  • num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.

  • stage_channels (Sequence[int]) – Feature channel of each sub-module in a HourglassModule.

  • stage_blocks (Sequence[int]) – Number of sub-modules stacked in a HourglassModule.

  • feat_channel (int) – Feature channel of conv after a HourglassModule.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

Example

>>> from mmdet.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)
forward(x: torch.Tensor)List[torch.Tensor][源代码]

Forward function.

init_weights()None[源代码]

Init module weights.

class mmdet.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(1, 2, 4, 7), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[源代码]

MobileNetV2 backbone.

Parameters
  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int], optional) – Output from which stages. Default: (1, 2, 4, 7).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
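
Example

A minimal usage sketch mirroring the other backbone examples; output channels follow out_indices and widen_factor, and the shapes are printed rather than asserted:

>>> from mmdet.models import MobileNetV2
>>> import torch
>>> self = MobileNetV2(widen_factor=1.0, out_indices=(1, 2, 4, 7))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))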

forward(x)[源代码]

Forward function.

make_layer(out_channels, num_blocks, stride, expand_ratio)[源代码]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters
  • out_channels (int) – out_channels of block.

  • num_blocks (int) – number of blocks.

  • stride (int) – stride of the first block. Default: 1

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.

train(mode=True)[源代码]

Convert the model into training mode while keeping the normalization layers frozen.

class mmdet.models.backbones.PyramidVisionTransformer(pretrain_img_size=224, in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 5, 8], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], paddings=[0, 0, 0, 0], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratios=[8, 8, 4, 4], qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=True, norm_after_stage=False, use_conv_ffn=False, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, convert_weights=True, init_cfg=None)[源代码]

Pyramid Vision Transformer (PVT)

Implementation of Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 64.

  • num_stages (int) – The number of stages. Default: 4.

  • num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].

  • num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 5, 8].

  • patch_sizes (Sequence[int]) – The patch_size of each patch embedding. Default: [4, 2, 2, 2].

  • strides (Sequence[int]) – The stride of each patch embedding. Default: [4, 2, 2, 2].

  • paddings (Sequence[int]) – The padding of each patch embedding. Default: [0, 0, 0, 0].

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].

  • out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).

  • mlp_ratios (Sequence[int]) – The ratio of the mlp hidden dim to the embedding dim of each transformer encode layer. Default: [8, 8, 4, 4].

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0.

  • drop_path_rate (float) – stochastic depth rate. Default 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: True.

  • use_conv_ffn (bool) – If True, use Convolutional FFN to replace FFN. Default: False.

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’).

  • pretrained (str, optional) – model pretrained path. Default: None.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
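
Example

A minimal usage sketch with the default configuration; the four stage outputs are printed rather than asserted:

>>> from mmdet.models import PyramidVisionTransformer
>>> import torch
>>> self = PyramidVisionTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))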

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[源代码]

Initialize the weights.

class mmdet.models.backbones.PyramidVisionTransformerV2(**kwargs)[源代码]

Implementation of PVTv2: Improved Baselines with Pyramid Vision Transformer.

class mmdet.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[源代码]

RegNet backbone.

More details can be found in the paper.

Parameters
  • arch (dict) –

    The parameter of RegNets.

    • w0 (int): initial width

    • wa (float): slope of width

    • wm (float): quantization parameter to quantize the width

    • depth (int): depth of the backbone

    • group_w (int): width of group

    • bot_mul (float): bottleneck ratio, i.e. expansion of bottleneck.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • base_channels (int) – Base channels after stem layer.

  • in_channels (int) – Number of input image channels. Default: 3.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import RegNet
>>> import torch
>>> self = RegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
adjust_width_group(widths, bottleneck_ratio, groups)[源代码]

Adjusts the compatibility of widths and groups.

Parameters
  • widths (list[int]) – Width of each stage.

  • bottleneck_ratio (float) – Bottleneck ratio.

  • groups (int) – Number of groups in each stage.

Returns

The adjusted widths and groups of each stage.

Return type

tuple(list)

forward(x)[源代码]

Forward function.

generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[源代码]

Generates per block width from RegNet parameters.

Parameters
  • initial_width (int) – Initial width of the backbone.

  • width_slope (float) – Slope of the quantized linear function.

  • width_parameter (int) – Parameter used to quantize the width.

  • depth (int) – Depth of the backbone.

  • divisor (int, optional) – The divisor of channels. Defaults to 8.

Returns

A list of widths of each stage and the number of stages.

Return type

list, int

get_stages_from_blocks(widths)[源代码]

Gets widths/stage_blocks of network at each stage.

Parameters

widths (list[int]) – Width in each stage.

Returns

Width and depth of each stage.

Return type

tuple(list)

static quantize_float(number, divisor)[源代码]

Converts a float to closest non-zero int divisible by divisor.

Parameters
  • number (float) – Original number to be quantized.

  • divisor (int) – Divisor used to quantize the number.

Returns

Quantized number that is divisible by divisor.

Return type

int

class mmdet.models.backbones.Res2Net(scales=4, base_width=26, style='pytorch', deep_stem=True, avg_down=True, pretrained=None, init_cfg=None, **kwargs)[源代码]

Res2Net backbone.

Parameters
  • scales (int) – Scales used in Res2Net. Default: 4

  • base_width (int) – Basic width of each scale. Default: 26

  • depth (int) – Depth of res2net, from {50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Res2net stages. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottle2neck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import Res2Net
>>> import torch
>>> self = Res2Net(depth=50, scales=4, base_width=26)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
make_res_layer(**kwargs)[源代码]

Pack all blocks in a stage into a ResLayer.

class mmdet.models.backbones.ResNeSt(groups=1, base_width=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[源代码]

ResNeSt backbone.

Parameters
  • groups (int) – Number of groups of Bottleneck. Default: 1

  • base_width (int) – Base width of Bottleneck. Default: 4

  • radix (int) – Radix of SplitAttentionConv2d. Default: 2

  • reduction_factor (int) – Reduction factor of inter_channels in SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • kwargs (dict) – Keyword arguments for ResNet.

make_res_layer(**kwargs)[源代码]

Pack all blocks in a stage into a ResLayer.

class mmdet.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[源代码]

ResNeXt backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Resnet stages. Default: 4.

  • groups (int) – Group of resnext.

  • base_width (int) – Base width of resnext.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

make_res_layer(**kwargs)[源代码]

Pack all blocks in a stage into a ResLayer.

class mmdet.models.backbones.ResNet(depth, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[源代码]

ResNet backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • stem_channels (int | None) – Number of stem channels. If not specified, it will be the same as base_channels. Default: None.

  • base_channels (int) – Number of base channels of res layer. Default: 64.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Resnet stages. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Example

>>> from mmdet.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x)[源代码]

Forward function.

make_res_layer(**kwargs)[源代码]

Pack all blocks in a stage into a ResLayer.

make_stage_plugins(plugins, stage_idx)[源代码]

Make plugins for the stage_idx-th ResNet stage.

Currently we support to insert context_block, empirical_attention_block, nonlocal_block into the backbone like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of Bottleneck.

An example of plugins format could be:

Examples

>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2

Suppose stage_idx=1, the structure of blocks in the stage would be:

conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2

If stages is missing, the plugin would be applied to all stages.

Parameters
  • plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.

  • stage_idx (int) – Index of stage to build

Returns

Plugins for the current stage.

Return type

list[dict]

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[源代码]

Convert the model into training mode while keeping the normalization layers frozen.

class mmdet.models.backbones.ResNetV1d(**kwargs)[源代码]

ResNetV1d variant described in Bag of Tricks.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.

class mmdet.models.backbones.SSDVGG(depth, with_last_pool=False, ceil_mode=True, out_indices=(3, 4), out_feature_indices=(22, 34), pretrained=None, init_cfg=None, input_size=None, l2_norm_scale=None)[源代码]

VGG Backbone network for single-shot-detection.

Parameters
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • with_last_pool (bool) – Whether to add a pooling layer at the end of the model.

  • ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.

  • out_indices (Sequence[int]) – Output from which stages.

  • out_feature_indices (Sequence[int]) – Output from which feature map.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

  • input_size (int, optional) – Deprecated argument. Width and height of input, from {300, 512}.

  • l2_norm_scale (float, optional) – Deprecated argument. L2 normalization layer init scale.

Example

>>> self = SSDVGG(input_size=300, depth=11)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 300, 300)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 19, 19)
(1, 512, 10, 10)
(1, 256, 5, 5)
(1, 256, 3, 3)
(1, 256, 1, 1)
forward(x)[源代码]

Forward function.

init_weights(pretrained=None)[源代码]

Initialize the weights.

class mmdet.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, pretrained=None, convert_weights=False, frozen_stages=-1, init_cfg=None)[源代码]

Swin Transformer. A PyTorch implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

Inspiration from https://github.com/microsoft/Swin-Transformer

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – The num of input channels. Defaults: 3.

  • embed_dims (int) – The feature dimension. Default: 96.

  • patch_size (int | tuple[int]) – Patch size. Default: 4.

  • window_size (int) – Window size. Default: 7.

  • mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).

  • num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).

  • strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • patch_norm (bool) – If add a norm layer for patch embed and patch merging. Default: True.

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer at output of backbone. Defaults: dict(type='LN').

  • with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). Default: -1 (-1 means not freezing any parameters).

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
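
Example

A minimal usage sketch with the default configuration; stage i outputs embed_dims * 2**i channels, and the shapes are printed rather than asserted:

>>> from mmdet.models import SwinTransformer
>>> import torch
>>> self = SwinTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))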

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[源代码]

Initialize the weights.

train(mode=True)[源代码]

Convert the model into training mode while keeping the layers frozen.

class mmdet.models.backbones.TridentResNet(depth, num_branch, test_branch_idx, trident_dilations, **kwargs)[源代码]

The stem layer, stage 1 and stage 2 in Trident ResNet are identical to ResNet, while in stage 3, Trident BottleBlock is utilized to replace the normal BottleBlock to yield trident output. Different branches share the convolution weights but use different dilations to achieve multi-scale output.

                            / stage3(b0) \
x - stem - stage1 - stage2 -  stage3(b1) - output
                            \ stage3(b2) /

Parameters
  • depth (int) – Depth of resnet, from {50, 101, 152}.

  • num_branch (int) – Number of branches in TridentNet.

  • test_branch_idx (int) – In inference, all 3 branches will be used if test_branch_idx==-1, otherwise only branch with index test_branch_idx will be used.

  • trident_dilations (tuple[int]) – Dilations of different trident branch. len(trident_dilations) should be equal to num_branch.
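
Example

A hypothetical backbone config sketch in the common three-branch TridentNet style; the values are illustrative and the surrounding detector config is omitted:

>>> backbone = dict(
...     type='TridentResNet',
...     depth=50,
...     num_branch=3,
...     test_branch_idx=1,
...     trident_dilations=(1, 2, 3))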

data_preprocessors

class mmdet.models.data_preprocessors.BatchFixedSizePad(size: Tuple[int, int], img_pad_value: int = 0, pad_mask: bool = False, mask_pad_value: int = 0, pad_seg: bool = False, seg_pad_value: int = 255)[源代码]

Fixed size padding for batch images.

Parameters
  • size (Tuple[int, int]) – Fixed padding size, in (h, w) format.

  • img_pad_value (int) – The padded pixel value for images. Defaults to 0.

  • pad_mask (bool) – Whether to pad instance masks. Defaults to False.

  • mask_pad_value (int) – The padded pixel value for instance masks. Defaults to 0.

  • pad_seg (bool) – Whether to pad semantic segmentation maps. Defaults to False.

  • seg_pad_value (int) – The padded pixel value for semantic segmentation maps. Defaults to 255.
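
Example

A hypothetical usage sketch as a batch_augments entry of DetDataPreprocessor; the 1024x1024 size is an assumed value for illustration:

>>> batch_augments = [
...     dict(
...         type='BatchFixedSizePad',
...         size=(1024, 1024),
...         img_pad_value=0,
...         pad_mask=True,
...         mask_pad_value=0)]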

forward(inputs: torch.Tensor, data_samples: Optional[List[dict]] = None)Tuple[torch.Tensor, Optional[List[dict]]][源代码]

Pad images, instance masks, and semantic segmentation maps.

class mmdet.models.data_preprocessors.BatchResize(scale: tuple, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0)[源代码]

Batch resize during training. This implementation is modified from https://github.com/Purkialo/CrowdDet/blob/master/lib/data/CrowdHuman.py.

It provides the data pre-processing as follows:

  • A batch of all images will be padded to a uniform size and stacked into a torch.Tensor by DetDataPreprocessor.

  • BatchResize resizes all images to the target size.

  • Padding images to make sure the size of image can be divisible by pad_size_divisor.

Parameters
  • scale (tuple) – Image scale for resizing.

  • pad_size_divisor (int) – Image size divisible factor. Defaults to 1.

  • pad_value (Number) – The padded pixel value. Defaults to 0.

forward(inputs: torch.Tensor, data_samples: List[mmdet.structures.det_data_sample.DetDataSample])Tuple[torch.Tensor, List[mmdet.structures.det_data_sample.DetDataSample]][源代码]

Resize a batch of images and bboxes.

get_padded_tensor(tensor: torch.Tensor, pad_value: int)torch.Tensor[源代码]

Pad images according to pad_size_divisor.

get_target_size(height: int, width: int)Tuple[int, int, float][源代码]

Get the target size of a batch of images based on data and scale.

class mmdet.models.data_preprocessors.BatchSyncRandomResize(random_size_range: Tuple[int, int], interval: int = 10, size_divisor: int = 32)[源代码]

Batch random resize which synchronizes the random size across ranks.

Parameters
  • random_size_range (tuple) – The multi-scale random range during multi-scale training.

  • interval (int) – The iteration interval for changing the image size. Defaults to 10.

  • size_divisor (int) – Image size divisible factor. Defaults to 32.
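
Example

A hypothetical usage sketch as a batch_augments entry of DetDataPreprocessor, in the style of multi-scale training setups; the size range below is an assumed value:

>>> batch_augments = [
...     dict(
...         type='BatchSyncRandomResize',
...         random_size_range=(480, 800),
...         interval=10,
...         size_divisor=32)]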

forward(inputs: torch.Tensor, data_samples: List[mmdet.structures.det_data_sample.DetDataSample])Tuple[torch.Tensor, List[mmdet.structures.det_data_sample.DetDataSample]][源代码]

Resize a batch of images and bboxes to shape self._input_size.

class mmdet.models.data_preprocessors.BoxInstDataPreprocessor(*arg, mask_stride: int = 4, pairwise_size: int = 3, pairwise_dilation: int = 2, pairwise_color_thresh: float = 0.3, bottom_pixels_removed: int = 10, **kwargs)[源代码]

Pseudo mask pre-processor for BoxInst.

Compared with mmdet.DetDataPreprocessor,

  1. It generates masks using box annotations.

  2. It computes the images color similarity in LAB color space.

Parameters
  • mask_stride (int) – The mask output stride in boxinst. Defaults to 4.

  • pairwise_size (int) – The size of neighborhood for each pixel. Defaults to 3.

  • pairwise_dilation (int) – The dilation of neighborhood for each pixel. Defaults to 2.

  • pairwise_color_thresh (float) – The thresh of image color similarity. Defaults to 0.3.

  • bottom_pixels_removed (int) – The number of pixels removed at the bottom of the image, which accounts for annotation errors in the COCO dataset. Defaults to 10.

forward(data: dict, training: bool = False)dict[源代码]

Get pseudo mask labels using color similarity.

get_images_color_similarity(inputs: torch.Tensor, image_masks: torch.Tensor)torch.Tensor[源代码]

Compute the image color similarity in LAB color space.

class mmdet.models.data_preprocessors.DetDataPreprocessor(mean: Optional[Sequence[numbers.Number]] = None, std: Optional[Sequence[numbers.Number]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, pad_mask: bool = False, mask_pad_value: int = 0, pad_seg: bool = False, seg_pad_value: int = 255, bgr_to_rgb: bool = False, rgb_to_bgr: bool = False, boxtype2tensor: bool = True, batch_augments: Optional[List[dict]] = None)[源代码]

Image pre-processor for detection tasks.

Compared with mmengine.ImgDataPreprocessor,

  1. It supports batch augmentations.

  2. It will additionally append batch_input_shape and pad_shape to data_samples considering the object detection task.

It provides the data pre-processing as follows:

  • Collate and move data to the target device.

  • Pad inputs to the maximum size of current batch with defined pad_value. The padding size can be divisible by a defined pad_size_divisor

  • Stack inputs to batch_inputs.

  • Convert inputs from bgr to rgb if the shape of input is (3, H, W).

  • Normalize image with defined std and mean.

  • Do batch augmentations during training.

Parameters
  • mean (Sequence[Number], optional) – The pixel mean of R, G, B channels. Defaults to None.

  • std (Sequence[Number], optional) – The pixel standard deviation of R, G, B channels. Defaults to None.

  • pad_size_divisor (int) – The size of padded image should be divisible by pad_size_divisor. Defaults to 1.

  • pad_value (Number) – The padded pixel value. Defaults to 0.

  • pad_mask (bool) – Whether to pad instance masks. Defaults to False.

  • mask_pad_value (int) – The padded pixel value for instance masks. Defaults to 0.

  • pad_seg (bool) – Whether to pad semantic segmentation maps. Defaults to False.

  • seg_pad_value (int) – The padded pixel value for semantic segmentation maps. Defaults to 255.

  • bgr_to_rgb (bool) – whether to convert image from BGR to RGB. Defaults to False.

  • rgb_to_bgr (bool) – whether to convert image from RGB to BGR. Defaults to False.

  • boxtype2tensor (bool) – Whether to convert the BaseBoxes type of bboxes data to Tensor. Defaults to True.

  • batch_augments (list[dict], optional) – Batch-level augmentations. Defaults to None.
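
Example

A typical config sketch; the mean/std values are the common ImageNet statistics used across mmdet configs and are an assumption here, not a requirement:

>>> data_preprocessor = dict(
...     type='DetDataPreprocessor',
...     mean=[123.675, 116.28, 103.53],
...     std=[58.395, 57.12, 57.375],
...     bgr_to_rgb=True,
...     pad_size_divisor=32)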

forward(data: dict, training: bool = False)dict[源代码]

Perform normalization, padding and BGR-to-RGB conversion based on BaseDataPreprocessor.

Parameters
  • data (dict) – Data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation.

Returns

Data in the same format as the model input.

Return type

dict

pad_gt_masks(batch_data_samples: Sequence[mmdet.structures.det_data_sample.DetDataSample])None[源代码]

Pad gt_masks to shape of batch_input_shape.

pad_gt_sem_seg(batch_data_samples: Sequence[mmdet.structures.det_data_sample.DetDataSample])None[源代码]

Pad gt_sem_seg to shape of batch_input_shape.

class mmdet.models.data_preprocessors.MultiBranchDataPreprocessor(data_preprocessor: Union[mmengine.config.config.ConfigDict, dict])[源代码]

DataPreprocessor wrapper for multi-branch data.

Take semi-supervised object detection as an example: assume that the ratio of labeled to unlabeled data in a batch is 1:2; sup indicates the branch where the labeled data is augmented, while unsup_teacher and unsup_student indicate the branches where the unlabeled data is augmented by different pipelines.

The multi-branch data arrives as per-branch lists aligned by sample index, with None entries for samples that do not belong to a branch. After filtering out None and regrouping by branch, each branch can be processed by a plain DetDataPreprocessor, and the preprocessed data is finally reformatted back into per-branch dicts. A sketch of the grouping step follows.
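
Example

A rough sketch of the grouping step with assumed branch names from the semi-supervised example above and placeholder tensors; it illustrates the idea only, not the exact upstream structure:

>>> import torch
>>> img = torch.rand(3, 224, 224)  # stand-in for one augmented image
>>> # Per-branch lists aligned by sample index; samples that do not
>>> # belong to a branch are None.
>>> multi_branch_inputs = {
...     'sup': [img, None, None],
...     'unsup_teacher': [None, img, img],
...     'unsup_student': [None, img, img]}
>>> # Group by branch after filtering None, so each branch can be fed
>>> # to a plain DetDataPreprocessor independently.
>>> grouped = {
...     branch: [x for x in tensors if x is not None]
...     for branch, tensors in multi_branch_inputs.items()}
>>> {k: len(v) for k, v in grouped.items()}
{'sup': 1, 'unsup_teacher': 2, 'unsup_student': 2}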

Parameters

data_preprocessor (ConfigDict or dict) – Config of DetDataPreprocessor to process the input data.

cpu(*args, **kwargs)torch.nn.modules.module.Module[源代码]

Overrides this method to set the device.

Returns

The model itself.

Return type

nn.Module

cuda(*args, **kwargs)torch.nn.modules.module.Module[源代码]

Overrides this method to set the device.

Returns

The model itself.

Return type

nn.Module

forward(data: dict, training: bool = False)dict[源代码]

Perform normalization, padding and BGR-to-RGB conversion based on BaseDataPreprocessor for multi-branch data.

Parameters
  • data (dict) – Data sampled from dataloader.

  • training (bool) – Whether to enable training time augmentation.

Returns

  • 'inputs' (Dict[str, torch.Tensor]): The forward data of models from different branches.

  • 'data_sample' (Dict[str, DetDataSample]): The annotation info of the samples from different branches.

Return type

dict

to(device: Optional[Union[int, torch.device]], *args, **kwargs)torch.nn.modules.module.Module[源代码]

Overrides this method to set the device.

Parameters

device (int or torch.device, optional) – The desired device of the parameters and buffers in this module.

Returns

The model itself.

Return type

nn.Module

dense_heads

class mmdet.models.dense_heads.ATSSHead(num_classes: int, in_channels: int, pred_kernel_size: int = 3, stacked_convs: int = 4, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, reg_decoded_bbox: bool = True, loss_centerness: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg: Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]] = {'layer': 'Conv2d', 'override': {'bias_prob': 0.01, 'name': 'atss_cls', 'std': 0.01, 'type': 'Normal'}, 'std': 0.01, 'type': 'Normal'}, **kwargs)[源代码]

Detection Head of ATSS.

The ATSS head structure is similar to that of FCOS, but ATSS uses anchor boxes and assigns labels by Adaptive Training Sample Selection instead of the max-IoU assignment.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • pred_kernel_size (int) – Kernel size of nn.Conv2d

  • stacked_convs (int) – Number of stacking convs of the head.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='GN', num_groups=32, requires_grad=True).

  • reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Defaults to True. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.

  • loss_centerness (ConfigDict or dict) – Config of centerness loss. Defaults to dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0).

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.

centerness_target(anchors: torch.Tensor, gts: torch.Tensor)torch.Tensor[源代码]

Calculate the centerness between anchors and gts.

Only the centerness targets of positive samples are calculated; otherwise there may be NaN.
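
A hedged sketch of the computation described above, assuming "xyxy" anchors and gts and the FCOS-style formula sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)) evaluated at the anchor centers; this paraphrases the idea rather than reproducing the exact mmdet implementation:

>>> import torch
>>> def centerness_sketch(anchors, gts):
...     cx = (anchors[:, 0] + anchors[:, 2]) / 2
...     cy = (anchors[:, 1] + anchors[:, 3]) / 2
...     l, r = cx - gts[:, 0], gts[:, 2] - cx
...     t, b = cy - gts[:, 1], gts[:, 3] - cy
...     lr = torch.stack([l, r], dim=1)
...     tb = torch.stack([t, b], dim=1)
...     # For positive samples the anchor center lies inside the gt, so
...     # all four distances are positive and the result is in (0, 1].
...     return torch.sqrt((lr.min(1).values / lr.max(1).values)
...                       * (tb.min(1).values / tb.max(1).values))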

Parameters
  • anchors (Tensor) – Anchors with shape (N, 4), “xyxy” format.

  • gts (Tensor) – Ground truth bboxes with shape (N, 4), “xyxy” format.

Returns

Centerness between anchors and gts.

Return type

Tensor

forward(x: Tuple[torch.Tensor])Tuple[List[torch.Tensor]][源代码]

Forward features from the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually a tuple of classification scores and bbox predictions.

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_anchors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_anchors * 4.

Return type

tuple

forward_single(x: torch.Tensor, scale: mmcv.cnn.bricks.scale.Scale)Sequence[torch.Tensor][源代码]

Forward feature of a single scale level.

Parameters
  • x (Tensor) – Features of a single scale level.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

Returns

  • cls_score (Tensor): Cls scores for a single scale level, the channels number is num_anchors * num_classes.

  • bbox_pred (Tensor): Box energies / deltas for a single scale level, the channels number is num_anchors * 4.

  • centerness (Tensor): Centerness for a single scale level, the channel number is (N, num_anchors * 1, H, W).

Return type

tuple

get_num_level_anchors_inside(num_level_anchors, inside_flags)[源代码]

Get the number of valid anchors in every level.

get_targets(anchor_list: List[List[torch.Tensor]], valid_flag_list: List[List[torch.Tensor]], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None, unmap_outputs: bool = True)tuple[源代码]

Get targets for ATSS head.

This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple.

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], centernesses: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[源代码]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level Has shape (N, num_anchors * num_classes, H, W)

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W)

  • centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

loss_by_feat_single(anchors: torch.Tensor, cls_score: torch.Tensor, bbox_pred: torch.Tensor, centerness: torch.Tensor, labels: torch.Tensor, label_weights: torch.Tensor, bbox_targets: torch.Tensor, avg_factor: float)dict[源代码]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters
  • cls_score (Tensor) – Box scores for each scale level Has shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • labels (Tensor) – Labels of each anchors with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors)

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • avg_factor (float) – Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmdet.models.dense_heads.AnchorFreeHead(num_classes: int, in_channels: int, feat_channels: int = 256, stacked_convs: int = 4, strides: Union[Sequence[int], Sequence[Tuple[int, int]]] = (4, 8, 16, 32, 64), dcn_on_last_conv: bool = False, conv_bias: Union[bool, str] = 'auto', loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'FocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'IoULoss'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]] = {'layer': 'Conv2d', 'override': {'bias_prob': 0.01, 'name': 'conv_cls', 'std': 0.01, 'type': 'Normal'}, 'std': 0.01, 'type': 'Normal'})[源代码]

Anchor-free head (FCOS, Fovea, RepPoints, etc.).

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • feat_channels (int) – Number of hidden channels. Used in child classes.

  • stacked_convs (int) – Number of stacking convs of the head.

  • strides (Sequence[int] or Sequence[Tuple[int, int]]) – Downsample factor of each feature map.

  • dcn_on_last_conv (bool) – If true, use dcn in the last layer of towers. Defaults to False.

  • conv_bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Default: “auto”.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder. Defaults to 'DistancePointBBoxCoder'.

  • conv_cfg (ConfigDict or dict, Optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict, Optional) – Config dict for normalization layer. Defaults to None.

  • train_cfg (ConfigDict or dict, Optional) – Training config of anchor-free head.

  • test_cfg (ConfigDict or dict, Optional) – Testing config of anchor-free head.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.

aug_test(aug_batch_feats: List[torch.Tensor], aug_batch_img_metas: List[List[torch.Tensor]], rescale: bool = False)List[numpy.ndarray][源代码]

Test function with test time augmentation.

Parameters
  • aug_batch_feats (list[Tensor]) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains features for all images in the batch.

  • aug_batch_img_metas (list[list[dict]]) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. each dict has image information.

  • rescale (bool, optional) – Whether to rescale the results. Defaults to False.

Returns

BBox results of each class.

Return type

list[ndarray]

forward(x: Tuple[torch.Tensor])Tuple[List[torch.Tensor], List[torch.Tensor]][源代码]

Forward features from the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually contain classification scores and bbox predictions.

  • cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

Return type

tuple

forward_single(x: torch.Tensor)Tuple[torch.Tensor, ...][源代码]

Forward features of a single scale level.

Parameters

x (Tensor) – FPN feature maps of the specified stride.

Returns

Scores for each class, bbox predictions, and features after classification and regression conv layers; some models, such as FCOS, need these features.

Return type

tuple

abstract get_targets(points: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData])Any[源代码]

Compute regression, classification and centerness targets for points in multiple images.

Parameters
  • points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

abstract loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[源代码]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

class mmdet.models.dense_heads.AnchorHead(num_classes: int, in_channels: int, feat_channels: int = 256, anchor_generator: Union[mmengine.config.config.ConfigDict, dict] = {'ratios': [0.5, 1.0, 2.0], 'scales': [8, 16, 32], 'strides': [4, 8, 16, 32, 64], 'type': 'AnchorGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'clip_border': True, 'target_means': (0.0, 0.0, 0.0, 0.0), 'target_stds': (1.0, 1.0, 1.0, 1.0), 'type': 'DeltaXYWHBBoxCoder'}, reg_decoded_bbox: bool = False, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'layer': 'Conv2d', 'std': 0.01, 'type': 'Normal'})[源代码]

Anchor-based head (RPN, RetinaNet, SSD, etc.).

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • feat_channels (int) – Number of hidden channels. Used in child classes.

  • anchor_generator (dict) – Config dict for anchor generator

  • bbox_coder (dict) – Config of bounding box coder.

  • reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Default False. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.

  • loss_cls (dict) – Config of classification loss.

  • loss_bbox (dict) – Config of localization loss.

  • train_cfg (dict) – Training config of anchor head.

  • test_cfg (dict) – Testing config of anchor head.

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

forward(x: Tuple[torch.Tensor])Tuple[List[torch.Tensor]][源代码]

Forward features from the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of classification scores and bbox prediction.

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

Return type

tuple

forward_single(x: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][源代码]

Forward feature of a single scale level.

Parameters

x (Tensor) – Features of a single scale level.

Returns

  • cls_score (Tensor): Cls scores for a single scale level, the channels number is num_base_priors * num_classes.

  • bbox_pred (Tensor): Box energies / deltas for a single scale level, the channels number is num_base_priors * 4.

Return type

tuple

get_anchors(featmap_sizes: List[tuple], batch_img_metas: List[dict], device: Union[torch.device, str] = 'cuda')Tuple[List[List[torch.Tensor]], List[List[torch.Tensor]]][源代码]

Get anchors according to feature map sizes.

Parameters
  • featmap_sizes (list[tuple]) – Multi-level feature map sizes.

  • batch_img_metas (list[dict]) – Image meta info.

  • device (torch.device | str) – Device for returned tensors. Defaults to cuda.

Returns

  • anchor_list (list[list[Tensor]]): Anchors of each image.

  • valid_flag_list (list[list[Tensor]]): Valid flags of each image.

Return type

tuple

get_targets(anchor_list: List[List[torch.Tensor]], valid_flag_list: List[List[torch.Tensor]], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None, unmap_outputs: bool = True, return_sampling_results: bool = False)tuple[源代码]

Compute regression and classification targets for anchors in multiple images.

Parameters
  • anchor_list (list[list[Tensor]]) – Multi level anchors of each image. The outer list indicates images, and the inner list corresponds to feature levels of the image. Each element of the inner list is a tensor of shape (num_anchors, 4).

  • valid_flag_list (list[list[Tensor]]) – Multi level valid flags of each image. The outer list indicates images, and the inner list corresponds to feature levels of the image. Each element of the inner list is a tensor of shape (num_anchors, )

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

  • unmap_outputs (bool) – Whether to map outputs back to the original set of anchors. Defaults to True.

  • return_sampling_results (bool) – Whether to return the sampling results. Defaults to False.

Returns

Usually returns a tuple containing learning targets.

  • labels_list (list[Tensor]): Labels of each level.

  • label_weights_list (list[Tensor]): Label weights of each level.

  • bbox_targets_list (list[Tensor]): BBox targets of each level.

  • bbox_weights_list (list[Tensor]): BBox weights of each level.

  • avg_factor (int): Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.

  • additional_returns: This function enables user-defined returns from self._get_targets_single. These returns are currently refined to properties at each feature map (i.e. having HxW dimension). The results will be concatenated after the end.

Return type

tuple

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[源代码]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level has shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict

loss_by_feat_single(cls_score: torch.Tensor, bbox_pred: torch.Tensor, anchors: torch.Tensor, labels: torch.Tensor, label_weights: torch.Tensor, bbox_targets: torch.Tensor, bbox_weights: torch.Tensor, avg_factor: int) → tuple[source]

Calculate the loss of a single scale level based on the features extracted by the detection head.

Parameters
  • cls_score (Tensor) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).

  • bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).

  • anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).

  • labels (Tensor) – Labels of each anchor with shape (N, num_total_anchors).

  • label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors).

  • bbox_targets (Tensor) – BBox regression targets of each anchor with shape (N, num_total_anchors, 4).

  • bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 4).

  • avg_factor (int) – Average factor that is used to average the loss.

Returns

Loss components.

Return type

tuple

class mmdet.models.dense_heads.AutoAssignHead(*args, force_topk: bool = False, topk: int = 9, pos_loss_weight: float = 0.25, neg_loss_weight: float = 0.75, center_loss_weight: float = 0.75, **kwargs)[source]

AutoAssignHead head used in AutoAssign.

More details can be found in the paper: https://arxiv.org/abs/2007.03496.

Parameters
  • force_topk (bool) – Used in center prior initialization to handle extremely small gt. Defaults to False.

  • topk (int) – The number of points used to calculate the center prior when no point falls inside the gt_bbox. Only works when force_topk is True. Defaults to 9.

  • pos_loss_weight (float) – The loss weight of the positive loss. Defaults to 0.25.

  • neg_loss_weight (float) – The loss weight of the negative loss. Defaults to 0.75.

  • center_loss_weight (float) – The loss weight of the center prior loss. Defaults to 0.75.
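
Example (a construction sketch assuming mmdet 3.x; the argument values are hypothetical and use the FCOS-style arguments that AutoAssignHead inherits):

import torch
from mmdet.models.dense_heads import AutoAssignHead

head = AutoAssignHead(
    num_classes=80, in_channels=32, feat_channels=32, stacked_convs=1,
    strides=[8, 16, 32, 64, 128], force_topk=False, topk=9)

# One dummy feature map per FPN stride (hypothetical 256x256 input).
feats = tuple(torch.rand(1, 32, 256 // s, 256 // s)
              for s in [8, 16, 32, 64, 128])
cls_scores, bbox_preds, objectnesses = head(feats)  # three per-level lists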

forward_single(x: torch.Tensor, scale: mmcv.cnn.bricks.scale.Scale, stride: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Forward features of a single scale level.

Parameters
  • x (Tensor) – FPN feature maps of the specified stride.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

  • stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.

Returns

Scores for each class, bbox predictions and centerness predictions of the input feature maps.

Return type

tuple[Tensor, Tensor, Tensor]

get_neg_loss_single(cls_score: torch.Tensor, objectness: torch.Tensor, gt_instances: mmengine.structures.instance_data.InstanceData, ious: torch.Tensor, inside_gt_bbox_mask: torch.Tensor) → Tuple[torch.Tensor][source]

Calculate the negative loss of all points in feature map.

Parameters
  • cls_score (Tensor) – All category scores for each point on the feature map. The shape is (num_points, num_class).

  • objectness (Tensor) – Foreground probability of all points, with shape (num_points, 1).

  • gt_instances (InstanceData) – Ground truth of instance annotations. It should include bboxes and labels attributes.

  • ious (Tensor) – Float tensor with shape (num_points, num_gt). Each value represents the IoU between pred_bbox and gt_bboxes.

  • inside_gt_bbox_mask (Tensor) – Tensor of bool type, with shape (num_points, num_gt); each value marks whether the point falls within a certain gt.

Returns

  • neg_loss (Tensor): The negative loss of all points in the feature map.

Return type

tuple[Tensor]

get_pos_loss_single(cls_score: torch.Tensor, objectness: torch.Tensor, reg_loss: torch.Tensor, gt_instances: mmengine.structures.instance_data.InstanceData, center_prior_weights: torch.Tensor) → Tuple[torch.Tensor][source]

Calculate the positive loss of all points in gt_bboxes.

Parameters
  • cls_score (Tensor) – All category scores for each point on the feature map. The shape is (num_points, num_class).

  • objectness (Tensor) – Foreground probability of all points, has shape (num_points, 1).

  • reg_loss (Tensor) – The regression loss of each gt_bbox and each prediction box, with shape (num_points, num_gt).

  • gt_instances (InstanceData) – Ground truth of instance annotations. It should include bboxes and labels attributes.

  • center_prior_weights (Tensor) – Float tensor with shape (num_points, num_gt). Each value represents the center weighting coefficient.

Returns

  • pos_loss (Tensor): The positive loss of all points in the gt_bboxes.

Return type

tuple[Tensor]

get_targets(points: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData]) → Tuple[List[torch.Tensor], List[torch.Tensor]][source]

Compute regression targets and each point inside or outside gt_bbox in multiple images.

Parameters
  • points (list[Tensor]) – Points of all fpn level, each has shape (num_points, 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

Returns

  • inside_gt_bbox_mask_list (list[Tensor]): Each tensor is of bool type with shape (num_points, num_gt); each value marks whether the point falls within a certain gt.

  • concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level. Each tensor has shape (num_points, num_gt, 4).

Return type

tuple(list[Tensor], list[Tensor])

init_weights() → None[source]

Initialize weights of the head.

In particular, we use special initialization for the classification conv's and regression conv's biases.

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → Dict[str, torch.Tensor][source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.

  • bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.

  • objectnesses (list[Tensor]) – Objectness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmdet.models.dense_heads.BoxInstBboxHead(*args, **kwargs)[source]

BoxInst box head used in https://arxiv.org/abs/2012.02310.

class mmdet.models.dense_heads.BoxInstMaskHead(*arg, pairwise_size: int = 3, pairwise_dilation: int = 2, warmup_iters: int = 10000, **kwargs)[source]

BoxInst mask head used in https://arxiv.org/abs/2012.02310.

This head outputs the mask for BoxInst.

Parameters
  • pairwise_size (int) – The size of the neighborhood for each pixel. Defaults to 3.

  • pairwise_dilation (int) – The dilation of neighborhood for each pixel. Defaults to 2.

  • warmup_iters (int) – Warmup iterations for pair-wise loss. Defaults to 10000.
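
A partial config sketch (hypothetical values, showing only the BoxInst-specific arguments documented above; the remaining CondInst-style mask-head settings are omitted):

mask_head = dict(
    type='BoxInstMaskHead',
    pairwise_size=3,        # 3x3 neighborhood per pixel
    pairwise_dilation=2,    # dilation of that neighborhood
    warmup_iters=10000,     # iterations to warm up the pairwise loss
    # ... other CondInst-style mask-head arguments ...
)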

get_pairwise_affinity(mask_logits: torch.Tensor) → torch.Tensor[source]

Compute the pairwise affinity for each pixel.

loss_by_feat(mask_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], positive_infos: List[mmengine.structures.instance_data.InstanceData], **kwargs) → dict[source]

Calculate the loss based on the features extracted by the mask head.

Parameters
  • mask_preds (list[Tensor]) – List of predicted masks, each has shape (num_classes, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes, masks, and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of multiple images.

  • positive_infos (list[InstanceData]) – Information of positive samples of each image that are assigned in the detection head.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

class mmdet.models.dense_heads.CascadeRPNHead(num_classes: int, num_stages: int, stages: List[Union[dict, mmengine.config.config.ConfigDict]], train_cfg: List[Union[dict, mmengine.config.config.ConfigDict]], test_cfg: Union[mmengine.config.config.ConfigDict, dict], init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

The CascadeRPNHead will predict more accurate region proposals, which is required for two-stage detectors (such as Fast/Faster R-CNN). CascadeRPN consists of a sequence of RPNStage to progressively improve the accuracy of the detected proposals.

More details can be found in https://arxiv.org/abs/1909.06720.

Parameters
  • num_stages (int) – Number of CascadeRPN stages.

  • stages (list[ConfigDict or dict]) – List of configs to build the stages.

  • train_cfg (list[ConfigDict or dict]) – List of training configs, one for each stage.

  • test_cfg (ConfigDict or dict) – Config at testing time.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict]) – Initialization config dict.
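
A partial config sketch (all values hypothetical; each stage entry is a StageCascadeRPNHead config, matching the note below that loss_by_feat() and predict_by_feat() are implemented in StageCascadeRPNHead):

rpn_head = dict(
    type='CascadeRPNHead',
    num_classes=1,
    num_stages=2,
    stages=[
        # anchor generator, bbox coder and loss settings omitted
        dict(type='StageCascadeRPNHead', in_channels=256, feat_channels=256),
        dict(type='StageCascadeRPNHead', in_channels=256, feat_channels=256),
    ],
    train_cfg=[dict(), dict()],  # one training config per stage
    test_cfg=dict(nms=dict(type='nms', iou_threshold=0.8),
                  max_per_img=300))  # hypothetical test-time settings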

loss(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample]) → dict[source]

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample]) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

Returns

A dictionary of loss components.

Return type

dict

loss_and_predict(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample], proposal_cfg: Optional[mmengine.config.config.ConfigDict] = None) → Tuple[dict, List[mmengine.structures.instance_data.InstanceData]][source]

Perform forward propagation of the head, then calculate loss and predictions from the features and data samples.

Parameters
  • x (tuple[Tensor]) – Features from FPN.

  • batch_data_samples (list[DetDataSample]) – Each item contains the meta information of each image and corresponding annotations.

  • proposal_cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

Returns

The return value is a tuple containing:

  • losses (dict[str, Tensor]): A dictionary of loss components.

  • predictions (list[InstanceData]): Detection results of each image after the post process.

Return type

tuple
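
A usage sketch (the attribute names self.rpn_head and self.train_cfg.rpn_proposal mirror the usual two-stage detector layout and are assumptions here):

# Inside a two-stage detector's loss() (sketch; names are assumptions):
# with proposal_cfg supplied, the RPN head returns its losses and the
# post-processed proposals in one pass, so features are not recomputed.
rpn_losses, rpn_results_list = self.rpn_head.loss_and_predict(
    x, batch_data_samples, proposal_cfg=self.train_cfg.rpn_proposal)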

loss_by_feat()[source]

loss_by_feat() is implemented in StageCascadeRPNHead.

predict(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample], rescale: bool = False) → List[mmengine.structures.instance_data.InstanceData][source]

Perform forward propagation of the detection head and predict detection results on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample]) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

  • rescale (bool, optional) – Whether to rescale the results. Defaults to False.

Returns

Detection results of each image after the post process.

Return type

list[InstanceData]

predict_by_feat()[source]

predict_by_feat() is implemented in StageCascadeRPNHead.

class mmdet.models.dense_heads.CenterNetHead(in_channels: int, feat_channels: int, num_classes: int, loss_center_heatmap: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'GaussianFocalLoss'}, loss_wh: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.1, 'type': 'L1Loss'}, loss_offset: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'L1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Objects as Points head. CenterNetHead uses the center point to indicate an object's position. Paper link: https://arxiv.org/abs/1904.07850

Parameters
  • in_channels (int) – Number of channels in the input feature map.

  • feat_channels (int) – Number of channels in the intermediate feature map.

  • num_classes (int) – Number of categories excluding the background category.

  • loss_center_heatmap (ConfigDict or dict) – Config of center heatmap loss. Defaults to dict(type=’GaussianFocalLoss’, loss_weight=1.0)

  • loss_wh (ConfigDict or dict) – Config of wh loss. Defaults to dict(type=’L1Loss’, loss_weight=0.1).

  • loss_offset (ConfigDict or dict) – Config of offset loss. Defaults to dict(type=’L1Loss’, loss_weight=1.0).

  • train_cfg (ConfigDict or dict, optional) – Training config. Useless in CenterNet, but we keep this variable for SingleStageDetector.

  • test_cfg (ConfigDict or dict, optional) – Testing config of CenterNet.

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict.
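
Example (a minimal sketch assuming mmdet 3.x; channel sizes are hypothetical — CenterNet predicts from a single feature level, so one 4D feature map is passed in a tuple):

import torch
from mmdet.models.dense_heads import CenterNetHead

head = CenterNetHead(in_channels=16, feat_channels=16, num_classes=80)
feats = (torch.rand(1, 16, 32, 32),)
center_heatmap_preds, wh_preds, offset_preds = head(feats)
# Channel counts match the documented outputs below.
assert center_heatmap_preds[0].shape == (1, 80, 32, 32)
assert wh_preds[0].shape == (1, 2, 32, 32)
assert offset_preds[0].shape == (1, 2, 32, 32)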

forward(x: Tuple[torch.Tensor, ...]) → Tuple[List[torch.Tensor]][source]

Forward features. Notice CenterNet head does not use FPN.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

  • center_heatmap_preds (list[Tensor]): Center heatmap predictions for all levels; the channel number is num_classes.

  • wh_preds (list[Tensor]): WH predictions for all levels; the channel number is 2.

  • offset_preds (list[Tensor]): Offset predictions for all levels; the channel number is 2.

Return type

tuple[list[Tensor]]

forward_single(x: torch.Tensor) → Tuple[torch.Tensor, ...][source]

Forward feature of a single level.

Parameters

x (Tensor) – Feature of a single level.

Returns

  • center_heatmap_pred (Tensor): Center heatmap prediction; the channel number is num_classes.

  • wh_pred (Tensor): WH prediction; the channel number is 2.

  • offset_pred (Tensor): Offset prediction; the channel number is 2.

Return type

tuple[Tensor, ...]

get_targets(gt_bboxes: List[torch.Tensor], gt_labels: List[torch.Tensor], feat_shape: tuple, img_shape: tuple) → Tuple[dict, int][source]

Compute regression and classification targets in multiple images.

Parameters
  • gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.

  • gt_labels (list[Tensor]) – Class indices corresponding to each box.

  • feat_shape (tuple) – Feature map shape with value [B, _, H, W].

  • img_shape (tuple) – Image shape.

Returns

The float value is the mean avg_factor; the dict has the components below:

  • center_heatmap_target (Tensor): targets of center heatmap, shape (B, num_classes, H, W).

  • wh_target (Tensor): targets of wh predict, shape (B, 2, H, W).

  • offset_target (Tensor): targets of offset predict, shape (B, 2, H, W).

  • wh_offset_target_weight (Tensor): weights of wh and offset predict, shape (B, 2, H, W).

Return type

tuple[dict, float]
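
Example (a sketch continuing the CenterNetHead example above, assuming mmdet 3.x; the boxes, labels and shapes are hypothetical):

import torch

gt_bboxes = [torch.tensor([[10., 20., 100., 120.]])]  # (num_gts, 4), [tl_x, tl_y, br_x, br_y]
gt_labels = [torch.tensor([3])]                       # one class index per box
target_result, avg_factor = head.get_targets(
    gt_bboxes, gt_labels, feat_shape=(1, 16, 32, 32), img_shape=(128, 128))
# target_result holds center_heatmap_target, wh_target, offset_target
# and wh_offset_target_weight, as listed above.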

init_weights() → None[source]

Initialize weights of the head.

loss_by_feat(center_heatmap_preds: List[torch.Tensor], wh_preds: List[torch.Tensor], offset_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]

Compute losses of the head.

Parameters
  • center_heatmap_preds (list[Tensor]) – center predict heatmaps for all levels with shape (B, num_classes, H, W).

  • wh_preds (list[Tensor]) – wh predicts for all levels with shape (B, 2, H, W).

  • offset_preds (list[Tensor]) – offset predicts for all levels with shape (B, 2, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components, as below:
  • loss_center_heatmap (Tensor): Loss of center heatmap.

  • loss_wh (Tensor): Loss of wh heatmap.

  • loss_offset (Tensor): Loss of offset heatmap.

Return type

dict[str, Tensor]

predict_by_feat(center_heatmap_preds: List[torch.Tensor], wh_preds: List[torch.Tensor], offset_preds: List[torch.Tensor], batch_img_metas: Optional[List[dict]] = None, rescale: bool = True, with_nms: bool = False) → List[mmengine.structures.instance_data.InstanceData][source]

Transform network output for a batch into bbox predictions.

Parameters
  • center_heatmap_preds (list[Tensor]) – Center predict heatmaps for all levels with shape (B, num_classes, H, W).

  • wh_preds (list[Tensor]) – WH predicts for all levels with shape (B, 2, H, W).

  • offset_preds (list[Tensor]) – Offset predicts for all levels with shape (B, 2, H, W).

  • batch_img_metas (list[dict], optional) – Batch image meta info. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to False.

Returns

Detection results of each image after the post process. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arranged as (x1, y1, x2, y2).

Return type

list[InstanceData]
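
A usage sketch (continuing the forward example above; batch_img_metas must carry the meta information produced by the test-time data pipeline — the exact keys depend on that pipeline, so none are shown here):

# batch_img_metas: list of meta dicts from the data pipeline, one per image.
results_list = head.predict_by_feat(
    center_heatmap_preds, wh_preds, offset_preds,
    batch_img_metas=batch_img_metas, rescale=True)
# Each InstanceData in results_list has .scores, .labels and .bboxes.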

class mmdet.models.dense_heads.CenterNetUpdateHead(num_classes: int, in_channels: int, regress_ranges: Sequence[Tuple[int, int]] = ((0, 80), (64, 160), (128, 320), (256, 640), (512, 1000000000)), hm_min_radius: int = 4, hm_min_overlap: float = 0.8, more_pos_thresh: float = 0.2, more_pos_topk: int = 9, soft_weight_on_reg: bool = False, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'neg_weight': 0.75, 'pos_weight': 0.25, 'type': 'GaussianFocalLoss'}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'GIoULoss'}, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, **kwargs)[source]

CenterNetUpdateHead is an improved version of CenterNet in CenterNet2. Paper link https://arxiv.org/abs/2103.07461.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • regress_ranges (Sequence[Tuple[int, int]]) – Regress range of multiple level points.

  • hm_min_radius (int) – Heatmap target minimum radius of cls branch. Defaults to 4.

  • hm_min_overlap (float) – Heatmap target minimum overlap of cls branch. Defaults to 0.8.

  • more_pos_thresh (float) – The filtering threshold when the cls branch adds more positive samples. Defaults to 0.2.

  • more_pos_topk (int) – The maximum number of additional positive samples added to each gt. Defaults to 9.

  • soft_weight_on_reg (bool) – Whether to use the soft target of the cls branch as the soft weight of the bbox branch. Defaults to False.

  • loss_cls (ConfigDict or dict) – Config of cls loss. Defaults to dict(type=’GaussianFocalLoss’, loss_weight=1.0)

  • loss_bbox (ConfigDict or dict) – Config of bbox loss. Defaults to dict(type=’GIoULoss’, loss_weight=2.0).

  • norm_cfg (ConfigDict or dict, optional) – dictionary to construct and config norm layer. Defaults to norm_cfg=dict(type='GN', num_groups=32, requires_grad=True).

  • train_cfg (ConfigDict or dict, optional) – Training config. Unused in CenterNet. Reserved for compatibility with SingleStageDetector.

  • test_cfg (ConfigDict or dict, optional) – Testing config of CenterNet.
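
A construction sketch via the registry (hypothetical values; unspecified arguments fall back to the defaults shown in the signature above, and the loss settings simply restate those defaults):

from mmdet.registry import MODELS

head = MODELS.build(dict(
    type='CenterNetUpdateHead',
    num_classes=80,
    in_channels=256,
    stacked_convs=4,
    feat_channels=256,
    strides=[8, 16, 32, 64, 128],
    loss_cls=dict(type='GaussianFocalLoss', pos_weight=0.25,
                  neg_weight=0.75, loss_weight=1.0),
    loss_bbox=dict(type='GIoULoss', loss_weight=2.0)))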

add_cls_pos_inds(flatten_points: torch.Tensor, flatten_bbox_preds: torch.Tensor, featmap_sizes: torch.Tensor, batch_gt_instances: List[mmengine.structures.instance_data.InstanceData]) → Tuple[Optional[torch.Tensor], Optional[torch.Tensor]][source]

Provide additional adaptive positive samples to the classification branch.

Parameters
  • flatten_points (Tensor) – The flattened points over the batch images and all levels. The shape is (N, 2).

  • flatten_bbox_preds (Tensor) – The flattened bbox predictions over the batch images and all levels. The shape is (N, 4).

  • featmap_sizes (Tensor) – Feature map size of all layers. The shape is (5, 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

Returns

  • pos_inds (Tensor): Adaptively selected positive sample indices.

  • cls_labels (Tensor): Corresponding positive class labels.

Return type

tuple

forward(x: Tuple[torch.Tensor]) → Tuple[List[torch.Tensor], List[torch.Tensor]][source]

Forward features from the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of each level's outputs.

  • cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is 4.

Return type

tuple

forward_single(x: torch.Tensor, scale: mmcv.cnn.bricks.scale.Scale, stride: int) → Tuple[torch.Tensor, torch.Tensor][source]

Forward features of a single scale level.

Parameters
  • x (Tensor) – FPN feature maps of the specified stride.

  • scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.

  • stride (int) – The corresponding stride for feature maps.

Returns

Scores for each class and bbox predictions of the input feature maps.

Return type

tuple

get_targets(points: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData]) → Tuple[torch.Tensor, torch.Tensor][source]

Compute classification and bbox targets for points in multiple images.

Parameters
  • points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

Returns

Targets of each level.

  • concat_lvl_labels (Tensor): Labels concatenated over all levels and batch images.

  • concat_lvl_bbox_targets (Tensor): BBox targets concatenated over all levels and batch images.

Return type

tuple

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict],