mmdet.apis¶
- async mmdet.apis.async_inference_detector(model, imgs)[source]¶
Async inference image(s) with the detector.
- Parameters
model (nn.Module) – The loaded detector.
imgs (str | ndarray) – Either image files or loaded images.
- Returns
Awaitable detection results.
- mmdet.apis.inference_detector(model: torch.nn.modules.module.Module, imgs: Union[str, numpy.ndarray, Sequence[str], Sequence[numpy.ndarray]], test_pipeline: Optional[mmcv.transforms.wrappers.Compose] = None) → Union[mmdet.structures.det_data_sample.DetDataSample, List[mmdet.structures.det_data_sample.DetDataSample]][source]¶
Inference image(s) with the detector.
- Parameters
model (nn.Module) – The loaded detector.
imgs (str, ndarray, Sequence[str/ndarray]) – Either image files or loaded images.
test_pipeline (Compose) – Test pipeline.
- Returns
If imgs is a list or tuple, the same length list type results will be returned, otherwise return the detection results directly.
- Return type
DetDataSample or list[DetDataSample]
- mmdet.apis.init_detector(config: Union[str, pathlib.Path, mmengine.config.config.Config], checkpoint: Optional[str] = None, palette: str = 'none', device: str = 'cuda:0', cfg_options: Optional[dict] = None) → torch.nn.modules.module.Module[source]¶
Initialize a detector from config file.
- Parameters
config (str, Path, or mmengine.Config) – Config file path, Path, or the config object.
checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.
palette (str) – Color palette used for visualization. If palette is stored in checkpoint, use checkpoint’s palette first, otherwise use externally passed palette. Currently, supports ‘coco’, ‘voc’, ‘citys’ and ‘random’. Defaults to none.
device (str) – The device where the model will be put on. Defaults to cuda:0.
cfg_options (dict, optional) – Options to override some settings in the used config.
- Returns
The constructed detector.
- Return type
nn.Module
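A minimal usage sketch combining the two APIs above; the config, checkpoint, and image paths below are placeholders and must be replaced with real files.
>>> from mmdet.apis import init_detector, inference_detector
>>> config_file = 'configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py'  # placeholder path
>>> checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'  # placeholder path
>>> model = init_detector(config_file, checkpoint_file, device='cpu')
>>> # a single image returns one DetDataSample; a list of images returns a list of them
>>> result = inference_detector(model, 'demo/demo.jpg')
>>> result.pred_instances  # predicted bboxes, scores and labels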
mmdet.datasets¶
datasets¶
- class mmdet.datasets.AspectRatioBatchSampler(sampler: torch.utils.data.sampler.Sampler, batch_size: int, drop_last: bool = False)[source]¶
A sampler wrapper for grouping images with similar aspect ratio (< 1 or >= 1) into the same batch.
- Parameters
sampler (Sampler) – Base sampler.
batch_size (int) – Size of mini-batch.
drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size.
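A hedged config sketch of where this batch sampler is typically plugged into a dataloader config; the surrounding values are illustrative and train_dataset is assumed to be defined elsewhere.
>>> train_dataloader = dict(
>>>     batch_size=2,
>>>     num_workers=2,
>>>     sampler=dict(type='DefaultSampler', shuffle=True),
>>>     batch_sampler=dict(type='AspectRatioBatchSampler'),
>>>     dataset=train_dataset)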
- class mmdet.datasets.BaseDetDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]¶
Base dataset for detection.
- Parameters
proposal_file (str, optional) – Proposals file path. Defaults to None.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
- full_init() → None[source]¶
Load annotation file and set BaseDataset._fully_initialized to True.
If lazy_init=False, full_init will be called during the instantiation and self._fully_initialized will be set to True. If obj._fully_initialized=False, the class method decorated by force_full_init will call full_init automatically.
Several steps to initialize annotation:
load_data_list: Load annotations from annotation file.
load_proposals: Load proposals from proposal file, if self.proposal_file is not None.
filter data information: Filter annotations according to filter_cfg.
slice_data: Slice dataset according to self._indices.
serialize_data: Serialize self.data_list if self.serialize_data is True.
- get_cat_ids(idx: int) → List[int][source]¶
Get COCO category ids by index.
- Parameters
idx (int) – Index of data.
- Returns
All categories in the image of specified index.
- Return type
List[int]
- load_proposals() → None[source]¶
Load proposals from proposals file.
The proposals_list should be a dict[img_path: proposals] with the same length as data_list, and the proposals should be a dict or InstanceData that usually contains the following keys:
bboxes (np.ndarray): Has a shape (num_instances, 4), where the last dimension 4 is arranged as (x1, y1, x2, y2).
scores (np.ndarray): Classification scores, has a shape (num_instances, ).
- class mmdet.datasets.CityscapesDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]¶
Dataset for Cityscapes.
- class mmdet.datasets.ClassAwareSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, seed: Optional[int] = None, num_sample_class: int = 1)[source]¶
Sampler that restricts data loading to the label of the dataset.
A class-aware sampling strategy to effectively tackle the non-uniform class distribution. The length of the training data is consistent with the source data. Simple improvements based on Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks.
The implementation logic refers to https://github.com/Sense-X/TSD/blob/master/mmdet/datasets/samplers/distributed_classaware_sampler.py
- Parameters
dataset – Dataset used for sampling.
seed (int, optional) – random seed used to shuffle the sampler. This number should be identical across all processes in the distributed group. Defaults to None.
num_sample_class (int) – The number of samples taken from each per-label list. Defaults to 1.
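An illustrative sketch of enabling class-aware sampling in a dataloader config; the values are placeholders and train_dataset is assumed to be defined elsewhere.
>>> train_dataloader = dict(
>>>     batch_size=2,
>>>     num_workers=2,
>>>     sampler=dict(type='ClassAwareSampler', num_sample_class=1),
>>>     dataset=train_dataset)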
- class mmdet.datasets.CocoDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]¶
Dataset for COCO.
- COCOAPI¶
- filter_data() → List[dict][source]¶
Filter annotations according to filter_cfg.
- Returns
Filtered results.
- Return type
List[dict]
- class mmdet.datasets.CocoPanopticDataset(ann_file: str = '', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'ann': None, 'img': None, 'seg': None}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]¶
Coco dataset for Panoptic segmentation.
The annotation format is shown as follows. The ann field is optional for testing.
[
    {
        'filename': f'{image_id:012}.png',
        'image_id': 9,
        'segments_info': [
            {
                'id': 8345037,  # segment_id in panoptic png, converted from rgb
                'category_id': 51,
                'iscrowd': 0,
                'bbox': (x1, y1, w, h),
                'area': 24315
            },
            ...
        ]
    },
    ...
]
- Parameters
ann_file (str) – Annotation file path. Defaults to ‘’.
metainfo (dict, optional) – Meta information for dataset, such as class information. Defaults to None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to None.
data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img=None, ann=None, seg=None). The prefix seg, which is for the panoptic segmentation map, must not be None.
filter_cfg (dict, optional) – Config for filtering data. Defaults to None.
indices (int or Sequence[int], optional) – Support using only the first few data in the annotation file to facilitate training/testing on a smaller dataset. Defaults to None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.
pipeline (list, optional) – Processing pipeline. Defaults to [].
test_mode (bool, optional) – test_mode=True means in test phase. Defaults to False.
lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None img, the maximum extra number of cycles to get a valid image. Defaults to 1000.
- COCOAPI¶
- class mmdet.datasets.CrowdHumanDataset(data_root, ann_file, extra_ann_file=None, **kwargs)[source]¶
Dataset for CrowdHuman.
- Parameters
data_root (str) – The root directory for data_prefix and ann_file.
ann_file (str) – Annotation file path.
extra_ann_file (str | optional) – The path of extra image metas for CrowdHuman. It can be created by CrowdHumanDataset automatically or by tools/misc/get_crowdhuman_id_hw.py manually. Defaults to None.
- class mmdet.datasets.DeepFashionDataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]¶
Dataset for DeepFashion.
- class mmdet.datasets.GroupMultiSourceSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]¶
Group Multi-Source Infinite Sampler.
According to the sampling ratio, sample data from different datasets but the same group to form batches.
- Parameters
dataset (Sized) – The dataset.
batch_size (int) – Size of mini-batch.
source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.
shuffle (bool) – Whether shuffle the dataset or not. Defaults to True.
seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.
- mmdet.datasets.LVISDataset¶
alias of mmdet.datasets.lvis.LVISV05Dataset
- class mmdet.datasets.LVISV05Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]¶
LVIS v0.5 dataset for detection.
- class mmdet.datasets.LVISV1Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]¶
LVIS v1 dataset for detection.
- class mmdet.datasets.MultiImageMixDataset(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, dict], pipeline: Sequence[str], skip_type_keys: Optional[Sequence[str]] = None, max_refetch: int = 15, lazy_init: bool = False)[source]¶
A wrapper of multiple images mixed dataset.
Suitable for training on multiple images mixed data augmentation like mosaic and mixup. For the augmentation pipeline of mixed image data, the get_indexes method needs to be provided to obtain the image indexes, and you can set skip_flags to change the pipeline running process. At the same time, we provide the dynamic_scale parameter to dynamically change the output image size.
- Parameters
dataset (CustomDataset) – The dataset to be mixed.
pipeline (Sequence[dict]) – Sequence of transform objects or config dicts to be composed.
dynamic_scale (tuple[int], optional) – The image scale can be changed dynamically. Default to None. It is deprecated.
skip_type_keys (list[str], optional) – Sequence of type string to be skip pipeline. Default to None.
max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch, but results is still None, then the iteration is terminated and raise the error. Default: 15.
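A minimal, hedged sketch of wrapping a dataset with MultiImageMixDataset so that mixed-image transforms such as Mosaic can fetch extra indexes; the paths and pipeline below are illustrative, not a complete training config.
>>> train_pipeline = [
>>>     dict(type='Mosaic', img_scale=(640, 640), pad_val=114.0),
>>>     dict(type='RandomFlip', prob=0.5),
>>>     dict(type='PackDetInputs')
>>> ]
>>> train_dataset = dict(
>>>     type='MultiImageMixDataset',
>>>     dataset=dict(
>>>         type='CocoDataset',
>>>         data_root='data/coco/',
>>>         ann_file='annotations/instances_train2017.json',
>>>         data_prefix=dict(img='train2017/'),
>>>         pipeline=[
>>>             dict(type='LoadImageFromFile'),
>>>             dict(type='LoadAnnotations', with_bbox=True)
>>>         ]),
>>>     pipeline=train_pipeline)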
- get_data_info(idx: int) → dict[source]¶
Get annotation by index.
- Parameters
idx (int) – Global index of ConcatDataset.
- Returns
The idx-th annotation of the datasets.
- Return type
dict
- property metainfo: dict¶
Get the meta information of the multi-image-mixed dataset.
- Returns
The meta information of multi-image-mixed dataset.
- Return type
dict
- class mmdet.datasets.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]¶
Multi-Source Infinite Sampler.
According to the sampling ratio, sample data from different datasets to form batches.
- Parameters
dataset (Sized) – The dataset.
batch_size (int) – Size of mini-batch.
source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.
shuffle (bool) – Whether shuffle the dataset or not. Defaults to True.
seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.
Examples
>>> dataset_type = 'ConcatDataset'
>>> sub_dataset_type = 'CocoDataset'
>>> data_root = 'data/coco/'
>>> sup_ann = '../coco_semi_annos/instances_train2017.1@10.json'
>>> unsup_ann = '../coco_semi_annos/' \
>>>     'instances_train2017.1@10-unlabeled.json'
>>> dataset = dict(type=dataset_type,
>>>     datasets=[
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=sup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=sup_pipeline),
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=unsup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=unsup_pipeline),
>>>     ])
>>> train_dataloader = dict(
>>>     batch_size=5,
>>>     num_workers=5,
>>>     persistent_workers=True,
>>>     sampler=dict(type='MultiSourceSampler',
>>>         batch_size=5, source_ratio=[1, 4]),
>>>     batch_sampler=None,
>>>     dataset=dataset)
- class mmdet.datasets.Objects365V1Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]¶
Objects365 v1 dataset for detection.
- COCOAPI¶
- class mmdet.datasets.Objects365V2Dataset(*args, seg_map_suffix: str = '.png', proposal_file: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, **kwargs)[source]¶
Objects365 v2 dataset for detection.
- COCOAPI¶
- class mmdet.datasets.OpenImagesChallengeDataset(ann_file: str, **kwargs)[source]¶
Open Images Challenge dataset for detection.
- Parameters
ann_file (str) – Open Images Challenge box annotation in txt format.
- class mmdet.datasets.OpenImagesDataset(label_file: str, meta_file: str, hierarchy_file: str, image_level_ann_file: Optional[str] = None, **kwargs)[source]¶
Open Images dataset for detection.
- Parameters
ann_file (str) – Annotation file path.
label_file (str) – File path of the label description file that maps the classes names in MID format to their short descriptions.
meta_file (str) – File path to get image metas.
hierarchy_file (str) – The file path of the class hierarchy.
image_level_ann_file (str) – Human-verified image level annotation, which is used in evaluation.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
- class mmdet.datasets.WIDERFaceDataset(**kwargs)[source]¶
Reader for the WIDER Face dataset in PASCAL VOC format.
Conversion scripts can be found in https://github.com/sovrasov/wider-face-pascal-voc-annotations
- class mmdet.datasets.XMLDataset(img_subdir: str = 'JPEGImages', ann_subdir: str = 'Annotations', **kwargs)[source]¶
XML dataset for detection.
- Parameters
img_subdir (str) – Subdir where images are stored. Default: JPEGImages.
ann_subdir (str) – Subdir where annotations are. Default: Annotations.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
- property bbox_min_size: Optional[str]¶
Return the minimum size of bounding boxes in the images.
- filter_data() → List[dict][source]¶
Filter annotations according to filter_cfg.
- Returns
Filtered results.
- Return type
List[dict]
- load_data_list() → List[dict][source]¶
Load annotation from XML style ann_file.
- Returns
Annotation info from XML file.
- Return type
list[dict]
- parse_data_info(img_info: dict) → Union[dict, List[dict]][source]¶
Parse raw annotation to target format.
- Parameters
img_info (dict) – Raw image information, usually it includes img_id, file_name, and xml_path.
- Returns
Parsed annotation.
- Return type
Union[dict, List[dict]]
- property sub_data_root: str¶
Return the sub data root.
- mmdet.datasets.get_loading_pipeline(pipeline)[source]¶
Only keep loading image and annotations related configuration.
- Parameters
pipeline (list[dict]) – Data pipeline configs.
- Returns
The new pipeline list that only keeps loading image and annotations related configuration.
- Return type
list[dict]
Examples
>>> pipelines = [
...     dict(type='LoadImageFromFile'),
...     dict(type='LoadAnnotations', with_bbox=True),
...     dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
...     dict(type='RandomFlip', flip_ratio=0.5),
...     dict(type='Normalize', **img_norm_cfg),
...     dict(type='Pad', size_divisor=32),
...     dict(type='DefaultFormatBundle'),
...     dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
...     ]
>>> expected_pipelines = [
...     dict(type='LoadImageFromFile'),
...     dict(type='LoadAnnotations', with_bbox=True)
...     ]
>>> assert expected_pipelines == \
...        get_loading_pipeline(pipelines)
api_wrappers¶
- class mmdet.datasets.api_wrappers.COCO(*args: Any, **kwargs: Any)[source]¶
This class is almost the same as the official pycocotools package.
It implements some snake-case function aliases so that the COCO class has the same interface as the LVIS class.
- class mmdet.datasets.api_wrappers.COCOPanoptic(*args: Any, **kwargs: Any)[source]¶
This wrapper is for loading the panoptic style annotation file.
The format is shown in the CocoPanopticDataset class.
- Parameters
annotation_file (str, optional) – Path of annotation file. Defaults to None.
- load_anns(ids: Union[List[int], int] = []) → Optional[List[dict]][source]¶
Load anns with the specified ids.
self.anns is a list of annotation lists instead of a list of annotations.
- Parameters
ids (Union[List[int], int]) – Integer ids specifying anns.
- Returns
Loaded ann objects.
- Return type
anns (List[dict], optional)
samplers¶
- class mmdet.datasets.samplers.AspectRatioBatchSampler(sampler: torch.utils.data.sampler.Sampler, batch_size: int, drop_last: bool = False)[source]¶
A sampler wrapper for grouping images with similar aspect ratio (< 1 or >= 1) into the same batch.
- Parameters
sampler (Sampler) – Base sampler.
batch_size (int) – Size of mini-batch.
drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size.
- class mmdet.datasets.samplers.ClassAwareSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, seed: Optional[int] = None, num_sample_class: int = 1)[source]¶
Sampler that restricts data loading to the label of the dataset.
A class-aware sampling strategy to effectively tackle the non-uniform class distribution. The length of the training data is consistent with the source data. Simple improvements based on Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks.
The implementation logic refers to https://github.com/Sense-X/TSD/blob/master/mmdet/datasets/samplers/distributed_classaware_sampler.py
- Parameters
dataset – Dataset used for sampling.
seed (int, optional) – random seed used to shuffle the sampler. This number should be identical across all processes in the distributed group. Defaults to None.
num_sample_class (int) – The number of samples taken from each per-label list. Defaults to 1.
- class mmdet.datasets.samplers.GroupMultiSourceSampler(dataset: mmengine.dataset.base_dataset.BaseDataset, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]¶
Group Multi-Source Infinite Sampler.
According to the sampling ratio, sample data from different datasets but the same group to form batches.
- Parameters
dataset (Sized) – The dataset.
batch_size (int) – Size of mini-batch.
source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.
shuffle (bool) – Whether shuffle the dataset or not. Defaults to True.
seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.
- class mmdet.datasets.samplers.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, seed: Optional[int] = None)[source]¶
Multi-Source Infinite Sampler.
According to the sampling ratio, sample data from different datasets to form batches.
- Parameters
dataset (Sized) – The dataset.
batch_size (int) – Size of mini-batch.
source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.
shuffle (bool) – Whether shuffle the dataset or not. Defaults to True.
seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.
Examples
>>> dataset_type = 'ConcatDataset'
>>> sub_dataset_type = 'CocoDataset'
>>> data_root = 'data/coco/'
>>> sup_ann = '../coco_semi_annos/instances_train2017.1@10.json'
>>> unsup_ann = '../coco_semi_annos/' \
>>>     'instances_train2017.1@10-unlabeled.json'
>>> dataset = dict(type=dataset_type,
>>>     datasets=[
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=sup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=sup_pipeline),
>>>         dict(
>>>             type=sub_dataset_type,
>>>             data_root=data_root,
>>>             ann_file=unsup_ann,
>>>             data_prefix=dict(img='train2017/'),
>>>             filter_cfg=dict(filter_empty_gt=True, min_size=32),
>>>             pipeline=unsup_pipeline),
>>>     ])
>>> train_dataloader = dict(
>>>     batch_size=5,
>>>     num_workers=5,
>>>     persistent_workers=True,
>>>     sampler=dict(type='MultiSourceSampler',
>>>         batch_size=5, source_ratio=[1, 4]),
>>>     batch_sampler=None,
>>>     dataset=dataset)
transforms¶
- class mmdet.datasets.transforms.Albu(transforms: List[dict], bbox_params: Optional[dict] = None, keymap: Optional[dict] = None, skip_img_without_anno: bool = False)[source]¶
Albumentation augmentation.
Adds custom transformations from the Albumentations library. Please visit https://albumentations.readthedocs.io for more information.
Required Keys:
img (np.uint8)
gt_bboxes (HorizontalBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
Modified Keys:
img (np.uint8)
gt_bboxes (HorizontalBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
img_shape (tuple)
An example of transforms is as follows:
[
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.0625,
        scale_limit=0.0,
        rotate_limit=0,
        interpolation=1,
        p=0.5),
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(type='ChannelShuffle', p=0.1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.1),
]
- Parameters
transforms (list[dict]) – A list of albu transformations
bbox_params (dict, optional) – Bbox_params for albumentation Compose
keymap (dict, optional) – Contains {‘input key’:’albumentation-style key’}
skip_img_without_anno (bool) – Whether to skip the image if no ann left after aug. Defaults to False.
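A hedged pipeline sketch showing how an Albu step is commonly wired in; the bbox_params and keymap values follow the pattern used in MMDetection example configs, but treat them as illustrative assumptions rather than required settings.
>>> albu_train_transforms = [
>>>     dict(type='RandomBrightnessContrast',
>>>          brightness_limit=[0.1, 0.3], contrast_limit=[0.1, 0.3], p=0.2),
>>>     dict(type='ChannelShuffle', p=0.1)
>>> ]
>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(
>>>         type='Albu',
>>>         transforms=albu_train_transforms,
>>>         bbox_params=dict(
>>>             type='BboxParams',
>>>             format='pascal_voc',
>>>             label_fields=['gt_bboxes_labels', 'gt_ignore_flags'],
>>>             min_visibility=0.0,
>>>             filter_lost_elements=True),
>>>         keymap={'img': 'image', 'gt_bboxes': 'bboxes'},
>>>         skip_img_without_anno=True),
>>>     dict(type='PackDetInputs')
>>> ]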
- class mmdet.datasets.transforms.AutoAugment(policies: List[List[Union[dict, mmengine.config.config.ConfigDict]]] = [[{'type': 'Equalize', 'prob': 0.8, 'level': 1}, {'type': 'ShearY', 'prob': 0.8, 'level': 4}], [{'type': 'Color', 'prob': 0.4, 'level': 9}, {'type': 'Equalize', 'prob': 0.6, 'level': 3}], [{'type': 'Color', 'prob': 0.4, 'level': 1}, {'type': 'Rotate', 'prob': 0.6, 'level': 8}], [{'type': 'Solarize', 'prob': 0.8, 'level': 3}, {'type': 'Equalize', 'prob': 0.4, 'level': 7}], [{'type': 'Solarize', 'prob': 0.4, 'level': 2}, {'type': 'Solarize', 'prob': 0.6, 'level': 2}], [{'type': 'Color', 'prob': 0.2, 'level': 0}, {'type': 'Equalize', 'prob': 0.8, 'level': 8}], [{'type': 'Equalize', 'prob': 0.4, 'level': 8}, {'type': 'SolarizeAdd', 'prob': 0.8, 'level': 3}], [{'type': 'ShearX', 'prob': 0.2, 'level': 9}, {'type': 'Rotate', 'prob': 0.6, 'level': 8}], [{'type': 'Color', 'prob': 0.6, 'level': 1}, {'type': 'Equalize', 'prob': 1.0, 'level': 2}], [{'type': 'Invert', 'prob': 0.4, 'level': 9}, {'type': 'Rotate', 'prob': 0.6, 'level': 0}], [{'type': 'Equalize', 'prob': 1.0, 'level': 9}, {'type': 'ShearY', 'prob': 0.6, 'level': 3}], [{'type': 'Color', 'prob': 0.4, 'level': 7}, {'type': 'Equalize', 'prob': 0.6, 'level': 0}], [{'type': 'Posterize', 'prob': 0.4, 'level': 6}, {'type': 'AutoContrast', 'prob': 0.4, 'level': 7}], [{'type': 'Solarize', 'prob': 0.6, 'level': 8}, {'type': 'Color', 'prob': 0.6, 'level': 9}], [{'type': 'Solarize', 'prob': 0.2, 'level': 4}, {'type': 'Rotate', 'prob': 0.8, 'level': 9}], [{'type': 'Rotate', 'prob': 1.0, 'level': 7}, {'type': 'TranslateY', 'prob': 0.8, 'level': 9}], [{'type': 'ShearX', 'prob': 0.0, 'level': 0}, {'type': 'Solarize', 'prob': 0.8, 'level': 4}], [{'type': 'ShearY', 'prob': 0.8, 'level': 0}, {'type': 'Color', 'prob': 0.6, 'level': 4}], [{'type': 'Color', 'prob': 1.0, 'level': 0}, {'type': 'Rotate', 'prob': 0.6, 'level': 2}], [{'type': 'Equalize', 'prob': 0.8, 'level': 4}, {'type': 'Equalize', 'prob': 0.0, 'level': 8}], [{'type': 'Equalize', 'prob': 1.0, 'level': 4}, {'type': 'AutoContrast', 'prob': 0.6, 'level': 2}], [{'type': 'ShearY', 'prob': 0.4, 'level': 7}, {'type': 'SolarizeAdd', 'prob': 0.6, 'level': 7}], [{'type': 'Posterize', 'prob': 0.8, 'level': 2}, {'type': 'Solarize', 'prob': 0.6, 'level': 10}], [{'type': 'Solarize', 'prob': 0.6, 'level': 8}, {'type': 'Equalize', 'prob': 0.6, 'level': 1}], [{'type': 'Color', 'prob': 0.8, 'level': 6}, {'type': 'Rotate', 'prob': 0.4, 'level': 5}]], prob: Optional[List[float]] = None)[source]¶
Auto augmentation.
This data augmentation is proposed in AutoAugment: Learning Augmentation Policies from Data and in Learning Data Augmentation Strategies for Object Detection.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_ignore_flags (bool) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
img_shape
gt_bboxes
gt_bboxes_labels
gt_masks
gt_ignore_flags
gt_seg_map
Added Keys:
homography_matrix
- Parameters
policies (List[List[Union[dict, ConfigDict]]]) – The policies of auto augmentation. Each policy in policies is a specific augmentation policy and is composed of several augmentations. When AutoAugment is called, a random policy in policies will be selected to augment images. Defaults to policy_v0().
prob (list[float], optional) – The probabilities associated with each policy. The length should be equal to the policy number and the sum should be 1. If not given, a uniform distribution will be assumed. Defaults to None.
Examples
>>> policies = [
>>>     [
>>>         dict(type='Sharpness', prob=0.0, level=8),
>>>         dict(type='ShearX', prob=0.4, level=0,)
>>>     ],
>>>     [
>>>         dict(type='Rotate', prob=0.6, level=10),
>>>         dict(type='Color', prob=1.0, level=6)
>>>     ]
>>> ]
>>> augmentation = AutoAugment(policies)
>>> img = np.ones((100, 100, 3))
>>> gt_bboxes = np.ones((10, 4))
>>> results = dict(img=img, gt_bboxes=gt_bboxes)
>>> results = augmentation(results)
- class mmdet.datasets.transforms.AutoContrast(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]¶
Auto adjust image contrast.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing AutoContrast should be in range [0, 1]. Defaults to 1.0.
level (int, optional) – No use for AutoContrast transformation. Defaults to None.
min_mag (float) – No use for AutoContrast transformation. Defaults to 0.1.
max_mag (float) – No use for AutoContrast transformation. Defaults to 1.9.
- class mmdet.datasets.transforms.Brightness(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]¶
Adjust the brightness of the image. A magnitude=0 gives a black image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing Brightness transformation. Defaults to 1.0.
level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for Brightness transformation. Defaults to 0.1.
max_mag (float) – The maximum magnitude for Brightness transformation. Defaults to 1.9.
- class mmdet.datasets.transforms.CachedMixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, max_iters: int = 15, bbox_clip_border: bool = True, max_cached_images: int = 20, random_pop: bool = True, prob: float = 1.0)[source]¶
Cached mixup data augmentation.
(mixup transform diagram: the mixup image is embedded in the top-left patch of the padded output canvas.)
The cached mixup transform steps are as follows:
1. Append the results from the last transform into the cache.
2. Another random image is picked from the cache and embedded in the top left patch (after padding and resizing).
3. The target of mixup transform is the weighted average of mixup image and origin image.
Required Keys:
img
gt_bboxes (np.float32) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.
pad_val (int) – Pad value. Defaults to 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
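A hedged pipeline sketch pairing the cached transforms (in the style of RTMDet configs); unlike Mosaic/MixUp, the cached variants draw extra images from an internal cache, so no MultiImageMixDataset wrapper is needed. All values are illustrative.
>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='CachedMosaic', img_scale=(640, 640), pad_val=114.0,
>>>          max_cached_images=40),
>>>     dict(type='CachedMixUp', img_scale=(640, 640), ratio_range=(1.0, 1.0),
>>>          pad_val=114.0, max_cached_images=20),
>>>     dict(type='PackDetInputs')
>>> ]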
- class mmdet.datasets.transforms.CachedMosaic(*args, max_cached_images: int = 40, random_pop: bool = True, **kwargs)[source]¶
Cached mosaic augmentation.
Cached mosaic transform will random select images from the cache and combine them into one output image.
(mosaic transform diagram: four images are arranged around a mosaic center, with padding and cropping as needed.)
The cached mosaic transform steps are as follows:
1. Append the results from the last transform into the cache.
2. Choose the mosaic center as the intersection of the 4 images.
3. Get the left top image according to the index, and randomly sample another 3 images from the result cache.
4. Sub images will be cropped if an image is larger than the mosaic patch.
Required Keys:
img
gt_bboxes (np.float32) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
pad_val (int) – Pad value. Defaults to 114.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
- class mmdet.datasets.transforms.Color(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]¶
Adjust the color balance of the image, in a manner similar to the controls on a colour TV set. A magnitude=0 gives a black & white image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing Color transformation. Defaults to 1.0.
level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for Color transformation. Defaults to 0.1.
max_mag (float) – The maximum magnitude for Color transformation. Defaults to 1.9.
- class mmdet.datasets.transforms.ColorTransform(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]¶
Base class for color transformations. All color transformations need to inherit from this base class.
ColorTransform unifies the class attributes and class functions of color transformations (Color, Brightness, Contrast, Sharpness, Solarize, SolarizeAdd, Equalize, AutoContrast, Invert, and Posterize), and only distorts color channels, without impacting the locations of the instances.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing the geometric transformation and should be in range [0, 1]. Defaults to 1.0.
level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for color transformation. Defaults to 0.1.
max_mag (float) – The maximum magnitude for color transformation. Defaults to 1.9.
- class mmdet.datasets.transforms.Contrast(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]¶
Control the contrast of the image. A magnitude=0 gives a gray image, whereas magnitude=1 gives the original image. The bboxes, masks and segmentations are not modified.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing Contrast transformation. Defaults to 1.0.
level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for Contrast transformation. Defaults to 0.1.
max_mag (float) – The maximum magnitude for Contrast transformation. Defaults to 1.9.
- class mmdet.datasets.transforms.CopyPaste(max_num_pasted: int = 100, bbox_occluded_thr: int = 10, mask_occluded_thr: int = 300, selected: bool = True)[source]¶
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. The simple copy-paste transform steps are as follows:
The destination image is already resized with aspect ratio kept, cropped and padded.
Randomly select a source image, which is also already resized with aspect ratio kept, cropped and padded in a similar way as the destination image.
Randomly select some objects from the source image.
Paste these source objects to the destination image directly, since the source and destination images have the same size.
Update object masks of the destination image, since some original objects may be occluded.
Generate bboxes from the updated destination masks and filter some objects which are totally occluded, and adjust bboxes which are partly occluded.
Append selected source bboxes, masks, and labels.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
gt_masks (BitmapMasks) (optional)
Modified Keys:
img
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
gt_masks (optional)
- Parameters
max_num_pasted (int) – The maximum number of pasted objects. Defaults to 100.
bbox_occluded_thr (int) – The threshold of occluded bbox. Defaults to 10.
mask_occluded_thr (int) – The threshold of occluded mask. Defaults to 300.
selected (bool) – Whether select objects or not. If select is False, all objects of the source image will be pasted to the destination image. Defaults to True.
- class mmdet.datasets.transforms.CutOut(n_holes: Union[int, Tuple[int, int]], cutout_shape: Optional[Union[Tuple[int, int], List[Tuple[int, int]]]] = None, cutout_ratio: Optional[Union[Tuple[float, float], List[Tuple[float, float]]]] = None, fill_in: Union[Tuple[float, float, float], Tuple[int, int, int]] = (0, 0, 0))[source]¶
CutOut operation.
Randomly drop some regions of image used in Cutout.
Required Keys:
img
Modified Keys:
img
- Parameters
n_holes (int or tuple[int, int]) – Number of regions to be dropped. If it is given as a tuple, the number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].
cutout_shape (tuple[int, int] or list[tuple[int, int]], optional) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose a shape from the list. Defaults to None.
cutout_ratio (tuple[float, float] or list[tuple[float, float]], optional) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose a ratio from the list. Please note that cutout_shape and cutout_ratio cannot both be given at the same time. Defaults to None.
fill_in (tuple[float, float, float] or tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Defaults to (0, 0, 0).
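An illustrative config for this transform; since cutout_shape and cutout_ratio are mutually exclusive, only one of them is given here and all values are placeholders.
>>> cutout = dict(
>>>     type='CutOut',
>>>     n_holes=(1, 3),  # randomly drop 1 to 3 regions
>>>     cutout_shape=[(4, 4), (8, 8), (16, 16)],  # candidate hole sizes in pixels
>>>     fill_in=(114, 114, 114))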
- class mmdet.datasets.transforms.Equalize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]¶
Equalize the image histogram. The bboxes, masks and segmentations are not modified.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing Equalize transformation. Defaults to 1.0.
level (int, optional) – No use for Equalize transformation. Defaults to None.
min_mag (float) – No use for Equalize transformation. Defaults to 0.1.
max_mag (float) – No use for Equalize transformation. Defaults to 1.9.
- class mmdet.datasets.transforms.Expand(mean: Sequence[Union[int, float]] = (0, 0, 0), to_rgb: bool = True, ratio_range: Sequence[Union[int, float]] = (1, 4), seg_ignore_label: Optional[int] = None, prob: float = 0.5)[source]¶
Random expand the image & bboxes & masks & segmentation map.
Randomly place the original image on a canvas of ratio x original image size filled with mean values. The ratio is in the range of ratio_range.
Required Keys:
img
img_shape
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
img_shape
gt_bboxes
gt_masks
gt_seg_map
- Parameters
mean (sequence) – mean value of dataset.
to_rgb (bool) – if need to convert the order of mean to align with RGB.
ratio_range (sequence) – range of expand ratio.
seg_ignore_label (int) – label of ignore segmentation map.
prob (float) – probability of applying this transformation
- class mmdet.datasets.transforms.FilterAnnotations(min_gt_bbox_wh: Tuple[int, int] = (1, 1), min_gt_mask_area: int = 1, by_box: bool = True, by_mask: bool = False, keep_empty: bool = True)[source]¶
Filter invalid annotations.
Required Keys:
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_ignore_flags (bool) (optional)
Modified Keys:
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_masks (optional)
gt_ignore_flags (optional)
- Parameters
min_gt_bbox_wh (tuple[float]) – Minimum width and height of ground truth boxes. Default: (1., 1.)
min_gt_mask_area (int) – Minimum foreground area of ground truth masks. Default: 1
by_box (bool) – Filter instances with bounding boxes not meeting the min_gt_bbox_wh threshold. Default: True
by_mask (bool) – Filter instances with masks not meeting min_gt_mask_area threshold. Default: False
keep_empty (bool) – Whether to return None when it becomes an empty bbox after filtering. Defaults to True.
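An illustrative placement of FilterAnnotations near the end of a training pipeline, dropping degenerate boxes produced by earlier geometric transforms; the surrounding transforms and thresholds are assumptions for the sketch.
>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='RandomCrop', crop_size=(0.5, 0.5), crop_type='relative_range'),
>>>     dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
>>>     dict(type='PackDetInputs')
>>> ]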
- class mmdet.datasets.transforms.FixShapeResize(width: int, height: int, pad_val: Union[int, float, dict] = {'img': 0, 'seg': 255}, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation: str = 'bilinear')[source]¶
Resize images & bbox & seg to the specified size.
This transform resizes the input image according to width and height. Bboxes, masks, and seg map are then resized with the same parameters.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
img_shape
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
scale
scale_factor
keep_ratio
homography_matrix
- Parameters
width (int) – width for resizing.
height (int) – height for resizing. Defaults to None.
pad_val (Number | dict[str, Number], optional) –
Padding value if the pad_mode is “constant”. If it is a single number, the image is padded with that number and the semantic segmentation map is padded with 255. If it is a dict, it should have the following keys:
img: The value to pad the image.
seg: The value to pad the semantic segmentation map.
Defaults to dict(img=0, seg=255).
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.
clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.
- class mmdet.datasets.transforms.GeomTransform(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 1.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]¶
Base class for geometric transformations. All geometric transformations need to inherit from this base class.
GeomTransform unifies the class attributes and class functions of geometric transformations (ShearX, ShearY, Rotate, TranslateX, and TranslateY), and records the homography matrix.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
homography_matrix
- Parameters
prob (float) – The probability for performing the geometric transformation and should be in range [0, 1]. Defaults to 1.0.
level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for geometric transformation. Defaults to 0.0.
max_mag (float) – The maximum magnitude for geometric transformation. Defaults to 1.0.
reversal_prob (float) – The probability that reverses the geometric transformation magnitude. Should be in range [0,1]. Defaults to 0.5.
img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.
mask_border_value (int) – The fill value used for masks. Defaults to 0.
seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for the ‘cv2’ backend, and “nearest”, “bilinear” for the ‘pillow’ backend. Defaults to ‘bilinear’.
- class mmdet.datasets.transforms.ImageToTensor(keys)[source]¶
Convert image to torch.Tensor by given keys.
The dimension order of the input image is (H, W, C). The pipeline will convert it to (C, H, W). If only 2 dimensions (H, W) are given, the output will be (1, H, W).
- Parameters
keys (Sequence[str]) – Key of images to be converted to Tensor.
- class mmdet.datasets.transforms.InstaBoost(action_candidate: tuple = ('normal', 'horizontal', 'skip'), action_prob: tuple = (1, 0, 0), scale: tuple = (0.8, 1.2), dx: int = 15, dy: int = 15, theta: tuple = (- 1, 1), color_prob: float = 0.5, hflag: bool = False, aug_ratio: float = 0.5)[source]¶
Data augmentation method in InstaBoost: Boosting Instance Segmentation Via Probability Map Guided Copy-Pasting.
Refer to https://github.com/GothicAi/Instaboost for implementation details.
Required Keys:
img (np.uint8)
instances
Modified Keys:
img (np.uint8)
instances
- Parameters
action_candidate (tuple) – Action candidates. “normal”, “horizontal”, “vertical”, “skip” are supported. Defaults to (‘normal’, ‘horizontal’, ‘skip’).
action_prob (tuple) – Corresponding action probabilities. Should be the same length as action_candidate. Defaults to (1, 0, 0).
scale (tuple) – (min scale, max scale). Defaults to (0.8, 1.2).
dx (int) – The maximum x-axis shift will be (instance width) / dx. Defaults to 15.
dy (int) – The maximum y-axis shift will be (instance height) / dy. Defaults to 15.
theta (tuple) – (min rotation degree, max rotation degree). Defaults to (-1, 1).
color_prob (float) – Probability of images for color augmentation. Defaults to 0.5.
hflag (bool) – Whether to use heatmap guided. Defaults to False.
aug_ratio (float) – Probability of applying this transformation. Defaults to 0.5.
- class mmdet.datasets.transforms.Invert(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]¶
Invert images.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing Invert and should be in range [0, 1]. Defaults to 1.0.
level (int, optional) – No use for Invert transformation. Defaults to None.
min_mag (float) – No use for Invert transformation. Defaults to 0.1.
max_mag (float) – No use for Invert transformation. Defaults to 1.9.
- class mmdet.datasets.transforms.LoadAnnotations(with_mask: bool = False, poly2mask: bool = True, box_type: str = 'hbox', **kwargs)[source]¶
Load and process the instances and seg_map annotation provided by the dataset.
The annotation format is as follows:
{
    'instances': [
        {
            # List of 4 numbers representing the bounding box of the
            # instance, in (x1, y1, x2, y2) order.
            'bbox': [x1, y1, x2, y2],
            # Label of image classification.
            'bbox_label': 1,
            # Used in instance/panoptic segmentation. The segmentation mask
            # of the instance or the information of segments.
            # 1. If list[list[float]], it represents a list of polygons,
            #    one for each connected component of the object. Each
            #    list[float] is one simple polygon in the format of
            #    [x1, y1, ..., xn, yn] (n >= 3). The Xs and Ys are absolute
            #    coordinates in unit of pixels.
            # 2. If dict, it represents the per-pixel segmentation mask in
            #    COCO's compressed RLE format. The dict should have keys
            #    "size" and "counts". Can be loaded by pycocotools.
            'mask': list[list[float]] or dict,
        }
    ]
    # Filename of semantic or panoptic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}
After this module, the annotation has been changed to the format below:
{
    # In (x1, y1, x2, y2) order, float type. N is the number of bboxes
    # in an image.
    'gt_bboxes': BaseBoxes(N, 4)
    # In int type.
    'gt_bboxes_labels': np.ndarray(N, )
    # In built-in class.
    'gt_masks': PolygonMasks (H, W) or BitmapMasks (H, W)
    # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
    # in (x, y, v) order, float type.
}
Required Keys:
height
width
instances
bbox (optional)
bbox_label
mask (optional)
ignore_flag
seg_map_path (optional)
Added Keys:
gt_bboxes (BaseBoxes[torch.float32])
gt_bboxes_labels (np.int64)
gt_masks (BitmapMasks | PolygonMasks)
gt_seg_map (np.uint8)
gt_ignore_flags (bool)
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Defaults to True.
with_label (bool) – Whether to parse and load the label annotation. Defaults to True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.
poly2mask (bool) – Whether to convert mask to bitmap. Default: True.
box_type (str) – The box type used to wrap the bboxes. If box_type is None, gt_bboxes will keep being np.ndarray. Defaults to ‘hbox’.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to ‘cv2’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
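A typical loading sketch combining the file loader with this transform; the flags are illustrative and depend on whether boxes, instance masks, or semantic segmentation are needed.
>>> load_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations', with_bbox=True, with_mask=True, poly2mask=True)
>>> ]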
- class mmdet.datasets.transforms.LoadEmptyAnnotations(with_bbox: bool = True, with_label: bool = True, with_mask: bool = False, with_seg: bool = False, seg_ignore_label: int = 255)[source]¶
Load Empty Annotations for unlabeled images.
Added Keys:
gt_bboxes (np.float32)
gt_bboxes_labels (np.int64)
gt_masks (BitmapMasks | PolygonMasks)
gt_seg_map (np.uint8)
gt_ignore_flags (bool)
- Parameters
with_bbox (bool) – Whether to load the pseudo bbox annotation. Defaults to True.
with_label (bool) – Whether to load the pseudo label annotation. Defaults to True.
with_mask (bool) – Whether to load the pseudo mask annotation. Default: False.
with_seg (bool) – Whether to load the pseudo semantic segmentation annotation. Defaults to False.
seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal ignore_label in semantic_head of the corresponding config. Defaults to 255.
- class mmdet.datasets.transforms.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'}, ignore_empty: bool = False)[source]¶
Load an image from results['img'].
Similar to LoadImageFromFile, but the image has been loaded as np.ndarray in results['img']. Can be used when loading images from a webcam.
img
Modified Keys:
img
img_path
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
- class mmdet.datasets.transforms.LoadMultiChannelImageFromFiles(to_float32: bool = False, color_type: str = 'unchanged', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'})[source]¶
Load multi-channel images from a list of separate channel files.
Required Keys:
img_path
Modified Keys:
img
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘unchanged’.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to ‘cv2’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
- class mmdet.datasets.transforms.LoadPanopticAnnotations(with_bbox: bool = True, with_label: bool = True, with_mask: bool = True, with_seg: bool = True, box_type: str = 'hbox', imdecode_backend: str = 'cv2', file_client_args: dict = {'backend': 'disk'})[source]¶
Load multiple types of panoptic annotations.
The annotation format is as follows:
{
    'instances': [
        {
            # List of 4 numbers representing the bounding box of the
            # instance, in (x1, y1, x2, y2) order.
            'bbox': [x1, y1, x2, y2],
            # Label of image classification.
            'bbox_label': 1,
        },
        ...
    ]
    'segments_info': [
        {
            # id = cls_id + instance_id * INSTANCE_OFFSET
            'id': int,
            # Contiguous category id defined in dataset.
            'category': int,
            # Thing flag.
            'is_thing': bool
        },
        ...
    ]
    # Filename of semantic or panoptic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}
After this module, the annotation has been changed to the format below:
{
    # In (x1, y1, x2, y2) order, float type. N is the number of bboxes
    # in an image.
    'gt_bboxes': BaseBoxes(N, 4)
    # In int type.
    'gt_bboxes_labels': np.ndarray(N, )
    # In built-in class.
    'gt_masks': PolygonMasks (H, W) or BitmapMasks (H, W)
    # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
    # in (x, y, v) order, float type.
}
Required Keys:
height
width
instances - bbox - bbox_label - ignore_flag
segments_info - id - category - is_thing
seg_map_path
Added Keys:
gt_bboxes (BaseBoxes[torch.float32])
gt_bboxes_labels (np.int64)
gt_masks (BitmapMasks | PolygonMasks)
gt_seg_map (np.uint8)
gt_ignore_flags (bool)
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Defaults to True.
with_label (bool) – Whether to parse and load the label annotation. Defaults to True.
with_mask (bool) – Whether to parse and load the mask annotation. Defaults to True.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to True.
box_type (str) – The box mode used to wrap the bboxes.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to ‘cv2’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
- class mmdet.datasets.transforms.LoadProposals(num_max_proposals: Optional[int] = None)[source]¶
Load proposal pipeline.
Required Keys:
proposals
Modified Keys:
proposals
- Parameters
num_max_proposals (int, optional) – Maximum number of proposals to load. If not specified, all proposals will be loaded.
- class mmdet.datasets.transforms.MinIoURandomCrop(min_ious: Sequence[float] = (0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size: float = 0.3, bbox_clip_border: bool = True)[source]¶
Random crop the image & bboxes & masks & segmentation map; the cropped patches have a minimum IoU requirement with the original image & bboxes & masks & segmentation map, and the IoU threshold is randomly selected from min_ious.
Required Keys:
img
img_shape
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_ignore_flags (bool) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
img_shape
gt_bboxes
gt_bboxes_labels
gt_masks
gt_ignore_flags
gt_seg_map
- Parameters
min_ious (Sequence[float]) – minimum IoU threshold for all intersections with bounding boxes.
min_crop_size (float) – minimum crop size (i.e. h,w := a*h, a*w, where a >= min_crop_size).
bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.
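An illustrative placement of this crop in an SSD-style augmentation pipeline; the surrounding transforms and their values are assumptions for the sketch, not required settings.
>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Expand', mean=[123.675, 116.28, 103.53], to_rgb=True,
>>>          ratio_range=(1, 4)),
>>>     dict(type='MinIoURandomCrop', min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
>>>          min_crop_size=0.3),
>>>     dict(type='Resize', scale=(300, 300), keep_ratio=False),
>>>     dict(type='RandomFlip', prob=0.5),
>>>     dict(type='PackDetInputs')
>>> ]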
- class mmdet.datasets.transforms.MixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, max_iters: int = 15, bbox_clip_border: bool = True)[source]¶
MixUp data augmentation.
(mixup transform diagram: the mixup image is embedded in the top-left patch of the padded output canvas.)
The mixup transform steps are as follows:
1. Another random image is picked by the dataset and embedded in the top left patch (after padding and resizing).
2. The target of mixup transform is the weighted average of mixup image and origin image.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.
pad_val (int) – Pad value. Defaults to 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
- class mmdet.datasets.transforms.Mosaic(img_scale: Tuple[int, int] = (640, 640), center_ratio_range: Tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, prob: float = 1.0)[source]¶
Mosaic augmentation.
Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub- image.
(The original docstring includes an ASCII diagram of the mosaic canvas: four sub-images arranged around a mosaic center, with padding and cropping as needed.)
The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersection of the 4 images.
2. Get the top-left image according to the index, and randomly sample another 3 images from the custom dataset.
3. A sub-image will be cropped if it is larger than the mosaic patch.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
pad_val (int) – Pad value. Defaults to 114.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
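A hedged sketch of Mosaic at the start of a YOLOX-style pipeline; the RandomAffine settings below are assumptions for the example (the negative border roughly recenters the enlarged mosaic canvas), not values prescribed by this page.
>>> img_scale = (640, 640)
>>> train_pipeline = [
>>>     dict(type='Mosaic', img_scale=img_scale, pad_val=114.0),
>>>     dict(
>>>         type='RandomAffine',
>>>         scaling_ratio_range=(0.1, 2.0),
>>>         border=(-img_scale[0] // 2, -img_scale[1] // 2)),
>>>     dict(type='PackDetInputs')]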
- class mmdet.datasets.transforms.MultiBranch(branch_field: List[str], **branch_pipelines: dict)[source]¶
Multiple branch pipeline wrapper.
Generate multiple data-augmented versions of the same image. MultiBranch needs to specify the branch names of all pipelines of the dataset, perform corresponding data augmentation for the current branch, and return None for other branches, which ensures the consistency of return format across different samples.
- Parameters
branch_field (list) – List of branch names.
branch_pipelines (dict) – Dict of different pipeline configs to be composed.
Examples
>>> branch_field = ['sup', 'unsup_teacher', 'unsup_student']
>>> sup_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=0.5),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> weak_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=0.0),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> strong_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=dict(backend='disk')),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(type='RandomFlip', prob=1.0),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         sup=dict(type='PackDetInputs'))
>>>     ]
>>> unsup_pipeline = [
>>>     dict(type='LoadImageFromFile',
>>>         file_client_args=file_client_args),
>>>     dict(type='LoadEmptyAnnotations'),
>>>     dict(
>>>         type='MultiBranch',
>>>         branch_field=branch_field,
>>>         unsup_teacher=weak_pipeline,
>>>         unsup_student=strong_pipeline)
>>>     ]
>>> from mmcv.transforms import Compose
>>> sup_branch = Compose(sup_pipeline)
>>> unsup_branch = Compose(unsup_pipeline)
>>> print(sup_branch)
>>> Compose(
>>>     LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     Resize(scale=(1333, 800), scale_factor=None, keep_ratio=True, clip_object_border=True), backend=cv2), interpolation=bilinear) # noqa
>>>     RandomFlip(prob=0.5, direction=horizontal)
>>>     MultiBranch(branch_pipelines=['sup'])
>>> )
>>> print(unsup_branch)
>>> Compose(
>>>     LoadImageFromFile(ignore_empty=False, to_float32=False, color_type='color', imdecode_backend='cv2', file_client_args={'backend': 'disk'}) # noqa
>>>     LoadEmptyAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, seg_ignore_label=255) # noqa
>>>     MultiBranch(branch_pipelines=['unsup_teacher', 'unsup_student'])
>>> )
- transform(results: dict) → dict[source]¶
Transform function to apply transforms sequentially.
- Parameters
results (dict) – Result dict from loading pipeline.
- Returns
- 'inputs' (Dict[str, obj:torch.Tensor]): The forward data of models from different branches.
- 'data_sample' (Dict[str, obj:DetDataSample]): The annotation info of the sample from different branches.
- Return type
dict
- class mmdet.datasets.transforms.PackDetInputs(meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction'))[source]¶
Pack the input data for detection / semantic segmentation / panoptic segmentation.
The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default this includes:
img_id: id of the image
img_path: path to the image file
ori_shape: original shape of the image as a tuple (h, w)
img_shape: shape of the image input to the network as a tuple (h, w). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
scale_factor: a float indicating the preprocessing scale
flip: a boolean indicating if image flip transform was used
flip_direction: the flipping direction
- Parameters
meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Defaults to ('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction').
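For instance, a test pipeline can restrict the collected meta information by overriding meta_keys; this is an illustrative sketch, and the other transforms are assumptions.
>>> test_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='Resize', scale=(1333, 800), keep_ratio=True),
>>>     dict(
>>>         type='PackDetInputs',
>>>         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
>>>                    'scale_factor'))]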
- class mmdet.datasets.transforms.Pad(size: Optional[Tuple[int, int]] = None, size_divisor: Optional[int] = None, pad_to_square: bool = False, pad_val: Union[int, float, dict] = {'img': 0, 'seg': 255}, padding_mode: str = 'constant')[source]¶
Pad the image & segmentation map.
There are three padding modes: (1) pad to a fixed size, (2) pad to the minimum size that is divisible by some number, and (3) pad to square. Pad to square and pad to the minimum divisible size can also be used at the same time.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
img_shape
gt_masks
gt_seg_map
Added Keys:
pad_shape
pad_fixed_size
pad_size_divisor
- Parameters
size (tuple, optional) – Fixed padding size. Expected padding shape (width, height). Defaults to None.
size_divisor (int, optional) – The divisor of padded size. Defaults to None.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Defaults to False.
pad_val (Number | dict[str, Number], optional) –
Padding value used when the padding_mode is "constant". If it is a single number, the value to pad the image is the number and the value to pad the semantic segmentation map is 255. If it is a dict, it should have the following keys:
img: The value to pad the image.
seg: The value to pad the semantic segmentation map.
Defaults to dict(img=0, seg=255).
padding_mode (str) –
Type of padding. Should be: constant, edge, reflect or symmetric. Defaults to ‘constant’.
constant: pads with a constant value, this value is specified with pad_val.
edge: pads with the last value at the edge of the image.
reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].
symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
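Two hedged configuration sketches covering the fixed-size and divisor padding modes described above; the values are illustrative.
>>> # Pad to a fixed (width, height).
>>> pad_fixed = dict(type='Pad', size=(1333, 800), pad_val=dict(img=0, seg=255))
>>> # Pad so that both sides are divisible by 32.
>>> pad_divisor = dict(type='Pad', size_divisor=32)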
- class mmdet.datasets.transforms.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[Union[int, float]] = (0.5, 1.5), saturation_range: Sequence[Union[int, float]] = (0.5, 1.5), hue_delta: int = 18)[source]¶
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.
random brightness
random contrast (mode 0)
convert color from BGR to HSV
random saturation
random hue
convert color from HSV to BGR
random contrast (mode 1)
randomly swap channels
Required Keys:
img (np.uint8)
Modified Keys:
img (np.float32)
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (sequence) – range of contrast.
saturation_range (sequence) – range of saturation.
hue_delta (int) – delta of hue.
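A hedged config sketch that simply spells out the defaults listed above.
>>> photo_distortion = dict(
>>>     type='PhotoMetricDistortion',
>>>     brightness_delta=32,
>>>     contrast_range=(0.5, 1.5),
>>>     saturation_range=(0.5, 1.5),
>>>     hue_delta=18)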
- class mmdet.datasets.transforms.Posterize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 4.0)[source]¶
Posterize images (reduce the number of bits for each color channel).
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing Posterize transformation. Defaults to 1.0.
level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for Posterize transformation. Defaults to 0.0.
max_mag (float) – The maximum magnitude for Posterize transformation. Defaults to 4.0.
- class mmdet.datasets.transforms.ProposalBroadcaster(transforms: List[Union[dict, Callable]] = [])[source]¶
A transform wrapper to apply the wrapped transforms to process both gt_bboxes and proposals without adding any code. It performs the following steps:
1. Scatter the broadcasting targets to a list of inputs of the wrapped transforms. The type of the list should be list[dict, dict], where the first item is the original inputs and the second is the processing results in which gt_bboxes is rewritten by the proposals.
2. Apply self.transforms with the same random parameters, which are shared via a context manager. The type of the outputs is a list[dict, dict].
3. Gather the outputs and update the proposals in the first item of the outputs with the gt_bboxes in the second.
- Parameters
transforms (list, optional) – Sequence of transform object or config dict to be wrapped. Defaults to [].
Note: The TransformBroadcaster in MMCV can achieve the same operation as ProposalBroadcaster, but it needs more complex parameters.
Examples
>>> pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadProposals', num_max_proposals=2000),
>>>     dict(type='LoadAnnotations', with_bbox=True),
>>>     dict(
>>>         type='ProposalBroadcaster',
>>>         transforms=[
>>>             dict(type='Resize', scale=(1333, 800),
>>>                  keep_ratio=True),
>>>             dict(type='RandomFlip', prob=0.5),
>>>         ]),
>>>     dict(type='PackDetInputs')]
- class mmdet.datasets.transforms.RandAugment(aug_space: List[Union[dict, mmengine.config.config.ConfigDict]] = [[{'type': 'AutoContrast'}], [{'type': 'Equalize'}], [{'type': 'Invert'}], [{'type': 'Rotate'}], [{'type': 'Posterize'}], [{'type': 'Solarize'}], [{'type': 'SolarizeAdd'}], [{'type': 'Color'}], [{'type': 'Contrast'}], [{'type': 'Brightness'}], [{'type': 'Sharpness'}], [{'type': 'ShearX'}], [{'type': 'ShearY'}], [{'type': 'TranslateX'}], [{'type': 'TranslateY'}]], aug_num: int = 2, prob: Optional[List[float]] = None)[source]¶
Rand augmentation.
This data augmentation is proposed in RandAugment: Practical automated data augmentation with a reduced search space.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_ignore_flags (bool) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
img_shape
gt_bboxes
gt_bboxes_labels
gt_masks
gt_ignore_flags
gt_seg_map
Added Keys:
homography_matrix
- Parameters
aug_space (List[List[Union[dict, ConfigDict]]]) – The augmentation space of rand augmentation. Each augmentation transform in aug_space is a specific transform, and is composed of several augmentations. When RandAugment is called, a random transform in aug_space will be selected to augment images. Defaults to aug_space.
aug_num (int) – Number of augmentations to apply sequentially. Defaults to 2.
prob (list[float], optional) – The probabilities associated with each augmentation. The length should be equal to the augmentation space and the sum should be 1. If not given, a uniform distribution will be assumed. Defaults to None.
Examples
>>> aug_space = [
>>>     dict(type='Sharpness'),
>>>     dict(type='ShearX'),
>>>     dict(type='Color'),
>>> ]
>>> augmentation = RandAugment(aug_space)
>>> img = np.ones((100, 100, 3), dtype=np.uint8)
>>> gt_bboxes = np.ones((10, 4), dtype=np.float32)
>>> results = dict(img=img, gt_bboxes=gt_bboxes)
>>> results = augmentation(results)
- class mmdet.datasets.transforms.RandomAffine(max_rotate_degree: float = 10.0, max_translate_ratio: float = 0.1, scaling_ratio_range: Tuple[float, float] = (0.5, 1.5), max_shear_degree: float = 2.0, border: Tuple[int, int] = (0, 0), border_val: Tuple[int, int, int] = (114, 114, 114), bbox_clip_border: bool = True)[source]¶
Random affine transform data augmentation.
This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
max_rotate_degree (float) – Maximum degrees of rotation transform. Defaults to 10.
max_translate_ratio (float) – Maximum ratio of translation. Defaults to 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Defaults to (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Defaults to 2.
border (tuple[int]) – Distance from width and height sides of input image to adjust output shape. Only used in mosaic dataset. Defaults to (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Defaults to (114, 114, 114).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
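A hedged config sketch; the values simply restate the defaults documented above.
>>> random_affine = dict(
>>>     type='RandomAffine',
>>>     max_rotate_degree=10.0,
>>>     max_translate_ratio=0.1,
>>>     scaling_ratio_range=(0.5, 1.5),
>>>     max_shear_degree=2.0,
>>>     border_val=(114, 114, 114))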
- class mmdet.datasets.transforms.RandomCenterCropPad(crop_size: Optional[tuple] = None, ratios: Optional[tuple] = (0.9, 1.0, 1.1), border: Optional[int] = 128, mean: Optional[Sequence] = None, std: Optional[Sequence] = None, to_rgb: Optional[bool] = None, test_mode: bool = False, test_pad_mode: Optional[tuple] = ('logical_or', 127), test_pad_add_pix: int = 0, bbox_clip_border: bool = True)[source]¶
Random center crop and random around padding for CornerNet.
This operation generates a randomly cropped image from the original image and pads it simultaneously. Different from RandomCrop, the output shape may not equal crop_size strictly. We choose a random value from ratios and the output shape could be larger or smaller than crop_size. The padding operation is also different from Pad; here we use around padding instead of right-bottom padding.
The relation between the output image (padding image) and the original image:
(The original docstring includes an ASCII figure showing the output image, the padded area, the cropped area, the center range, and the original image.)
There are 5 main areas in the figure:
output image: output image of this operation, also called padding image in following instruction.
original image: input image of this operation.
padded area: non-intersect area of output image and original image.
cropped area: the overlap of output image and original image.
center range: a smaller area from which the random center is chosen. The center range is computed from border and the original image's shape so that the random center is not too close to the original image's border.
This operation also acts differently in train and test modes; the summary pipelines are listed below.
Train pipeline:
1. Choose a random_ratio from ratios; the shape of the padding image will be random_ratio * crop_size.
2. Choose a random_center in the center range.
3. Generate a padding image whose center matches the random_center.
4. Initialize the padding image with pixel values equal to mean.
5. Copy the cropped area to the padding image.
6. Refine annotations.
Test pipeline:
1. Compute the output shape according to test_pad_mode.
2. Generate a padding image whose center matches the original image center.
3. Initialize the padding image with pixel values equal to mean.
4. Copy the cropped area to the padding image.
Required Keys:
img (np.float32)
img_shape (tuple)
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
Modified Keys:
img (np.float32)
img_shape (tuple)
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
- Parameters
crop_size (tuple, optional) – Expected size after crop; the final size will be computed according to ratio. Requires (width, height) in train mode, and None in test mode.
ratios (tuple, optional) – random select a ratio from tuple and crop image to (crop_size[0] * ratio) * (crop_size[1] * ratio). Only available in train mode. Defaults to (0.9, 1.0, 1.1).
border (int, optional) – max distance from center select area to image border. Only available in train mode. Defaults to 128.
mean (sequence, optional) – Mean values of 3 channels.
std (sequence, optional) – Std values of 3 channels.
to_rgb (bool, optional) – Whether to convert the image from BGR to RGB.
test_mode (bool) – Whether to involve random variables in the transform. In train mode, crop_size is fixed, while the center coords and ratio are randomly selected from the predefined lists. In test mode, crop_size is the image's original shape, and the center coords and ratio are fixed. Defaults to False.
test_pad_mode (tuple, optional) –
padding method and padding shape value, only available in test mode. Default is using ‘logical_or’ with 127 as padding shape value.
’logical_or’: final_shape = input_shape | padding_shape_value
’size_divisor’: final_shape = int( ceil(input_shape / padding_shape_value) * padding_shape_value)
Defaults to (‘logical_or’, 127).
test_pad_add_pix (int) – Extra padding pixel in test mode. Defaults to 0.
bbox_clip_border (bool) – Whether clip the objects outside the border of the image. Defaults to True.
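Hedged train-mode and test-mode sketches built only from the parameter descriptions above; the crop size and the mean/std values are placeholders, not recommended settings.
>>> train_crop = dict(
>>>     type='RandomCenterCropPad',
>>>     crop_size=(511, 511),
>>>     ratios=(0.9, 1.0, 1.1),
>>>     border=128,
>>>     mean=[0, 0, 0], std=[1, 1, 1], to_rgb=True,
>>>     test_mode=False)
>>> test_crop = dict(
>>>     type='RandomCenterCropPad',
>>>     crop_size=None,
>>>     ratios=None,
>>>     border=None,
>>>     mean=[0, 0, 0], std=[1, 1, 1], to_rgb=True,
>>>     test_mode=True,
>>>     test_pad_mode=('logical_or', 127))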
- class mmdet.datasets.transforms.RandomCrop(crop_size: tuple, crop_type: str = 'absolute', allow_negative_crop: bool = False, recompute_bbox: bool = False, bbox_clip_border: bool = True)[source]¶
Random crop the image & bboxes & masks.
The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_ignore_flags (bool) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_masks (optional)
gt_ignore_flags (optional)
gt_seg_map (optional)
Added Keys:
homography_matrix
- Parameters
crop_size (tuple) – The relative ratio or absolute pixels of (width, height).
crop_type (str, optional) – One of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Defaults to “absolute”.
allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Defaults to False.
recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Defaults to False.
bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.
Note
If the image is smaller than the absolute crop size, return the original image.
The keys for bboxes, labels and masks must be aligned. That is, gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.
If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.
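Two hedged sketches illustrating the crop_type options described above.
>>> # Crop an absolute 384x384 patch.
>>> abs_crop = dict(type='RandomCrop', crop_size=(384, 384), crop_type='absolute')
>>> # Sample a relative crop size from [0.5, 1] of the image in each dimension.
>>> rel_crop = dict(type='RandomCrop', crop_size=(0.5, 0.5), crop_type='relative_range')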
- class mmdet.datasets.transforms.RandomErasing(n_patches: Union[int, Tuple[int, int]], ratio: Union[float, Tuple[float, float]], squared: bool = True, bbox_erased_thr: float = 0.9, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255)[source]¶
RandomErasing operation.
Random Erasing randomly selects a rectangular region in an image and erases its pixels with random values. See the RandomErasing paper for details.
Required Keys:
img
gt_bboxes (HorizontalBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
gt_masks (BitmapMasks) (optional)
Modified Keys:
img
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
gt_masks (optional)
- Parameters
n_patches (int or tuple[int, int]) – Number of regions to be dropped. If it is given as a tuple, the number of patches will be randomly selected from the closed interval [n_patches[0], n_patches[1]].
ratio (float or tuple[float, float]) – The ratio of erased regions. It can be a float to use a fixed ratio or a tuple[float, float] to randomly choose a ratio from the interval.
squared (bool) – Whether to erase square regions. Defaults to True.
bbox_erased_thr (float) – The threshold for the maximum area proportion of the bbox to be erased. When the proportion of the area where the bbox is erased is greater than the threshold, the bbox will be removed. Defaults to 0.9.
img_border_value (int or float or tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.
mask_border_value (int) – The fill value used for masks. Defaults to 0.
seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal the ignore_label in the semantic_head of the corresponding config. Defaults to 255.
- class mmdet.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None)[source]¶
Flip the image & bbox & mask & segmentation map. Added or updated keys: flip, flip_direction, img, gt_bboxes, and gt_seg_map. There are 3 flip modes:
prob is a float, direction is a string: the image will be flipped in direction with probability prob. E.g., prob=0.5, direction='horizontal', then the image will be horizontally flipped with probability 0.5.
prob is a float, direction is a list of strings: the image will be flipped in direction[i] with probability prob/len(direction). E.g., prob=0.5, direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.
prob is a list of floats, direction is a list of strings: given len(prob) == len(direction), the image will be flipped in direction[i] with probability prob[i]. E.g., prob=[0.3, 0.5], direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
flip
flip_direction
homography_matrix
- Parameters
prob (float | list[float], optional) – The flipping probability. Defaults to None.
direction (str | list[str]) – The flipping direction. If the input is a list, its length must equal that of prob; each element in prob indicates the flip probability of the corresponding direction. Defaults to 'horizontal'.
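The three probability modes described above can be written directly as configs; a hedged sketch.
>>> # Single direction, flipped with probability 0.5.
>>> flip_a = dict(type='RandomFlip', prob=0.5)
>>> # Two directions sharing one probability (0.25 each).
>>> flip_b = dict(type='RandomFlip', prob=0.5,
>>>               direction=['horizontal', 'vertical'])
>>> # Per-direction probabilities (0.3 horizontal, 0.5 vertical).
>>> flip_c = dict(type='RandomFlip', prob=[0.3, 0.5],
>>>               direction=['horizontal', 'vertical'])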
- class mmdet.datasets.transforms.RandomOrder(transforms: Union[Dict, Callable[[Dict], Dict], Sequence[Union[Dict, Callable[[Dict], Dict]]]])[source]¶
Shuffle the transform Sequence.
- class mmdet.datasets.transforms.RandomShift(prob: float = 0.5, max_shift_px: int = 32, filter_thr_px: int = 1)[source]¶
Shift the image and box given shift pixels and probability.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32])
gt_bboxes_labels (np.int64)
gt_ignore_flags (bool) (optional)
Modified Keys:
img
gt_bboxes
gt_bboxes_labels
gt_ignore_flags (bool) (optional)
- Parameters
prob (float) – Probability of shifts. Defaults to 0.5.
max_shift_px (int) – The max pixels for shifting. Defaults to 32.
filter_thr_px (int) – The width and height threshold for filtering. The bbox and the rest of the targets below the width and height threshold will be filtered. Defaults to 1.
- class mmdet.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear')[source]¶
Resize images & bbox & seg.
This transform resizes the input image according to scale or scale_factor. Bboxes, masks, and seg map are then resized with the same scale factor. If scale and scale_factor are both set, it will use scale to resize.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
img_shape
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
scale
scale_factor
keep_ratio
homography_matrix
- Parameters
scale (int or tuple) – Images scales for resizing. Defaults to None
scale_factor (float or tuple[float]) – Scale factors for resizing. Defaults to None.
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.
clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generates slightly different results. Defaults to ‘cv2’.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.
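A hedged sketch of the two ways of specifying the target size described above.
>>> # Resize to a fixed scale while keeping the aspect ratio.
>>> resize_scale = dict(type='Resize', scale=(1333, 800), keep_ratio=True)
>>> # Resize by a multiplicative factor instead of an explicit scale.
>>> resize_factor = dict(type='Resize', scale_factor=0.5)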
- class mmdet.datasets.transforms.Rotate(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]¶
Rotate the images, bboxes, masks and segmentation map.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
homography_matrix
- Parameters
prob (float) – The probability for perform transformation and should be in range 0 to 1. Defaults to 1.0.
level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum angle for rotation. Defaults to 0.0.
max_mag (float) – The maximum angle for rotation. Defaults to 30.0.
reversal_prob (float) – The probability that reverses the rotation magnitude. Should be in range [0,1]. Defaults to 0.5.
img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.
mask_border_value (int) – The fill value used for masks. Defaults to 0.
seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal the ignore_label in the semantic_head of the corresponding config. Defaults to 255.
interpolation (str) – Interpolation method, accepted values are "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2' backend, "nearest", "bilinear" for 'pillow' backend. Defaults to 'bilinear'.
- class mmdet.datasets.transforms.SegRescale(scale_factor: float = 1, backend: str = 'cv2')[source]¶
Rescale semantic segmentation maps.
This transform rescales the gt_seg_map according to scale_factor.
Required Keys:
gt_seg_map
Modified Keys:
gt_seg_map
- Parameters
scale_factor (float) – The scale factor of the final output. Defaults to 1.
backend (str) – Image rescale backend, choices are ‘cv2’ and ‘pillow’. These two backends generates slightly different results. Defaults to ‘cv2’.
- class mmdet.datasets.transforms.Sharpness(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.1, max_mag: float = 1.9)[source]¶
Adjust image sharpness. A positive magnitude enhances the sharpness, a negative magnitude makes the image blurry, and a magnitude of 0 gives the original image.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing Sharpness transformation. Defaults to 1.0.
level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for Sharpness transformation. Defaults to 0.1.
max_mag (float) – The maximum magnitude for Sharpness transformation. Defaults to 1.9.
- class mmdet.datasets.transforms.ShearX(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]¶
Shear the images, bboxes, masks and segmentation map horizontally.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
homography_matrix
- Parameters
prob (float) – The probability for performing Shear and should be in range [0, 1]. Defaults to 1.0.
level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum angle for the horizontal shear. Defaults to 0.0.
max_mag (float) – The maximum angle for the horizontal shear. Defaults to 30.0.
reversal_prob (float) – The probability that reverses the horizontal shear magnitude. Should be in range [0,1]. Defaults to 0.5.
img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.
mask_border_value (int) – The fill value used for masks. Defaults to 0.
seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal the ignore_label in the semantic_head of the corresponding config. Defaults to 255.
interpolation (str) – Interpolation method, accepted values are "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2' backend, "nearest", "bilinear" for 'pillow' backend. Defaults to 'bilinear'.
- class mmdet.datasets.transforms.ShearY(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 30.0, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]¶
Shear the images, bboxes, masks and segmentation map vertically.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
homography_matrix
- Parameters
prob (float) – The probability for performing ShearY and should be in range [0, 1]. Defaults to 1.0.
level (int, optional) – The level should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum angle for the vertical shear. Defaults to 0.0.
max_mag (float) – The maximum angle for the vertical shear. Defaults to 30.0.
reversal_prob (float) – The probability that reverses the vertical shear magnitude. Should be in range [0,1]. Defaults to 0.5.
img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.
mask_border_value (int) – The fill value used for masks. Defaults to 0.
seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal the ignore_label in the semantic_head of the corresponding config. Defaults to 255.
interpolation (str) – Interpolation method, accepted values are "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2' backend, "nearest", "bilinear" for 'pillow' backend. Defaults to 'bilinear'.
- class mmdet.datasets.transforms.Solarize(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 256.0)[source]¶
Solarize images (invert all pixels above a threshold value of magnitude).
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing Solarize transformation. Defaults to 1.0.
level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for Solarize transformation. Defaults to 0.0.
max_mag (float) – The maximum magnitude for Solarize transformation. Defaults to 256.0.
- class mmdet.datasets.transforms.SolarizeAdd(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 110.0)[source]¶
SolarizeAdd images. For each pixel in the image that is less than 128, add an additional amount to it decided by the magnitude.
Required Keys:
img
Modified Keys:
img
- Parameters
prob (float) – The probability for performing SolarizeAdd transformation. Defaults to 1.0.
level (int, optional) – Should be in range [0,_MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum magnitude for SolarizeAdd transformation. Defaults to 0.0.
max_mag (float) – The maximum magnitude for SolarizeAdd transformation. Defaults to 110.0.
- class mmdet.datasets.transforms.ToTensor(keys)[source]¶
Convert some results to torch.Tensor by given keys.
- Parameters
keys (Sequence[str]) – Keys that need to be converted to Tensor.
- class mmdet.datasets.transforms.TranslateX(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 0.1, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]¶
Translate the images, bboxes, masks and segmentation map horizontally.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
homography_matrix
- Parameters
prob (float) – The probability for perform transformation and should be in range 0 to 1. Defaults to 1.0.
level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum pixel’s offset ratio for horizontal translation. Defaults to 0.0.
max_mag (float) – The maximum pixel’s offset ratio for horizontal translation. Defaults to 0.1.
reversal_prob (float) – The probability that reverses the horizontal translation magnitude. Should be in range [0,1]. Defaults to 0.5.
img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.
mask_border_value (int) – The fill value used for masks. Defaults to 0.
seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal the ignore_label in the semantic_head of the corresponding config. Defaults to 255.
interpolation (str) – Interpolation method, accepted values are "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2' backend, "nearest", "bilinear" for 'pillow' backend. Defaults to 'bilinear'.
- class mmdet.datasets.transforms.TranslateY(prob: float = 1.0, level: Optional[int] = None, min_mag: float = 0.0, max_mag: float = 0.1, reversal_prob: float = 0.5, img_border_value: Union[int, float, tuple] = 128, mask_border_value: int = 0, seg_ignore_label: int = 255, interpolation: str = 'bilinear')[source]¶
Translate the images, bboxes, masks and segmentation map vertically.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_masks (BitmapMasks | PolygonMasks) (optional)
gt_seg_map (np.uint8) (optional)
Modified Keys:
img
gt_bboxes
gt_masks
gt_seg_map
Added Keys:
homography_matrix
- Parameters
prob (float) – The probability for perform transformation and should be in range 0 to 1. Defaults to 1.0.
level (int, optional) – The level should be in range [0, _MAX_LEVEL]. If level is None, it will generate from [0, _MAX_LEVEL] randomly. Defaults to None.
min_mag (float) – The minimum pixel’s offset ratio for vertical translation. Defaults to 0.0.
max_mag (float) – The maximum pixel’s offset ratio for vertical translation. Defaults to 0.1.
reversal_prob (float) – The probability that reverses the vertical translation magnitude. Should be in range [0,1]. Defaults to 0.5.
img_border_value (int | float | tuple) – The filled values for image border. If float, the same fill value will be used for all the three channels of image. If tuple, it should be 3 elements. Defaults to 128.
mask_border_value (int) – The fill value used for masks. Defaults to 0.
seg_ignore_label (int) – The fill value used for the segmentation map. Note this value must equal the ignore_label in the semantic_head of the corresponding config. Defaults to 255.
interpolation (str) – Interpolation method, accepted values are "nearest", "bilinear", "bicubic", "area", "lanczos" for 'cv2' backend, "nearest", "bilinear" for 'pillow' backend. Defaults to 'bilinear'.
- class mmdet.datasets.transforms.Transpose(keys, order)[source]¶
Transpose some results by given keys.
- Parameters
keys (Sequence[str]) – Keys of results to be transposed.
order (Sequence[int]) – Order of transpose.
- class mmdet.datasets.transforms.YOLOXHSVRandomAug(hue_delta: int = 5, saturation_delta: int = 30, value_delta: int = 30)[source]¶
Apply HSV augmentation to the image sequentially. It is referenced from https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/data/data_augment.py#L21.
Required Keys:
img
Modified Keys:
img
- Parameters
hue_delta (int) – delta of hue. Defaults to 5.
saturation_delta (int) – delta of saturation. Defaults to 30.
value_delta (int) – delta of value. Defaults to 30.
- transform(results: dict) → dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
mmdet.engine¶
hooks¶
- class mmdet.engine.hooks.CheckInvalidLossHook(interval: int = 50)[source]¶
Check invalid loss hook.
This hook will regularly check whether the loss is valid during training.
- Parameters
interval (int) – Checking interval (every k iterations). Default: 50.
- after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None) → None[source]¶
Regularly check whether the loss is valid every n iterations.
- Parameters
runner (Runner) – The runner of the training process.
batch_idx (int) – The index of the current batch in the train loop.
data_batch (dict, Optional) – Data from dataloader. Defaults to None.
outputs (dict, Optional) – Outputs from model. Defaults to None.
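Hooks like this one are normally registered through the runner config; a minimal sketch, assuming the standard custom_hooks field from MMEngine.
>>> custom_hooks = [dict(type='CheckInvalidLossHook', interval=50)]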
- class mmdet.engine.hooks.DetVisualizationHook(draw: bool = False, interval: int = 50, score_thr: float = 0.3, show: bool = False, wait_time: float = 0.0, test_out_dir: Optional[str] = None, file_client_args: dict = {'backend': 'disk'})[source]¶
Detection Visualization Hook. Used to visualize validation and testing process prediction results.
In the testing phase:
- If show is True, only the prediction results are visualized without storing data, so vis_backends needs to be excluded.
- If test_out_dir is specified, the prediction results need to be saved to test_out_dir. In order to avoid vis_backends also storing data, vis_backends needs to be excluded.
- vis_backends takes effect only if the user does not specify show and test_out_dir. You can set vis_backends to WandbVisBackend or TensorboardVisBackend to store the prediction results in Wandb or Tensorboard.
- Parameters
draw (bool) – whether to draw prediction results. If it is False, it means that no drawing will be done. Defaults to False.
interval (int) – The interval of visualization. Defaults to 50.
score_thr (float) – The threshold to visualize the bboxes and masks. Defaults to 0.3.
show (bool) – Whether to display the drawn image. Default to False.
wait_time (float) – The interval of show (s). Defaults to 0.
test_out_dir (str, optional) – directory where painted images will be saved in testing process.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
- after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmdet.structures.det_data_sample.DetDataSample]) → None[source]¶
Run after every testing iterations.
- Parameters
runner (Runner) – The runner of the testing process.
batch_idx (int) – The index of the current batch in the val loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[DetDataSample]) – A batch of data samples that contain annotations and predictions.
- after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmdet.structures.det_data_sample.DetDataSample]) → None[source]¶
Run after every self.interval validation iterations.
- Parameters
runner (Runner) – The runner of the validation process.
batch_idx (int) – The index of the current batch in the val loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[DetDataSample]) – A batch of data samples that contain annotations and predictions.
- class mmdet.engine.hooks.MeanTeacherHook(momentum: float = 0.001, interval: int = 1, skip_buffer=True)[source]¶
Mean Teacher Hook.
Mean Teacher is an efficient semi-supervised learning method proposed in the Mean Teacher paper. This method requires two models with exactly the same structure, used as the student model and the teacher model respectively. The student model updates its parameters through gradient descent, and the teacher model updates its parameters through an exponential moving average of the student model. Compared with the student model, the teacher model is smoother and accumulates more knowledge.
- Parameters
momentum (float) – The momentum used for updating the teacher's parameters. The teacher's parameters are updated with the formula: teacher = (1 - momentum) * teacher + momentum * student. Defaults to 0.001.
interval (int) – Update teacher’s parameter every interval iteration. Defaults to 1.
skip_buffers (bool) – Whether to skip the model buffers, such as batchnorm running stats (running_mean, running_var), so that the EMA operation is not performed on them. Defaults to True.
- after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None) → None[source]¶
Update teacher’s parameter every self.interval iterations.
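A minimal registration sketch, again assuming the custom_hooks convention; the momentum value restates the default.
>>> custom_hooks = [dict(type='MeanTeacherHook', momentum=0.001, interval=1)]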
- class mmdet.engine.hooks.MemoryProfilerHook(interval: int = 50)[source]¶
Memory profiler hook recording memory information including virtual memory, swap memory, and the memory of the current process.
- Parameters
interval (int) – Checking interval (every k iterations). Default: 50.
- after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[Sequence[mmdet.structures.det_data_sample.DetDataSample]] = None) → None[source]¶
Regularly record memory information.
- Parameters
runner (Runner) – The runner of the testing process.
batch_idx (int) – The index of the current batch in the test loop.
data_batch (dict, optional) – Data from dataloader. Defaults to None.
outputs (Sequence[DetDataSample], optional) – Outputs from model. Defaults to None.
- after_train_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[dict] = None) → None[source]¶
Regularly record memory information.
- Parameters
runner (Runner) – The runner of the training process.
batch_idx (int) – The index of the current batch in the train loop.
data_batch (dict, optional) – Data from dataloader. Defaults to None.
outputs (dict, optional) – Outputs from model. Defaults to None.
- after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: Optional[dict] = None, outputs: Optional[Sequence[mmdet.structures.det_data_sample.DetDataSample]] = None) → None[source]¶
Regularly record memory information.
- Parameters
runner (Runner) – The runner of the validation process.
batch_idx (int) – The index of the current batch in the val loop.
data_batch (dict, optional) – Data from dataloader. Defaults to None.
outputs (Sequence[DetDataSample], optional) – Outputs from model. Defaults to None.
- class mmdet.engine.hooks.NumClassCheckHook[source]¶
Check whether the num_classes in head matches the length of classes in dataset.metainfo.
- class mmdet.engine.hooks.PipelineSwitchHook(switch_epoch, switch_pipeline)[source]¶
Switch data pipeline at switch_epoch.
- Parameters
switch_epoch (int) – switch pipeline at this epoch.
switch_pipeline (list[dict]) – the pipeline to switch to.
- class mmdet.engine.hooks.SyncNormHook[source]¶
Synchronize Norm states before validation, currently used in YOLOX.
- class mmdet.engine.hooks.YOLOXModeSwitchHook(num_last_epochs: int = 15, skip_type_keys: Sequence[str] = ('Mosaic', 'RandomAffine', 'MixUp'))[source]¶
Switch the mode of YOLOX during training.
This hook turns off the mosaic and mixup data augmentation and switches to use L1 loss in bbox_head.
- Parameters
num_last_epochs – The number of final epochs at the end of training during which the data augmentation is turned off and the L1 loss is used. Defaults to 15.
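A hedged registration sketch; the skip_type_keys value restates the default of this hook, and the custom_hooks field is an MMEngine convention assumed here.
>>> custom_hooks = [
>>>     dict(type='YOLOXModeSwitchHook',
>>>          num_last_epochs=15,
>>>          skip_type_keys=('Mosaic', 'RandomAffine', 'MixUp'))]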
optimizers¶
- class mmdet.engine.optimizers.LearningRateDecayOptimizerConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]¶
- add_params(params: List[dict], module: torch.nn.modules.module.Module, **kwargs) → None[source]¶
Add all parameters of module to the params list.
The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.
- Parameters
params (list[dict]) – A list of param groups, it will be modified in place.
module (nn.Module) – The module to be added.
runner¶
schedulers¶
- class mmdet.engine.schedulers.QuadraticWarmupLR(optimizer, *args, **kwargs)[source]¶
Warm up the learning rate of each parameter group by quadratic formula.
- Parameters
optimizer (Optimizer) – Wrapped optimizer.
begin (int) – Step at which to start updating the parameters. Defaults to 0.
end (int) – Step at which to stop updating the parameters. Defaults to INF.
last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.
by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.
verbose (bool) – Whether to print the value for each update. Defaults to False.
- class mmdet.engine.schedulers.QuadraticWarmupMomentum(optimizer, *args, **kwargs)[source]¶
Warm up the momentum value of each parameter group by quadratic formula.
- Parameters
optimizer (Optimizer) – Wrapped optimizer.
begin (int) – Step at which to start updating the parameters. Defaults to 0.
end (int) – Step at which to stop updating the parameters. Defaults to INF.
last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.
by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.
verbose (bool) – Whether to print the value for each update. Defaults to False.
- class mmdet.engine.schedulers.QuadraticWarmupParamScheduler(optimizer: torch.optim.optimizer.Optimizer, param_name: str, begin: int = 0, end: int = 1000000000, last_step: int = - 1, by_epoch: bool = True, verbose: bool = False)[source]¶
Warm up the parameter value of each parameter group by quadratic formula:
\[X_{t} = X_{t-1} + \frac{2t+1}{{(end-begin)}^{2}} \times X_{base}\]
- Parameters
optimizer (Optimizer) – Wrapped optimizer.
param_name (str) – Name of the parameter to be adjusted, such as lr, momentum.
begin (int) – Step at which to start updating the parameters. Defaults to 0.
end (int) – Step at which to stop updating the parameters. Defaults to INF.
last_step (int) – The index of last step. Used for resume without state dict. Defaults to -1.
by_epoch (bool) – Whether the scheduled parameters are updated by epochs. Defaults to True.
verbose (bool) – Whether to print the value for each update. Defaults to False.
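A hedged param_scheduler sketch that warms up the learning rate quadratically over the first 1000 iterations; the param_scheduler list itself is an MMEngine convention assumed here, not prescribed by this page.
>>> param_scheduler = [
>>>     dict(type='QuadraticWarmupLR', by_epoch=False, begin=0, end=1000)]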
mmdet.evaluation¶
functional¶
- mmdet.evaluation.functional.average_precision(recalls, precisions, mode='area')[source]¶
Calculate average precision (for single or multiple scales).
- Parameters
recalls (ndarray) – shape (num_scales, num_dets) or (num_dets, )
precisions (ndarray) – shape (num_scales, num_dets) or (num_dets, )
mode (str) – ‘area’ or ‘11points’, ‘area’ means calculating the area under precision-recall curve, ‘11points’ means calculating the average precision of recalls at [0, 0.1, …, 1]
- Returns
calculated average precision
- Return type
float or ndarray
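A small hedged usage sketch with synthetic recall/precision arrays.
>>> import numpy as np
>>> from mmdet.evaluation.functional import average_precision
>>> recalls = np.array([0.1, 0.4, 0.7, 1.0])
>>> precisions = np.array([1.0, 0.8, 0.6, 0.4])
>>> ap = average_precision(recalls, precisions, mode='area')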
- mmdet.evaluation.functional.bbox_overlaps(bboxes1, bboxes2, mode='iou', eps=1e-06, use_legacy_coordinate=False)[source]¶
Calculate the ious between each bbox of bboxes1 and bboxes2.
- Parameters
bboxes1 (ndarray) – Shape (n, 4)
bboxes2 (ndarray) – Shape (k, 4)
mode (str) – IOU (intersection over union) or IOF (intersection over foreground)
use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, which means width and height should be calculated as 'x2 - x1 + 1' and 'y2 - y1 + 1' respectively. Note that when this function is used in VOCDataset, it should be True to align with the official implementation http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCdevkit_18-May-2011.tar Default: False.
- Returns
Shape (n, k)
- Return type
ious (ndarray)
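A hedged usage sketch on two small synthetic box arrays.
>>> import numpy as np
>>> from mmdet.evaluation.functional import bbox_overlaps
>>> bboxes1 = np.array([[0., 0., 10., 10.]])
>>> bboxes2 = np.array([[5., 5., 15., 15.],
>>>                     [20., 20., 30., 30.]])
>>> ious = bbox_overlaps(bboxes1, bboxes2, mode='iou')  # shape (1, 2)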
- mmdet.evaluation.functional.eval_map(det_results, annotations, scale_ranges=None, iou_thr=0.5, ioa_thr=None, dataset=None, logger=None, tpfp_fn=None, nproc=4, use_legacy_coordinate=False, use_group_of=False, eval_mode='area')[source]¶
Evaluate mAP of a dataset.
- Parameters
det_results (list[list]) – [[cls1_det, cls2_det, …], …]. The outer list indicates images, and the inner list indicates per-class detected bboxes.
annotations (list[dict]) –
Ground truth annotations where each item of the list indicates an image. Keys of annotations are:
bboxes: numpy array of shape (n, 4)
labels: numpy array of shape (n, )
bboxes_ignore (optional): numpy array of shape (k, 4)
labels_ignore (optional): numpy array of shape (k, )
scale_ranges (list[tuple] | None) – Range of scales to be evaluated, in the format [(min1, max1), (min2, max2), …]. A range of (32, 64) means the area range between (32**2, 64**2). Defaults to None.
iou_thr (float) – IoU threshold to be considered as matched. Defaults to 0.5.
ioa_thr (float | None) – IoA threshold to be considered as matched, which only used in OpenImages evaluation. Defaults to None.
dataset (list[str] | str | None) – Dataset name or dataset classes, there are minor differences in metrics for different datasets, e.g. “voc”, “imagenet_det”, etc. Defaults to None.
logger (logging.Logger | str | None) – The way to print the mAP summary. See mmengine.logging.print_log() for details. Defaults to None.
tpfp_fn (callable | None) – The function used to determine true/false positives. If None, tpfp_default() is used as default unless dataset is 'det' or 'vid' (tpfp_imagenet() in this case). If it is given as a function, then this function is used to evaluate tp & fp. Default None.
nproc (int) – Processes used for computing TP and FP. Defaults to 4.
use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, which means width and height should be calculated as 'x2 - x1 + 1' and 'y2 - y1 + 1' respectively. Defaults to False.
use_group_of (bool) – Whether to use group of when calculate TP and FP, which only used in OpenImages evaluation. Defaults to False.
eval_mode (str) – ‘area’ or ‘11points’, ‘area’ means calculating the area under precision-recall curve, ‘11points’ means calculating the average precision of recalls at [0, 0.1, …, 1], PASCAL VOC2007 uses 11points as default evaluate mode, while others are ‘area’. Defaults to ‘area’.
- Returns
(mAP, [dict, dict, …])
- Return type
tuple
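A hedged sketch of the expected input structure for a single image and a single class; the numbers are synthetic and only illustrate the shapes described above.
>>> import numpy as np
>>> from mmdet.evaluation.functional import eval_map
>>> # One image, one class; each detection row is (x1, y1, x2, y2, score).
>>> det_results = [[np.array([[0., 0., 10., 10., 0.9]])]]
>>> annotations = [dict(bboxes=np.array([[0., 0., 10., 10.]]),
>>>                     labels=np.array([0]))]
>>> mean_ap, per_class_results = eval_map(det_results, annotations, iou_thr=0.5)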
- mmdet.evaluation.functional.eval_recalls(gts, proposals, proposal_nums=None, iou_thrs=0.5, logger=None, use_legacy_coordinate=False)[source]¶
Calculate recalls.
- Parameters
gts (list[ndarray]) – a list of arrays of shape (n, 4)
proposals (list[ndarray]) – a list of arrays of shape (k, 4) or (k, 5)
proposal_nums (int | Sequence[int]) – Top N proposals to be evaluated.
iou_thrs (float | Sequence[float]) – IoU thresholds. Default: 0.5.
logger (logging.Logger | str | None) – The way to print the recall summary. See mmengine.logging.print_log() for details. Default: None.
use_legacy_coordinate (bool) – Whether to use the coordinate system of mmdet v1.x, where "1" is added to both height and width, which means w and h should be computed as 'x2 - x1 + 1' and 'y2 - y1 + 1'. Default: False.
- Returns
recalls of different ious and proposal nums
- Return type
ndarray
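A hedged usage sketch with one image of synthetic ground truths and scored proposals.
>>> import numpy as np
>>> from mmdet.evaluation.functional import eval_recalls
>>> gts = [np.array([[0., 0., 10., 10.]])]
>>> proposals = [np.array([[0., 0., 9., 9., 0.9],
>>>                        [20., 20., 30., 30., 0.5]])]
>>> recalls = eval_recalls(gts, proposals, proposal_nums=[1, 2], iou_thrs=0.5)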
- mmdet.evaluation.functional.oid_challenge_classes() → list[source]¶
Class names of Open Images Challenge.
- mmdet.evaluation.functional.plot_iou_recall(recalls, iou_thrs)[source]¶
Plot IoU-Recalls curve.
- Parameters
recalls (ndarray or list) – shape (k,)
iou_thrs (ndarray or list) – same shape as recalls
- mmdet.evaluation.functional.plot_num_recall(recalls, proposal_nums)[source]¶
Plot Proposal_num-Recalls curve.
- Parameters
recalls (ndarray or list) – shape (k,)
proposal_nums (ndarray or list) – same shape as recalls
- mmdet.evaluation.functional.pq_compute_multi_core(matched_annotations_list, gt_folder, pred_folder, categories, file_client=None, nproc=32)[source]¶
Evaluate the metrics of Panoptic Segmentation with multithreading.
Same as the function with the same name in panopticapi.
- Parameters
matched_annotations_list (list) – The matched annotation list. Each element is a tuple of annotations of the same image with the format (gt_anns, pred_anns).
gt_folder (str) – The path of the ground truth images.
pred_folder (str) – The path of the prediction images.
categories (str) – The categories of the dataset.
file_client (object) – The file client of the dataset. If None, the backend will be set to disk.
nproc (int) – Number of processes for panoptic quality computing. Defaults to 32. When nproc exceeds the number of cpu cores, the number of cpu cores is used.
- mmdet.evaluation.functional.pq_compute_single_core(proc_id, annotation_set, gt_folder, pred_folder, categories, file_client=None, print_log=False)[source]¶
The single core function to evaluate the metric of Panoptic Segmentation.
Same as the function with the same name in panopticapi. Only the function to load the images is changed to use the file client.
- Parameters
proc_id (int) – The id of the mini process.
gt_folder (str) – The path of the ground truth images.
pred_folder (str) – The path of the prediction images.
categories (str) – The categories of the dataset.
file_client (object) – The file client of the dataset. If None, the backend will be set to disk.
print_log (bool) – Whether to print the log. Defaults to False.
- mmdet.evaluation.functional.print_map_summary(mean_ap, results, dataset=None, scale_ranges=None, logger=None)[source]¶
Print mAP and results of each class.
A table will be printed to show the gts/dets/recall/AP of each class and the mAP.
- Parameters
mean_ap (float) – Calculated from eval_map().
results (list[dict]) – Calculated from eval_map().
dataset (list[str] | str | None) – Dataset name or dataset classes.
scale_ranges (list[tuple] | None) – Range of scales to be evaluated.
logger (logging.Logger | str | None) – The way to print the mAP summary. See mmengine.logging.print_log() for details. Defaults to None.
- mmdet.evaluation.functional.print_recall_summary(recalls, proposal_nums, iou_thrs, row_idxs=None, col_idxs=None, logger=None)[source]¶
Print recalls in a table.
- Parameters
recalls (ndarray) – calculated from bbox_recalls
proposal_nums (ndarray or list) – top N proposals
iou_thrs (ndarray or list) – iou thresholds
row_idxs (ndarray) – which rows (proposal nums) to print.
col_idxs (ndarray) – which cols (iou thresholds) to print.
logger (logging.Logger | str | None) – The way to print the recall summary. See mmengine.logging.print_log() for details. Default: None.
metrics¶
- class mmdet.evaluation.metrics.CityScapesMetric(outfile_prefix: str, seg_prefix: Optional[str] = None, format_only: bool = False, keep_results: bool = False, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
CityScapes metric for instance segmentation.
- Parameters
outfile_prefix (str) – The prefix of the txt and png files. The txt and png files will be saved in a directory whose path is “outfile_prefix.results/”.
seg_prefix (str, optional) – Path to the directory which contains the cityscapes instance segmentation masks. It is required for training and validation, and can be None when inferring on the test dataset. Defaults to None.
format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.
keep_results (bool) – Whether to keep the results. When format_only is True, keep_results must be True. Defaults to False.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict[str, float]
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.
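A configuration sketch (the data paths are placeholders, assuming a standard Cityscapes layout) showing how this metric is typically selected as the evaluator in an MMDetection config:

val_evaluator = dict(
    type='CityScapesMetric',
    outfile_prefix='./work_dirs/cityscapes_metric/instance',
    seg_prefix='data/cityscapes/gtFine/val',
    format_only=False)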
- class mmdet.evaluation.metrics.CocoMetric(ann_file: Optional[str] = None, metric: Union[str, List[str]] = 'bbox', classwise: bool = False, proposal_nums: Sequence[int] = (100, 300, 1000), iou_thrs: Optional[Union[float, Sequence[float]]] = None, metric_items: Optional[Sequence[str]] = None, format_only: bool = False, outfile_prefix: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None, sort_categories: bool = False)[source]¶
COCO evaluation metric.
Evaluate AR, AP, and mAP for detection tasks including proposal/box detection and instance segmentation. Please refer to https://cocodataset.org/#detection-eval for more details.
- Parameters
ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.
metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘bbox’, ‘segm’, ‘proposal’, and ‘proposal_fast’. Defaults to ‘bbox’.
classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.
proposal_nums (Sequence[int]) – Numbers of proposals to be evaluated. Defaults to (100, 300, 1000).
iou_thrs (float | List[float], optional) – IoU threshold to compute AP and AR. If not specified, IoUs from 0.5 to 0.95 will be used. Defaults to None.
metric_items (List[str], optional) – Metric result names to be recorded in the evaluation result. Defaults to None.
format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.
outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
sort_categories (bool) – Whether to sort categories in annotations. Only used for Objects365V1Dataset. Defaults to False.
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict[str, float]
- fast_eval_recall(results: List[dict], proposal_nums: Sequence[int], iou_thrs: Sequence[float], logger: Optional[mmengine.logging.logger.MMLogger] = None) → numpy.ndarray[source]¶
Evaluate proposal recall with COCO’s fast_eval_recall.
- Parameters
results (List[dict]) – Results of the dataset.
proposal_nums (Sequence[int]) – Proposal numbers used for evaluation.
iou_thrs (Sequence[float]) – IoU thresholds used for evaluation.
logger (MMLogger, optional) – Logger used for logging the recall summary.
- Returns
Averaged recall results.
- Return type
np.ndarray
- gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str) → str[source]¶
Convert ground truth to coco format json file.
- Parameters
gt_dicts (Sequence[dict]) – Ground truth of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.
- Returns
The filename of the json file.
- Return type
str
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.
- results2json(results: Sequence[dict], outfile_prefix: str) → dict[source]¶
Dump the detection results to a COCO style json file.
There are 3 types of results: proposals, bbox predictions, mask predictions, and they have different data types. This method will automatically recognize the type, and dump them to json files.
- Parameters
results (Sequence[dict]) – Testing results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.bbox.json”, “somepath/xxx.segm.json”, “somepath/xxx.proposal.json”.
- Returns
Possible keys are “bbox”, “segm”, “proposal”, and values are corresponding filenames.
- Return type
dict
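A configuration sketch (the annotation path is a placeholder for a standard COCO layout) showing a typical evaluator entry in an MMDetection config:

val_evaluator = dict(
    type='CocoMetric',
    ann_file='data/coco/annotations/instances_val2017.json',
    metric=['bbox', 'segm'],
    format_only=False)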
- class mmdet.evaluation.metrics.CocoPanopticMetric(ann_file: Optional[str] = None, seg_prefix: Optional[str] = None, classwise: bool = False, format_only: bool = False, outfile_prefix: Optional[str] = None, nproc: int = 32, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
COCO panoptic segmentation evaluation metric.
Evaluate PQ, SQ and RQ for panoptic segmentation tasks. Please refer to https://cocodataset.org/#panoptic-eval for more details.
- Parameters
ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.
seg_prefix (str, optional) – Path to the directory which contains the coco panoptic segmentation masks. It should be specified when evaluating. Defaults to None.
classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.
outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. It should be specified when format_only is True. Defaults to None.
format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.
nproc (int) – Number of processes for panoptic quality computing. Defaults to 32. When nproc exceeds the number of cpu cores, the number of cpu cores is used.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch. There are two cases:
When outfile_prefix is not provided, the elements in results are pq_stats which can be summed directly to get PQ.
When outfile_prefix is provided, the elements in results are tuples like (gt, pred).
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict[str, float]
- gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str) → Tuple[str, str][source]¶
Convert ground truth to coco panoptic segmentation format json file.
- Parameters
gt_dicts (Sequence[dict]) – Ground truth of the dataset.
outfile_prefix (str) – The filename prefix of the json file. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.
- Returns
The filename of the json file and the name of the directory which contains panoptic segmentation masks.
- Return type
Tuple[str, str]
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.
- result2json(results: Sequence[dict], outfile_prefix: str) → Tuple[str, str][source]¶
Dump the panoptic results to a COCO style json file and a directory.
- Parameters
results (Sequence[dict]) – Testing results of the dataset.
outfile_prefix (str) – The filename prefix of the json files and the directory.
- Returns
The json file and the directory which contains panoptic segmentation masks. The filename of the json is “somepath/xxx.panoptic.json” and the name of the directory is “somepath/xxx.panoptic”.
- Return type
Tuple[str, str]
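A configuration sketch (paths are placeholders for a standard COCO panoptic layout) showing a typical evaluator entry:

val_evaluator = dict(
    type='CocoPanopticMetric',
    ann_file='data/coco/annotations/panoptic_val2017.json',
    seg_prefix='data/coco/annotations/panoptic_val2017/',
    format_only=False)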
- class mmdet.evaluation.metrics.CrowdHumanMetric(ann_file: str, metric: Union[str, List[str]] = ['AP', 'MR', 'JI'], format_only: bool = False, outfile_prefix: Optional[str] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None, eval_mode: int = 0, iou_thres: float = 0.5, compare_matching_method: Optional[str] = None, mr_ref: str = 'CALTECH_-2', num_ji_process: int = 10)[source]¶
CrowdHuman evaluation metric.
Evaluate Average Precision (AP), Miss Rate (MR) and Jaccard Index (JI) for detection tasks.
- Parameters
ann_file (str) – Path to the annotation file.
metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘AP’, ‘MR’ and ‘JI’. Defaults to ‘AP’.
format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.
outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
eval_mode (int) – Select the evaluation mode. Valid modes include 0 (just body box), 1 (just head box) and 2 (both of them). Defaults to 0.
iou_thres (float) – IoU threshold. Defaults to 0.5.
compare_matching_method (str, optional) – Matching method used to compare the detection results with the ground truth when computing ‘AP’ and ‘MR’. Valid methods include VOC and None (CALTECH). Defaults to None.
mr_ref (str) – Parameter selection used to calculate MR. Valid refs include CALTECH_-2 and CALTECH_-4. Defaults to CALTECH_-2.
num_ji_process (int) – The number of processes used to evaluate JI. Defaults to 10.
- compare(samples)[source]¶
Match the detection results with the ground_truth.
- Parameters
samples (dict[Image]) – The detection result packaged by Image.
- Returns
Matching result: a list of tuples (dtbox, label, imgID) sorted by dtbox.score in descending order.
- Return type
score_list(list[tuple[ndarray, int, str]])
- compute_ji_matching(dt_boxes, gt_boxes)[source]¶
Match the annotation box for each detection box.
- Parameters
dt_boxes (ndarray) – Detection boxes.
gt_boxes (ndarray) – Ground_truth boxes.
- Returns
Match result.
- Return type
matches_(list[tuple[int, int]])
- compute_ji_with_ignore(result_queue, dt_result, score_thr)[source]¶
Compute JI with ignore.
- Parameters
result_queue (Queue) – The queue used to save the computed results during multiprocessing.
dt_result (dict[Image]) – Detection result packaged by Image.
score_thr (float) – The threshold of detection score.
- Returns
The computed result.
- Return type
dict
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
eval_results(Dict[str, float])
- static eval_ap(score_list, gt_num, img_num)[source]¶
Evaluate by average precision.
- Parameters
score_list (list[tuple[ndarray, int, str]]) – Matching result: a list of tuples (dtbox, label, imgID) sorted by dtbox.score in descending order.
gt_num (int) – The number of gt boxes in the entire dataset.
img_num (int) – The number of images in the entire dataset.
- Returns
result of average precision.
- Return type
ap(float)
- eval_ji(samples)[source]¶
Evaluate by JI using multi_process.
- Parameters
samples (Dict[str, Image]) – The detection result packaged by Image.
- Returns
result of jaccard index.
- Return type
ji(float)
- eval_mr(score_list, gt_num, img_num)[source]¶
Evaluate by Caltech-style log-average miss rate.
- Parameters
score_list (list[tuple[ndarray, int, str]]) – Matching result: a list of tuples (dtbox, label, imgID) sorted by dtbox.score in descending order.
gt_num (int) – The number of gt boxes in the entire dataset.
img_num (int) – The number of images in the entire dataset.
- Returns
result of miss rate.
- Return type
mr(float)
- load_eval_samples(result_file)[source]¶
Load data from annotations file and detection results.
- Parameters
result_file (str) – The file path of the saved detection results.
- Returns
The detection result packaged by Image
- Return type
Dict[Image]
- process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.
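A configuration sketch (the annotation path is a placeholder for a CrowdHuman layout) showing a typical evaluator entry:

val_evaluator = dict(
    type='CrowdHumanMetric',
    ann_file='data/CrowdHuman/annotation_val.odgt',
    metric=['AP', 'MR', 'JI'])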
- class mmdet.evaluation.metrics.DumpProposals(output_dir: str = '', proposals_file: str = 'proposals.pkl', num_max_proposals: Optional[int] = None, file_client_args: dict = {'backend': 'disk'}, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
Dump proposals pseudo metric.
- Parameters
output_dir (str) – The root directory for proposals_file. Defaults to ‘’.
proposals_file (str) – Proposals file path. Defaults to ‘proposals.pkl’.
num_max_proposals (int, optional) – Maximum number of proposals to dump. If not specified, all proposals will be dumped.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmengine.fileio.FileClient for details. Defaults to dict(backend='disk').
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
- compute_metrics(results: list) → dict[source]¶
Dump the processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
An empty dict.
- Return type
dict
- process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.
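A configuration sketch (output paths are placeholders) showing how this pseudo metric could be used as a test evaluator to dump proposals:

test_evaluator = dict(
    type='DumpProposals',
    output_dir='data/coco/proposals/',
    proposals_file='rpn_proposals_val2017.pkl')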
- class mmdet.evaluation.metrics.LVISMetric(ann_file: Optional[str] = None, metric: Union[str, List[str]] = 'bbox', classwise: bool = False, proposal_nums: Sequence[int] = (100, 300, 1000), iou_thrs: Optional[Union[float, Sequence[float]]] = None, metric_items: Optional[Sequence[str]] = None, format_only: bool = False, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
LVIS evaluation metric.
- Parameters
ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None.
metric (str | List[str]) – Metrics to be evaluated. Valid metrics include ‘bbox’, ‘segm’, ‘proposal’, and ‘proposal_fast’. Defaults to ‘bbox’.
classwise (bool) – Whether to evaluate the metric class-wise. Defaults to False.
proposal_nums (Sequence[int]) – Numbers of proposals to be evaluated. Defaults to (100, 300, 1000).
iou_thrs (float | List[float], optional) – IoU threshold to compute AP and AR. If not specified, IoUs from 0.5 to 0.95 will be used. Defaults to None.
metric_items (List[str], optional) – Metric result names to be recorded in the evaluation result. Defaults to None.
format_only (bool) – Format the output results without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.
outfile_prefix (str, optional) – The prefix of json files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If not specified, a temp file will be created. Defaults to None.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
Dict[str, float]
- fast_eval_recall(results: List[dict], proposal_nums: Sequence[int], iou_thrs: Sequence[float], logger: Optional[mmengine.logging.logger.MMLogger] = None) → numpy.ndarray[source]¶
Evaluate proposal recall with LVIS’s fast_eval_recall.
- Parameters
results (List[dict]) – Results of the dataset.
proposal_nums (Sequence[int]) – Proposal numbers used for evaluation.
iou_thrs (Sequence[float]) – IoU thresholds used for evaluation.
logger (MMLogger, optional) – Logger used for logging the recall summary.
- Returns
Averaged recall results.
- Return type
np.ndarray
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.
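A configuration sketch (the annotation path is a placeholder for an LVIS v1 layout) showing a typical evaluator entry:

val_evaluator = dict(
    type='LVISMetric',
    ann_file='data/lvis_v1/annotations/lvis_v1_val.json',
    metric=['bbox', 'segm'])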
- class mmdet.evaluation.metrics.OpenImagesMetric(iou_thrs: Union[float, List[float]] = 0.5, ioa_thrs: Union[float, List[float]] = 0.5, scale_ranges: Optional[List[tuple]] = None, use_group_of: bool = True, get_supercategory: bool = True, filter_labels: bool = True, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
OpenImages evaluation metric.
Evaluate detection mAP for OpenImages. Please refer to https://storage.googleapis.com/openimages/web/evaluation.html for more details.
- Parameters
iou_thrs (float or List[float]) – IoU threshold. Defaults to 0.5.
ioa_thrs (float or List[float]) – IoA threshold. Defaults to 0.5.
scale_ranges (List[tuple], optional) – Scale ranges for evaluating mAP. If not specified, all bounding boxes would be included in evaluation. Defaults to None.
use_group_of (bool) – Whether to consider group-of ground truth bboxes during evaluation. Defaults to True.
get_supercategory (bool) – Whether to get the parent class of the current class. Default: True.
filter_labels (bool) – Whether to filter unannotated classes. Default: True.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
- compute_metrics(results: list) → dict[source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
dict
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.
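A configuration sketch using only the documented arguments; the values mirror the defaults listed above:

val_evaluator = dict(
    type='OpenImagesMetric',
    iou_thrs=0.5,
    ioa_thrs=0.5,
    use_group_of=True,
    get_supercategory=True)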
- class mmdet.evaluation.metrics.VOCMetric(iou_thrs: Union[float, List[float]] = 0.5, scale_ranges: Optional[List[tuple]] = None, metric: Union[str, List[str]] = 'mAP', proposal_nums: Sequence[int] = (100, 300, 1000), eval_mode: str = '11points', collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶
Pascal VOC evaluation metric.
- Parameters
iou_thrs (float or List[float]) – IoU threshold. Defaults to 0.5.
scale_ranges (List[tuple], optional) – Scale ranges for evaluating mAP. If not specified, all bounding boxes would be included in evaluation. Defaults to None.
metric (str | list[str]) – Metrics to be evaluated. Options are ‘mAP’ and ‘recall’. If a list is given, only the first setting in the list will be used for evaluation.
proposal_nums (Sequence[int]) – Proposal number used for evaluating recalls, such as recall@100, recall@1000. Default: (100, 300, 1000).
eval_mode (str) – ‘area’ or ‘11points’. ‘area’ means calculating the area under the precision-recall curve; ‘11points’ means calculating the average precision of recalls at [0, 0.1, …, 1]. PASCAL VOC2007 uses ‘11points’ by default, while PASCAL VOC2012 uses ‘area’.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
- compute_metrics(results: list) → dict[source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are corresponding results.
- Return type
dict
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of data samples that contain annotations and predictions.
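A configuration sketch using only the documented arguments:

val_evaluator = dict(
    type='VOCMetric',
    metric='mAP',
    eval_mode='11points')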
mmdet.models¶
backbones¶
- class mmdet.models.backbones.CSPDarknet(arch='P5', deepen_factor=1.0, widen_factor=1.0, out_indices=(2, 3, 4), frozen_stages=- 1, use_depthwise=False, arch_ovewrite=None, spp_kernal_sizes=(5, 9, 13), conv_cfg=None, norm_cfg={'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg={'type': 'Swish'}, norm_eval=False, init_cfg={'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]¶
CSP-Darknet backbone used in YOLOv5 and YOLOX.
- Parameters
arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Default: P5.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Default: 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
use_depthwise (bool) – Whether to use depthwise separable convolution. Default: False.
arch_ovewrite (list) – Overwrite default arch settings. Default: None.
spp_kernal_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Default: (5, 9, 13).
conv_cfg (dict) – Config dict for convolution layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’LeakyReLU’, negative_slope=0.1).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> from mmdet.models import CSPDarknet
>>> import torch
>>> self = CSPDarknet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmdet.models.backbones.CSPNeXt(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, use_depthwise: bool = False, expand_ratio: float = 0.5, arch_ovewrite: Optional[dict] = None, spp_kernel_sizes: Sequence[int] = (5, 9, 13), channel_attention: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]¶
CSPNeXt backbone used in RTMDet.
- Parameters
arch (str) – Architecture of CSPNeXt, from {P5, P6}. Defaults to P5.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.
spp_kernel_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Defaults to (5, 9, 13).
channel_attention (bool) – Whether to add channel attention in each stage. Defaults to True.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, requires_grad=True).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict], optional) – Initialization config dict.
- forward(x: Tuple[torch.Tensor, ...]) → Tuple[torch.Tensor, ...][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode=True) → None[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
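A usage sketch analogous to the CSPDarknet example above (the output shapes depend on widen_factor and out_indices, so they are not listed here):

>>> from mmdet.models.backbones import CSPNeXt
>>> import torch
>>> self = CSPNeXt(deepen_factor=0.33, widen_factor=0.5)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 320, 320)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))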
- class mmdet.models.backbones.Darknet(depth=53, out_indices=(3, 4, 5), frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'negative_slope': 0.1, 'type': 'LeakyReLU'}, norm_eval=True, pretrained=None, init_cfg=None)[source]¶
Darknet backbone.
- Parameters
depth (int) – Depth of Darknet. Currently only support 53.
out_indices (Sequence[int]) – Output from which stages.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict) – Config dict for convolution layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True)
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’LeakyReLU’, negative_slope=0.1).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
Example
>>> from mmdet.models import Darknet
>>> import torch
>>> self = Darknet(depth=53)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- static make_conv_res_block(in_channels, out_channels, res_repeat, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'negative_slope': 0.1, 'type': 'LeakyReLU'})[source]¶
In Darknet backbone, ConvLayer is usually followed by ResBlock. This function will make that. The Conv layers always have 3x3 filters with stride=2. The number of the filters in Conv layer is the same as the out channels of the ResBlock.
- Parameters
in_channels (int) – The number of input channels.
out_channels (int) – The number of output channels.
res_repeat (int) – The number of ResBlocks.
conv_cfg (dict) – Config dict for convolution layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True)
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’LeakyReLU’, negative_slope=0.1).
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmdet.models.backbones.DetectoRS_ResNeXt(groups=1, base_width=4, **kwargs)[source]¶
ResNeXt backbone for DetectoRS.
- Parameters
groups (int) – The number of groups in ResNeXt.
base_width (int) – The base width of ResNeXt.
- class mmdet.models.backbones.DetectoRS_ResNet(sac=None, stage_with_sac=(False, False, False, False), rfp_inplanes=None, output_img=False, pretrained=None, init_cfg=None, **kwargs)[source]¶
ResNet backbone for DetectoRS.
- Parameters
sac (dict, optional) – Dictionary to construct SAC (Switchable Atrous Convolution). Default: None.
stage_with_sac (list) – Which stage to use sac. Default: (False, False, False, False).
rfp_inplanes (int, optional) – The number of channels from RFP. Default: None. If specified, an additional conv layer will be added for rfp_feat. Otherwise, the structure is the same as the base class.
output_img (bool) – If True, the input image will be inserted into the starting position of output. Default: False.
- class mmdet.models.backbones.EfficientNet(arch='b0', drop_path_rate=0.0, out_indices=(6), frozen_stages=0, conv_cfg={'type': 'Conv2dAdaptivePadding'}, norm_cfg={'eps': 0.001, 'type': 'BN'}, act_cfg={'type': 'Swish'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': 'Conv2d'}, {'type': 'Constant', 'layer': ['_BatchNorm', 'GroupNorm'], 'val': 1}])[source]¶
EfficientNet backbone.
- Parameters
arch (str) – Architecture of efficientnet. Defaults to b0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (6, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Defaults to 0, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’Swish’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Defaults to False.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmdet.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=True, with_cp=False, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]¶
HRNet backbone.
High-Resolution Representations for Labeling Pixels and Regions.
- Parameters
extra (dict) – Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:
num_modules (int): The number of HRModule in this stage.
num_branches (int): The number of branches in the HRModule.
block (str): The type of convolution block.
num_blocks (tuple): The number of blocks in each branch. The length must be equal to num_branches.
num_channels (tuple): The number of channels in each branch. The length must be equal to num_branches.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – Dictionary to construct and config conv layer.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: True.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.
multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.
pretrained (str, optional) – Model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> from mmdet.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- property norm2¶
the normalization layer named “norm2”
- Type
nn.Module
- class mmdet.models.backbones.HourglassNet(downsample_times: int = 5, num_stacks: int = 2, stage_channels: Sequence = (256, 256, 384, 384, 384, 512), stage_blocks: Sequence = (2, 2, 2, 2, 2, 4), feat_channel: int = 256, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'requires_grad': True, 'type': 'BN'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
HourglassNet backbone.
Stacked Hourglass Networks for Human Pose Estimation. More details can be found in the paper.
- Parameters
downsample_times (int) – Downsample times in a HourglassModule.
num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.
stage_channels (Sequence[int]) – Feature channel of each sub-module in a HourglassModule.
stage_blocks (Sequence[int]) – Number of sub-modules stacked in a HourglassModule.
feat_channel (int) – Feature channel of conv after a HourglassModule.
norm_cfg – Dictionary to construct and config norm layer.
Example
>>> from mmdet.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)
- class mmdet.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(1, 2, 4, 7), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
MobileNetV2 backbone.
- Parameters
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int], optional) – Output from which stages. Default: (1, 2, 4, 7).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- make_layer(out_channels, num_blocks, stride, expand_ratio)[source]¶
Stack InvertedResidual blocks to build a layer for MobileNetV2.
- Parameters
out_channels (int) – out_channels of block.
num_blocks (int) – number of blocks.
stride (int) – stride of the first block. Default: 1
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.
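A usage sketch analogous to the other backbone examples (output shapes are omitted since they depend on widen_factor and out_indices):

>>> from mmdet.models.backbones import MobileNetV2
>>> import torch
>>> self = MobileNetV2(widen_factor=1.0, out_indices=(1, 2, 4, 7))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))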
- class mmdet.models.backbones.PyramidVisionTransformer(pretrain_img_size=224, in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 5, 8], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], paddings=[0, 0, 0, 0], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratios=[8, 8, 4, 4], qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=True, norm_after_stage=False, use_conv_ffn=False, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, convert_weights=True, init_cfg=None)[source]¶
Pyramid Vision Transformer (PVT)
Implementation of Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.
- Parameters
pretrain_img_size (int | tuple[int]) – The size of the input image when pretraining. Default: 224.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – Embedding dimension. Default: 64.
num_stages (int) – The number of stages. Default: 4.
num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].
num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 5, 8].
patch_sizes (Sequence[int]) – The patch_size of each patch embedding. Default: [4, 2, 2, 2].
strides (Sequence[int]) – The stride of each patch embedding. Default: [4, 2, 2, 2].
paddings (Sequence[int]) – The padding of each patch embedding. Default: [0, 0, 0, 0].
sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].
out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).
mlp_ratios (Sequence[int]) – The ratio of the mlp hidden dim to the embedding dim of each transformer encode layer. Default: [8, 8, 4, 4].
qkv_bias (bool) – Enable bias for qkv if True. Default: True.
drop_rate (float) – Probability of an element to be zeroed. Default 0.0.
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0.
drop_path_rate (float) – stochastic depth rate. Default 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: True.
use_conv_ffn (bool) – If True, use Convolutional FFN to replace FFN. Default: False.
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’).
pretrained (str, optional) – model pretrained path. Default: None.
convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmdet.models.backbones.PyramidVisionTransformerV2(**kwargs)[source]¶
Implementation of PVTv2: Improved Baselines with Pyramid Vision Transformer.
- class mmdet.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]¶
RegNet backbone.
More details can be found in the paper.
- Parameters
arch (dict) – The parameters of RegNet:
w0 (int): initial width
wa (float): slope of width
wm (float): quantization parameter to quantize the width
depth (int): depth of the backbone
group_w (int): width of group
bot_mul (float): bottleneck ratio, i.e. expansion of bottleneck.
strides (Sequence[int]) – Strides of the first block of each stage.
base_channels (int) – Base channels after stem layer.
in_channels (int) – Number of input image channels. Default: 3.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
Example
>>> from mmdet.models import RegNet
>>> import torch
>>> self = RegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
- adjust_width_group(widths, bottleneck_ratio, groups)[source]¶
Adjusts the compatibility of widths and groups.
- Parameters
widths (list[int]) – Width of each stage.
bottleneck_ratio (float) – Bottleneck ratio.
groups (int) – number of groups in each stage
- Returns
The adjusted widths and groups of each stage.
- Return type
tuple(list)
- generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]¶
Generates per block width from RegNet parameters.
- Parameters
initial_width ([int]) – Initial width of the backbone
width_slope ([float]) – Slope of the quantized linear function
width_parameter ([int]) – Parameter used to quantize the width.
depth ([int]) – Depth of the backbone.
divisor (int, optional) – The divisor of channels. Defaults to 8.
- Returns
return a list of widths of each stage and the number of stages
- Return type
list, int
- class mmdet.models.backbones.Res2Net(scales=4, base_width=26, style='pytorch', deep_stem=True, avg_down=True, pretrained=None, init_cfg=None, **kwargs)[source]¶
Res2Net backbone.
- Parameters
scales (int) – Scales used in Res2Net. Default: 4
base_width (int) – Basic width of each scale. Default: 26
depth (int) – Depth of res2net, from {50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Res2net stages. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottle2neck.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
Example
>>> from mmdet.models import Res2Net
>>> import torch
>>> self = Res2Net(depth=50, scales=4, base_width=26)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
- class mmdet.models.backbones.ResNeSt(groups=1, base_width=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]¶
ResNeSt backbone.
- Parameters
groups (int) – Number of groups of Bottleneck. Default: 1
base_width (int) – Base width of Bottleneck. Default: 4
radix (int) – Radix of SplitAttentionConv2d. Default: 2
reduction_factor (int) – Reduction factor of inter_channels in SplitAttentionConv2d. Default: 4.
avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.
kwargs (dict) – Keyword arguments for ResNet.
- class mmdet.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]¶
ResNeXt backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
groups (int) – Group of resnext.
base_width (int) – Base width of resnext.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
- class mmdet.models.backbones.ResNet(depth, in_channels=3, stem_channels=None, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=True, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]¶
ResNet backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
stem_channels (int | None) – Number of stem channels. If not specified, it will be the same as base_channels. Default: None.
base_channels (int) – Number of base channels of res layer. Default: 64.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
position (str, required): Position inside block to insert plugin, options are ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
Example
>>> from mmdet.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- make_stage_plugins(plugins, stage_idx)[source]¶
Make plugins for the stage_idx-th stage of ResNet.
Currently we support inserting context_block, empirical_attention_block and nonlocal_block into backbones like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of Bottleneck.
An example of the plugins format could be:
Examples
>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3
Suppose stage_idx=0, the structure of blocks in the stage would be:
conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2
Suppose stage_idx=1, the structure of blocks in the stage would be:
conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2
If stages is missing, the plugin would be applied to all stages.
- Parameters
plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.
stage_idx (int) – Index of stage to build
- Returns
Plugins for current stage
- Return type
list[dict]
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- class mmdet.models.backbones.ResNetV1d(**kwargs)[source]¶
ResNetV1d variant described in Bag of Tricks.
Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.
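A minimal usage sketch, assuming depth and the other ResNet keyword arguments are accepted through kwargs; the printed shapes are what one would expect for depth=50 on a 64x64 input:
>>> from mmdet.models import ResNetV1d
>>> import torch
>>> self = ResNetV1d(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 16, 16)
(1, 512, 8, 8)
(1, 1024, 4, 4)
(1, 2048, 2, 2)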
- class mmdet.models.backbones.SSDVGG(depth, with_last_pool=False, ceil_mode=True, out_indices=(3, 4), out_feature_indices=(22, 34), pretrained=None, init_cfg=None, input_size=None, l2_norm_scale=None)[source]¶
VGG Backbone network for single-shot-detection.
- Parameters
depth (int) – Depth of vgg, from {11, 13, 16, 19}.
with_last_pool (bool) – Whether to add a pooling layer at the end of the model.
ceil_mode (bool) – When True, will use ceil instead of floor to compute the output shape.
out_indices (Sequence[int]) – Output from which stages.
out_feature_indices (Sequence[int]) – Output from which feature map.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
input_size (int, optional) – Deprecated argument. Width and height of input, from {300, 512}.
l2_norm_scale (float, optional) – Deprecated argument. L2 normalization layer init scale.
Example
>>> self = SSDVGG(input_size=300, depth=11)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 300, 300)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 19, 19)
(1, 512, 10, 10)
(1, 256, 5, 5)
(1, 256, 3, 3)
(1, 256, 1, 1)
- class mmdet.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, pretrained=None, convert_weights=False, frozen_stages=- 1, init_cfg=None)[source]¶
Swin Transformer. A PyTorch implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.
Inspiration from https://github.com/microsoft/Swin-Transformer
- Parameters
pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.
in_channels (int) – The num of input channels. Defaults: 3.
embed_dims (int) – The feature dimension. Default: 96.
patch_size (int | tuple[int]) – Patch size. Default: 4.
window_size (int) – Window size. Default: 7.
mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.
depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).
num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).
strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.
patch_norm (bool) – Whether to add a norm layer for patch embed and patch merging. Default: True.
drop_rate (float) – Dropout rate. Defaults: 0.
attn_drop_rate (float) – Attention dropout rate. Default: 0.
drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’GELU’).
norm_cfg (dict) – Config dict for the normalization layer at the output of the backbone. Defaults: dict(type='LN').
with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). Default: -1 (-1 means not freezing any parameters).
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
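A minimal usage sketch (an assumption, not an official example): with the default Swin-T settings above, a 224x224 input is expected to yield one feature map per stage, with channels doubling and resolution halving from 96 x 56 x 56.
>>> from mmdet.models import SwinTransformer
>>> import torch
>>> self = SwinTransformer()  # Swin-T defaults: embed_dims=96, depths=(2, 2, 6, 2)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 56, 56)
(1, 192, 28, 28)
(1, 384, 14, 14)
(1, 768, 7, 7)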
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmdet.models.backbones.TridentResNet(depth, num_branch, test_branch_idx, trident_dilations, **kwargs)[source]¶
The stem layer, stage 1 and stage 2 in Trident ResNet are identical to ResNet, while in stage 3, Trident BottleBlock is utilized to replace the normal BottleBlock to yield trident outputs. Different branches share the convolution weights but use different dilations to achieve multi-scale output.

                             / stage3(b0) \
x - stem - stage1 - stage2 -   stage3(b1)   - output
                             \ stage3(b2) /
- Parameters
depth (int) – Depth of resnet, from {50, 101, 152}.
num_branch (int) – Number of branches in TridentNet.
test_branch_idx (int) – In inference, all 3 branches will be used if test_branch_idx==-1, otherwise only branch with index test_branch_idx will be used.
trident_dilations (tuple[int]) – Dilations of different trident branch. len(trident_dilations) should be equal to num_branch.
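A hedged config sketch; the field values mirror typical TridentNet settings and are assumptions here (e.g. that stage 3 is the last stage, hence num_stages=3 and out_indices=(2, )):
>>> # hypothetical backbone config in the TridentNet style
>>> backbone = dict(
...     type='TridentResNet',
...     depth=50,
...     num_branch=3,
...     test_branch_idx=1,
...     trident_dilations=(1, 2, 3),
...     num_stages=3,
...     strides=(1, 2, 2),
...     dilations=(1, 1, 1),
...     out_indices=(2, ),
...     norm_eval=True)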
data_preprocessors¶
- class mmdet.models.data_preprocessors.BatchFixedSizePad(size: Tuple[int, int], img_pad_value: int = 0, pad_mask: bool = False, mask_pad_value: int = 0, pad_seg: bool = False, seg_pad_value: int = 255)[source]¶
Fixed size padding for batch images.
- Parameters
size (Tuple[int, int]) – Fixed padding size. Expected padding shape (h, w). Defaults to None.
img_pad_value (int) – The padded pixel value for images. Defaults to 0.
pad_mask (bool) – Whether to pad instance masks. Defaults to False.
mask_pad_value (int) – The padded pixel value for instance masks. Defaults to 0.
pad_seg (bool) – Whether to pad semantic segmentation maps. Defaults to False.
seg_pad_value (int) – The padded pixel value for semantic segmentation maps. Defaults to 255.
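BatchFixedSizePad is normally plugged into a data preprocessor through batch_augments; a hedged config sketch, where the padding size is an arbitrary example value:
>>> data_preprocessor = dict(
...     type='DetDataPreprocessor',
...     batch_augments=[
...         dict(
...             type='BatchFixedSizePad',
...             size=(1024, 1024),   # (h, w), example value
...             img_pad_value=0,
...             pad_mask=True,
...             mask_pad_value=0)
...     ])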
- class mmdet.models.data_preprocessors.BatchResize(scale: tuple, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0)[source]¶
Batch resize during training. This implementation is modified from https://github.com/Purkialo/CrowdDet/blob/master/lib/data/CrowdHuman.py.
It provides the data pre-processing as follows:
- A batch of all images is padded to a uniform size and stacked into a torch.Tensor by DetDataPreprocessor.
- BatchResize resizes all images to the target scale.
- Images are padded so that the image size is divisible by pad_size_divisor.
- Parameters
scale (tuple) – Images scales for resizing.
pad_size_divisor (int) – Image size divisible factor. Defaults to 1.
pad_value (Number) – The padded pixel value. Defaults to 0.
- forward(inputs: torch.Tensor, data_samples: List[mmdet.structures.det_data_sample.DetDataSample]) → Tuple[torch.Tensor, List[mmdet.structures.det_data_sample.DetDataSample]][source]¶
resize a batch of images and bboxes.
- class mmdet.models.data_preprocessors.BatchSyncRandomResize(random_size_range: Tuple[int, int], interval: int = 10, size_divisor: int = 32)[source]¶
Batch random resize which synchronizes the random size across ranks.
- Parameters
random_size_range (tuple) – The multi-scale random range during multi-scale training.
interval (int) – The iter interval of change image size. Defaults to 10.
size_divisor (int) – Image size divisible factor. Defaults to 32.
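A hedged config sketch of using BatchSyncRandomResize as a batch-level augmentation for multi-scale training; the range and interval are illustrative values in the YOLOX style, not defaults of this class:
>>> data_preprocessor = dict(
...     type='DetDataPreprocessor',
...     pad_size_divisor=32,
...     batch_augments=[
...         dict(
...             type='BatchSyncRandomResize',
...             random_size_range=(480, 800),  # example multi-scale range
...             size_divisor=32,
...             interval=10)
...     ])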
- forward(inputs: torch.Tensor, data_samples: List[mmdet.structures.det_data_sample.DetDataSample]) → Tuple[torch.Tensor, List[mmdet.structures.det_data_sample.DetDataSample]][source]¶
Resize a batch of images and bboxes to shape self._input_size.
- class mmdet.models.data_preprocessors.BoxInstDataPreprocessor(*arg, mask_stride: int = 4, pairwise_size: int = 3, pairwise_dilation: int = 2, pairwise_color_thresh: float = 0.3, bottom_pixels_removed: int = 10, **kwargs)[source]¶
Pseudo mask pre-processor for BoxInst.
Compared with mmdet.DetDataPreprocessor, it:
1. Generates masks using box annotations.
2. Computes the image color similarity in LAB color space.
- Parameters
mask_stride (int) – The mask output stride in boxinst. Defaults to 4.
pairwise_size (int) – The size of neighborhood for each pixel. Defaults to 3.
pairwise_dilation (int) – The dilation of neighborhood for each pixel. Defaults to 2.
pairwise_color_thresh (float) – The thresh of image color similarity. Defaults to 0.3.
bottom_pixels_removed (int) – The length of removed pixels in bottom. It is caused by the annotation error in coco dataset. Defaults to 10.
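A hedged config sketch; the normalization values are common Caffe-style numbers used for illustration only, not requirements of this class:
>>> data_preprocessor = dict(
...     type='BoxInstDataPreprocessor',
...     mean=[103.530, 116.280, 123.675],  # example BGR mean
...     std=[1.0, 1.0, 1.0],               # example std
...     bgr_to_rgb=False,
...     pad_size_divisor=32,
...     mask_stride=4,
...     pairwise_size=3,
...     pairwise_dilation=2,
...     pairwise_color_thresh=0.3,
...     bottom_pixels_removed=10)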
- class mmdet.models.data_preprocessors.DetDataPreprocessor(mean: Optional[Sequence[numbers.Number]] = None, std: Optional[Sequence[numbers.Number]] = None, pad_size_divisor: int = 1, pad_value: Union[float, int] = 0, pad_mask: bool = False, mask_pad_value: int = 0, pad_seg: bool = False, seg_pad_value: int = 255, bgr_to_rgb: bool = False, rgb_to_bgr: bool = False, boxtype2tensor: bool = True, batch_augments: Optional[List[dict]] = None)[source]¶
Image pre-processor for detection tasks.
Compared with mmengine.ImgDataPreprocessor:
1. It supports batch augmentations.
2. It will additionally append batch_input_shape and pad_shape to data_samples considering the object detection task.
It provides the data pre-processing as follows:
- Collate and move data to the target device.
- Pad inputs to the maximum size of the current batch with the defined pad_value. The padding size can be divisible by a defined pad_size_divisor.
- Stack inputs to batch_inputs.
- Convert inputs from BGR to RGB if the shape of the input is (3, H, W).
- Normalize images with the defined std and mean.
- Do batch augmentations during training.
- Parameters
mean (Sequence[Number], optional) – The pixel mean of R, G, B channels. Defaults to None.
std (Sequence[Number], optional) – The pixel standard deviation of R, G, B channels. Defaults to None.
pad_size_divisor (int) – The size of the padded image should be divisible by pad_size_divisor. Defaults to 1.
pad_value (Number) – The padded pixel value. Defaults to 0.
pad_mask (bool) – Whether to pad instance masks. Defaults to False.
mask_pad_value (int) – The padded pixel value for instance masks. Defaults to 0.
pad_seg (bool) – Whether to pad semantic segmentation maps. Defaults to False.
seg_pad_value (int) – The padded pixel value for semantic segmentation maps. Defaults to 255.
bgr_to_rgb (bool) – whether to convert image from BGR to RGB. Defaults to False.
rgb_to_bgr (bool) – Whether to convert image from RGB to BGR. Defaults to False.
boxtype2tensor (bool) – Whether to convert the BaseBoxes type of bboxes data to Tensor type. Defaults to True.
batch_augments (list[dict], optional) – Batch-level augmentations.
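A hedged config sketch with typical ImageNet normalization values; the numbers are illustrative, not defaults of this class:
>>> data_preprocessor = dict(
...     type='DetDataPreprocessor',
...     mean=[123.675, 116.28, 103.53],  # example RGB mean
...     std=[58.395, 57.12, 57.375],     # example RGB std
...     bgr_to_rgb=True,
...     pad_size_divisor=32)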
- forward(data: dict, training: bool = False) → dict[source]¶
Perform normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor.
- Parameters
data (dict) – Data sampled from dataloader.
training (bool) – Whether to enable training time augmentation.
- Returns
Data in the same format as the model input.
- Return type
dict
- pad_gt_masks(batch_data_samples: Sequence[mmdet.structures.det_data_sample.DetDataSample]) → None[source]¶
Pad gt_masks to shape of batch_input_shape.
- pad_gt_sem_seg(batch_data_samples: Sequence[mmdet.structures.det_data_sample.DetDataSample]) → None[source]¶
Pad gt_sem_seg to shape of batch_input_shape.
- class mmdet.models.data_preprocessors.MultiBranchDataPreprocessor(data_preprocessor: Union[mmengine.config.config.ConfigDict, dict])[source]¶
DataPreprocessor wrapper for multi-branch data.
Take semi-supervised object detection as an example, assume that the ratio of labeled data and unlabeled data in a batch is 1:2, sup indicates the branch where the labeled data is augmented, unsup_teacher and unsup_student indicate the branches where the unlabeled data is augmented by different pipeline.
The input format of multi-branch data is shown as below :
The format of multi-branch data after filtering None is shown as below :
In order to reuse DetDataPreprocessor for the data from different branches, the format of multi-branch data grouped by branch is as below :
After preprocessing data from different branches, the multi-branch data needs to be reformatted as:
- Parameters
data_preprocessor (ConfigDict or dict) – Config of DetDataPreprocessor to process the input data.
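A hedged config sketch wrapping a DetDataPreprocessor for semi-supervised training; all values are illustrative:
>>> data_preprocessor = dict(
...     type='MultiBranchDataPreprocessor',
...     data_preprocessor=dict(
...         type='DetDataPreprocessor',
...         mean=[123.675, 116.28, 103.53],
...         std=[58.395, 57.12, 57.375],
...         bgr_to_rgb=True,
...         pad_size_divisor=32))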
- cpu(*args, **kwargs) → torch.nn.modules.module.Module[source]¶
Overrides this method to set the device.
- Returns
The model itself.
- Return type
nn.Module
- cuda(*args, **kwargs) → torch.nn.modules.module.Module[source]¶
Overrides this method to set the device.
- Returns
The model itself.
- Return type
nn.Module
- forward(data: dict, training: bool = False) → dict[source]¶
Perform normalization, padding and bgr2rgb conversion based on BaseDataPreprocessor for multi-branch data.
- Parameters
data (dict) – Data sampled from dataloader.
training (bool) – Whether to enable training time augmentation.
- Returns
- ‘inputs’ (Dict[str, torch.Tensor]): The forward data of models from different branches.
- ‘data_sample’ (Dict[str, DetDataSample]): The annotation info of the samples from different branches.
- Return type
dict
- to(device: Optional[Union[int, torch.device]], *args, **kwargs) → torch.nn.modules.module.Module[source]¶
Overrides this method to set the device.
- Parameters
device (int or torch.device, optional) – The desired device of the parameters and buffers in this module.
- Returns
The model itself.
- Return type
nn.Module
dense_heads¶
- class mmdet.models.dense_heads.ATSSHead(num_classes: int, in_channels: int, pred_kernel_size: int = 3, stacked_convs: int = 4, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'num_groups': 32, 'requires_grad': True, 'type': 'GN'}, reg_decoded_bbox: bool = True, loss_centerness: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, init_cfg: Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]] = {'layer': 'Conv2d', 'override': {'bias_prob': 0.01, 'name': 'atss_cls', 'std': 0.01, 'type': 'Normal'}, 'std': 0.01, 'type': 'Normal'}, **kwargs)[source]¶
Detection Head of ATSS.
The ATSS head structure is similar to that of FCOS; however, ATSS uses anchor boxes and assigns labels by Adaptive Training Sample Selection instead of max IoU.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
pred_kernel_size (int) – Kernel size of nn.Conv2d.
stacked_convs (int) – Number of stacking convs of the head.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='GN', num_groups=32, requires_grad=True).
reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Defaults to True. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.
loss_centerness (ConfigDict or dict) – Config of centerness loss. Defaults to dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0).
init_cfg (ConfigDict or dict or list[ConfigDict or dict]) – Initialization config dict.
- centerness_target(anchors: torch.Tensor, gts: torch.Tensor) → torch.Tensor[source]¶
Calculate the centerness between anchors and gts.
Only calculate pos centerness targets, otherwise there may be nan.
- Parameters
anchors (Tensor) – Anchors with shape (N, 4), “xyxy” format.
gts (Tensor) – Ground truth bboxes with shape (N, 4), “xyxy” format.
- Returns
Centerness between anchors and gts.
- Return type
Tensor
- forward(x: Tuple[torch.Tensor]) → Tuple[List[torch.Tensor]][source]¶
Forward features from the upstream network.
- Parameters
x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
Usually a tuple of classification scores and bbox prediction:
- cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channel number is num_anchors * num_classes.
- bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channel number is num_anchors * 4.
- Return type
tuple
- forward_single(x: torch.Tensor, scale: mmcv.cnn.bricks.scale.Scale) → Sequence[torch.Tensor][source]¶
Forward feature of a single scale level.
- Parameters
x (Tensor) – Features of a single scale level.
scale (mmcv.cnn.Scale) – Learnable scale module to resize the bbox prediction.
- Returns
- cls_score (Tensor): Cls scores for a single scale level, the channel number is num_anchors * num_classes.
- bbox_pred (Tensor): Box energies / deltas for a single scale level, the channel number is num_anchors * 4.
- centerness (Tensor): Centerness for a single scale level, with shape (N, num_anchors * 1, H, W).
- Return type
tuple
- get_num_level_anchors_inside(num_level_anchors, inside_flags)[source]¶
Get the number of valid anchors in every level.
- get_targets(anchor_list: List[List[torch.Tensor]], valid_flag_list: List[List[torch.Tensor]], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None, unmap_outputs: bool = True) → tuple[source]¶
Get targets for ATSS head.
This method is almost the same as AnchorHead.get_targets(). Besides returning the targets as the parent method does, it also returns the anchors as the first element of the returned tuple.
- loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], centernesses: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level Has shape (N, num_anchors * num_classes, H, W)
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W)
centernesses (list[Tensor]) – Centerness for each scale level with shape (N, num_anchors * 1, H, W)
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
andlabels
attributes.batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[
InstanceData
], Optional) – Batch of gt_instances_ignore. It includesbboxes
attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- loss_by_feat_single(anchors: torch.Tensor, cls_score: torch.Tensor, bbox_pred: torch.Tensor, centerness: torch.Tensor, labels: torch.Tensor, label_weights: torch.Tensor, bbox_targets: torch.Tensor, avg_factor: float) → dict[source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters
cls_score (Tensor) – Box scores for each scale level Has shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).
labels (Tensor) – Labels of each anchors with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors)
bbox_targets (Tensor) – BBox regression targets of each anchor weight shape (N, num_total_anchors, 4).
avg_factor (float) – Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
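A minimal forward sketch, assuming the default anchor generator inherited from AnchorHead (five strides) and 256-channel inputs; it only illustrates the shape flow, not a full training setup:
>>> from mmdet.models.dense_heads import ATSSHead
>>> import torch
>>> head = ATSSHead(num_classes=80, in_channels=256)
>>> # one feature map per anchor-generator stride
>>> feats = tuple(torch.rand(1, 256, s, s) for s in [64, 32, 16, 8, 4])
>>> cls_scores, bbox_preds, centernesses = head(feats)
>>> len(cls_scores), len(bbox_preds), len(centernesses)
(5, 5, 5)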
- class mmdet.models.dense_heads.AnchorFreeHead(num_classes: int, in_channels: int, feat_channels: int = 256, stacked_convs: int = 4, strides: Union[Sequence[int], Sequence[Tuple[int, int]]] = (4, 8, 16, 32, 64), dcn_on_last_conv: bool = False, conv_bias: Union[bool, str] = 'auto', loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.25, 'gamma': 2.0, 'loss_weight': 1.0, 'type': 'FocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'IoULoss'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]] = {'layer': 'Conv2d', 'override': {'bias_prob': 0.01, 'name': 'conv_cls', 'std': 0.01, 'type': 'Normal'}, 'std': 0.01, 'type': 'Normal'})[source]¶
Anchor-free head (FCOS, Fovea, RepPoints, etc.).
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
feat_channels (int) – Number of hidden channels. Used in child classes.
stacked_convs (int) – Number of stacking convs of the head.
strides (Sequence[int] or Sequence[Tuple[int, int]]) – Downsample factor of each feature map.
dcn_on_last_conv (bool) – If true, use dcn in the last layer of towers. Defaults to False.
conv_bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Default: “auto”.
loss_cls (
ConfigDict
or dict) – Config of classification loss.loss_bbox (
ConfigDict
or dict) – Config of localization loss.bbox_coder (
ConfigDict
or dict) – Config of bbox coder. Defaults ‘DistancePointBBoxCoder’.conv_cfg (
ConfigDict
or dict, Optional) – Config dict for convolution layer. Defaults to None.norm_cfg (
ConfigDict
or dict, Optional) – Config dict for normalization layer. Defaults to None.train_cfg (
ConfigDict
or dict, Optional) – Training config of anchor-free head.test_cfg (
ConfigDict
or dict, Optional) – Testing config of anchor-free head.init_cfg (
ConfigDict
or dict or list[ConfigDict
or dict]) – Initialization config dict.
- aug_test(aug_batch_feats: List[torch.Tensor], aug_batch_img_metas: List[List[torch.Tensor]], rescale: bool = False) → List[numpy.ndarray][source]¶
Test function with test time augmentation.
- Parameters
aug_batch_feats (list[Tensor]) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains features for all images in the batch.
aug_batch_img_metas (list[list[dict]]) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch. each dict has image information.
rescale (bool, optional) – Whether to rescale the results. Defaults to False.
- Returns
bbox results of each class
- Return type
list[ndarray]
- forward(x: Tuple[torch.Tensor]) → Tuple[List[torch.Tensor], List[torch.Tensor]][source]¶
Forward features from the upstream network.
- Parameters
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
Usually contain classification scores and bbox predictions.
cls_scores (list[Tensor]): Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.
- Return type
tuple
- forward_single(x: torch.Tensor) → Tuple[torch.Tensor, ...][source]¶
Forward features of a single scale level.
- Parameters
x (Tensor) – FPN feature maps of the specified stride.
- Returns
Scores for each class, bbox predictions, and features after the classification and regression conv layers; some models need these features, like FCOS.
- Return type
tuple
- abstract get_targets(points: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData]) → Any[source]¶
Compute regression, classification and centerness targets for points in multiple images.
- Parameters
points (list[Tensor]) – Points of each fpn level, each has shape (num_points, 2).
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
andlabels
attributes.
- abstract loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
andlabels
attributes.batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[
InstanceData
], Optional) – Batch of gt_instances_ignore. It includesbboxes
attribute data that is ignored during training and testing. Defaults to None.
- class mmdet.models.dense_heads.AnchorHead(num_classes: int, in_channels: int, feat_channels: int = 256, anchor_generator: Union[mmengine.config.config.ConfigDict, dict] = {'ratios': [0.5, 1.0, 2.0], 'scales': [8, 16, 32], 'strides': [4, 8, 16, 32, 64], 'type': 'AnchorGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'clip_border': True, 'target_means': (0.0, 0.0, 0.0, 0.0), 'target_stds': (1.0, 1.0, 1.0, 1.0), 'type': 'DeltaXYWHBBoxCoder'}, reg_decoded_bbox: bool = False, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 0.1111111111111111, 'loss_weight': 1.0, 'type': 'SmoothL1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'layer': 'Conv2d', 'std': 0.01, 'type': 'Normal'})[source]¶
Anchor-based head (RPN, RetinaNet, SSD, etc.).
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
feat_channels (int) – Number of hidden channels. Used in child classes.
anchor_generator (dict) – Config dict for anchor generator
bbox_coder (dict) – Config of bounding box coder.
reg_decoded_bbox (bool) – If true, the regression loss would be applied directly on decoded bounding boxes, converting both the predicted boxes and regression targets to absolute coordinates format. Default False. It should be True when using IoULoss, GIoULoss, or DIoULoss in the bbox head.
loss_cls (dict) – Config of classification loss.
loss_bbox (dict) – Config of localization loss.
train_cfg (dict) – Training config of anchor head.
test_cfg (dict) – Testing config of anchor head.
init_cfg (dict or list[dict], optional) – Initialization config dict.
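A minimal forward sketch, assuming the default AnchorGenerator above (3 scales x 3 ratios, i.e. 9 base priors per location) and sigmoid classification, so the score map has 9 * 80 = 720 channels:
>>> from mmdet.models.dense_heads import AnchorHead
>>> import torch
>>> head = AnchorHead(num_classes=80, in_channels=256)
>>> feats = tuple(torch.rand(1, 256, s, s) for s in [64, 32, 16, 8, 4])
>>> cls_scores, bbox_preds = head(feats)
>>> # one score map and one delta map per feature level
>>> tuple(cls_scores[0].shape)
(1, 720, 64, 64)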
- forward(x: Tuple[torch.Tensor]) → Tuple[List[torch.Tensor]][source]¶
Forward features from the upstream network.
- Parameters
x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of classification scores and bbox prediction.
cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.
- Return type
tuple
- forward_single(x: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶
Forward feature of a single scale level.
- Parameters
x (Tensor) – Features of a single scale level.
- Returns
- cls_score (Tensor): Cls scores for a single scale level, the channel number is num_base_priors * num_classes.
- bbox_pred (Tensor): Box energies / deltas for a single scale level, the channel number is num_base_priors * 4.
- Return type
tuple
- get_anchors(featmap_sizes: List[tuple], batch_img_metas: List[dict], device: Union[torch.device, str] = 'cuda') → Tuple[List[List[torch.Tensor]], List[List[torch.Tensor]]][source]¶
Get anchors according to feature map sizes.
- Parameters
featmap_sizes (list[tuple]) – Multi-level feature map sizes.
batch_img_metas (list[dict]) – Image meta info.
device (torch.device | str) – Device for returned tensors. Defaults to cuda.
- Returns
anchor_list (list[list[Tensor]]): Anchors of each image.
valid_flag_list (list[list[Tensor]]): Valid flags of each image.
- Return type
tuple
- get_targets(anchor_list: List[List[torch.Tensor]], valid_flag_list: List[List[torch.Tensor]], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None, unmap_outputs: bool = True, return_sampling_results: bool = False) → tuple[source]¶
Compute regression and classification targets for anchors in multiple images.
- Parameters
anchor_list (list[list[Tensor]]) – Multi level anchors of each image. The outer list indicates images, and the inner list corresponds to feature levels of the image. Each element of the inner list is a tensor of shape (num_anchors, 4).
valid_flag_list (list[list[Tensor]]) – Multi level valid flags of each image. The outer list indicates images, and the inner list corresponds to feature levels of the image. Each element of the inner list is a tensor of shape (num_anchors, )
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
andlabels
attributes.batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[
InstanceData
], optional) – Batch of gt_instances_ignore. It includesbboxes
attribute data that is ignored during training and testing. Defaults to None.unmap_outputs (bool) – Whether to map outputs back to the original set of anchors. Defaults to True.
return_sampling_results (bool) – Whether to return the sampling results. Defaults to False.
- Returns
Usually returns a tuple containing learning targets.
labels_list (list[Tensor]): Labels of each level.
label_weights_list (list[Tensor]): Label weights of each level.
bbox_targets_list (list[Tensor]): BBox targets of each level.
bbox_weights_list (list[Tensor]): BBox weights of each level.
avg_factor (int): Average factor that is used to average the loss. When using sampling method, avg_factor is usually the sum of positive and negative priors. When using PseudoSampler, avg_factor is usually equal to the number of positive priors.
- additional_returns: This function enables user-defined returns from
self._get_targets_single. These returns are currently refined to properties at each feature map (i.e. having HxW dimension). The results will be concatenated after the end
- Return type
tuple
- loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
andlabels
attributes.batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[
InstanceData
], optional) – Batch of gt_instances_ignore. It includesbboxes
attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of loss components.
- Return type
dict
- loss_by_feat_single(cls_score: torch.Tensor, bbox_pred: torch.Tensor, anchors: torch.Tensor, labels: torch.Tensor, label_weights: torch.Tensor, bbox_targets: torch.Tensor, bbox_weights: torch.Tensor, avg_factor: int) → tuple[source]¶
Calculate the loss of a single scale level based on the features extracted by the detection head.
- Parameters
cls_score (Tensor) – Box scores for each scale level Has shape (N, num_anchors * num_classes, H, W).
bbox_pred (Tensor) – Box energies / deltas for each scale level with shape (N, num_anchors * 4, H, W).
anchors (Tensor) – Box reference for each scale level with shape (N, num_total_anchors, 4).
labels (Tensor) – Labels of each anchors with shape (N, num_total_anchors).
label_weights (Tensor) – Label weights of each anchor with shape (N, num_total_anchors)
bbox_targets (Tensor) – BBox regression targets of each anchor weight shape (N, num_total_anchors, 4).
bbox_weights (Tensor) – BBox regression loss weights of each anchor with shape (N, num_total_anchors, 4).
avg_factor (int) – Average factor that is used to average the loss.
- Returns
loss components.
- Return type
tuple
- class mmdet.models.dense_heads.AutoAssignHead(*args, force_topk: bool = False, topk: int = 9, pos_loss_weight: float = 0.25, neg_loss_weight: float = 0.75, center_loss_weight: float = 0.75, **kwargs)[source]¶
AutoAssignHead used in AutoAssign.
More details can be found in the paper.
- Parameters
force_topk (bool) – Used in center prior initialization to handle extremely small gt. Default is False.
topk (int) – The number of points used to calculate the center prior when no point falls in gt_bbox. Only works when force_topk is True. Defaults to 9.
pos_loss_weight (float) – The loss weight of positive loss and with default value 0.25.
neg_loss_weight (float) – The loss weight of negative loss and with default value 0.75.
center_loss_weight (float) – The loss weight of center prior loss and with default value 0.75.
- forward_single(x: torch.Tensor, scale: mmcv.cnn.bricks.scale.Scale, stride: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Forward features of a single scale level.
- Parameters
x (Tensor) – FPN feature maps of the specified stride.
scale (
mmcv.cnn.Scale
) – Learnable scale module to resize the bbox prediction.stride (int) – The corresponding stride for feature maps, only used to normalize the bbox prediction when self.norm_on_bbox is True.
- Returns
scores for each class, bbox predictions and centerness predictions of input feature maps.
- Return type
tuple[Tensor, Tensor, Tensor]
- get_neg_loss_single(cls_score: torch.Tensor, objectness: torch.Tensor, gt_instances: mmengine.structures.instance_data.InstanceData, ious: torch.Tensor, inside_gt_bbox_mask: torch.Tensor) → Tuple[torch.Tensor][source]¶
Calculate the negative loss of all points in feature map.
- Parameters
cls_score (Tensor) – All category scores for each point on the feature map. The shape is (num_points, num_class).
objectness (Tensor) – Foreground probability of all points and is shape of (num_points, 1).
gt_instances (
InstanceData
) – Ground truth of instance annotations. It should includesbboxes
andlabels
attributes.ious (Tensor) – Float tensor with shape of (num_points, num_gt). Each value represent the iou of pred_bbox and gt_bboxes.
inside_gt_bbox_mask (Tensor) – Tensor of bool type, with shape of (num_points, num_gt), each value is used to mark whether this point falls within a certain gt.
- Returns
neg_loss (Tensor): The negative loss of all points in the feature map.
- Return type
tuple[Tensor]
- get_pos_loss_single(cls_score: torch.Tensor, objectness: torch.Tensor, reg_loss: torch.Tensor, gt_instances: mmengine.structures.instance_data.InstanceData, center_prior_weights: torch.Tensor) → Tuple[torch.Tensor][source]¶
Calculate the positive loss of all points in gt_bboxes.
- Parameters
cls_score (Tensor) – All category scores for each point on the feature map. The shape is (num_points, num_class).
objectness (Tensor) – Foreground probability of all points, has shape (num_points, 1).
reg_loss (Tensor) – The regression loss of each gt_bbox and each prediction box, has shape of (num_points, num_gt).
gt_instances (
InstanceData
) – Ground truth of instance annotations. It should includesbboxes
andlabels
attributes.center_prior_weights (Tensor) – Float tensor with shape of (num_points, num_gt). Each value represents the center weighting coefficient.
- Returns
pos_loss (Tensor): The positive loss of all points in the gt_bboxes.
- Return type
tuple[Tensor]
- get_targets(points: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData]) → Tuple[List[torch.Tensor], List[torch.Tensor]][source]¶
Compute regression targets and each point inside or outside gt_bbox in multiple images.
- Parameters
points (list[Tensor]) – Points of all fpn level, each has shape (num_points, 2).
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
andlabels
attributes.
- Returns
inside_gt_bbox_mask_list (list[Tensor]): Each Tensor is with bool type and shape of (num_points, num_gt), each value is used to mark whether this point falls within a certain gt.
concat_lvl_bbox_targets (list[Tensor]): BBox targets of each level. Each tensor has shape (num_points, num_gt, 4).
- Return type
tuple(list[Tensor], list[Tensor])
- init_weights() → None[source]¶
Initialize weights of the head.
In particular, we have special initialization for classified conv’s and regression conv’s bias
- loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → Dict[str, torch.Tensor][source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_points * num_classes.
bbox_preds (list[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_points * 4.
objectnesses (list[Tensor]) – objectness for each scale level, each is a 4D-tensor, the channel number is num_points * 1.
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
andlabels
attributes.batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[
InstanceData
], optional) – Batch of gt_instances_ignore. It includesbboxes
attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- class mmdet.models.dense_heads.BoxInstBboxHead(*args, **kwargs)[source]¶
BoxInst box head used in https://arxiv.org/abs/2012.02310.
- class mmdet.models.dense_heads.BoxInstMaskHead(*arg, pairwise_size: int = 3, pairwise_dilation: int = 2, warmup_iters: int = 10000, **kwargs)[source]¶
BoxInst mask head used in https://arxiv.org/abs/2012.02310.
This head outputs the mask for BoxInst.
- Parameters
pairwise_size (int) – The size of neighborhood for each pixel. Defaults to 3.
pairwise_dilation (int) – The dilation of neighborhood for each pixel. Defaults to 2.
warmup_iters (int) – Warmup iterations for pair-wise loss. Defaults to 10000.
- get_pairwise_affinity(mask_logits: torch.Tensor) → torch.Tensor[source]¶
Compute the pairwise affinity for each pixel.
- loss_by_feat(mask_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], positive_infos: List[mmengine.structures.instance_data.InstanceData], **kwargs) → dict[source]¶
Calculate the loss based on the features extracted by the mask head.
- Parameters
mask_preds (list[Tensor]) – List of predicted masks, each has shape (num_classes, H, W).
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
,masks
, andlabels
attributes.batch_img_metas (list[dict]) – Meta information of multiple images.
positive_infos (List[:obj:
InstanceData
]) – Information of positive samples of each image that are assigned in detection head.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- class mmdet.models.dense_heads.CascadeRPNHead(num_classes: int, num_stages: int, stages: List[Union[dict, mmengine.config.config.ConfigDict]], train_cfg: List[Union[dict, mmengine.config.config.ConfigDict]], test_cfg: Union[mmengine.config.config.ConfigDict, dict], init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
The CascadeRPNHead will predict more accurate region proposals, which is required for two-stage detectors (such as Fast/Faster R-CNN). CascadeRPN consists of a sequence of RPNStage to progressively improve the accuracy of the detected proposals.
More details can be found in https://arxiv.org/abs/1909.06720.
- Parameters
num_stages (int) – number of CascadeRPN stages.
stages (list[
ConfigDict
or dict]) – list of configs to build the stages.train_cfg (list[
ConfigDict
or dict]) – list of configs at training time each stage.test_cfg (
ConfigDict
or dict) – config at testing time.init_cfg (
ConfigDict
or list[ConfigDict
] or dict or list[dict]) – Initialization config dict.
- loss(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample]) → dict[source]¶
Perform forward propagation and loss calculation of the detection head on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[
DetDataSample
]) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.
- Returns
A dictionary of loss components.
- Return type
dict
- loss_and_predict(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample], proposal_cfg: Optional[mmengine.config.config.ConfigDict] = None) → Tuple[dict, List[mmengine.structures.instance_data.InstanceData]][source]¶
Perform forward propagation of the head, then calculate loss and predictions from the features and data samples.
- Parameters
x (tuple[Tensor]) – Features from FPN.
batch_data_samples (list[
DetDataSample
]) – Each item contains the meta information of each image and corresponding annotations.proposal_cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
- Returns
the return value is a tuple contains:
losses: (dict[str, Tensor]): A dictionary of loss components.
predictions (list[
InstanceData
]): Detection results of each image after the post process.
- Return type
tuple
- predict(x: Tuple[torch.Tensor], batch_data_samples: List[mmdet.structures.det_data_sample.DetDataSample], rescale: bool = False) → List[mmengine.structures.instance_data.InstanceData][source]¶
Perform forward propagation of the detection head and predict detection results on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[
DetDataSample
]) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.rescale (bool, optional) – Whether to rescale the results. Defaults to False.
- Returns
Detection results of each image after the post process.
- Return type
list[InstanceData]
- class mmdet.models.dense_heads.CenterNetHead(in_channels: int, feat_channels: int, num_classes: int, loss_center_heatmap: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'GaussianFocalLoss'}, loss_wh: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.1, 'type': 'L1Loss'}, loss_offset: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'type': 'L1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Objects as Points head. CenterNetHead uses the center point to indicate an object's position. Paper link: https://arxiv.org/abs/1904.07850.
- Parameters
in_channels (int) – Number of channel in the input feature map.
feat_channels (int) – Number of channel in the intermediate feature map.
num_classes (int) – Number of categories excluding the background category.
loss_center_heatmap (
ConfigDict
or dict) – Config of center heatmap loss. Defaults to dict(type=’GaussianFocalLoss’, loss_weight=1.0)loss_wh (
ConfigDict
or dict) – Config of wh loss. Defaults to dict(type=’L1Loss’, loss_weight=0.1).loss_offset (
ConfigDict
or dict) – Config of offset loss. Defaults to dict(type=’L1Loss’, loss_weight=1.0).train_cfg (
ConfigDict
or dict, optional) – Training config. Useless in CenterNet, but we keep this variable for SingleStageDetector.test_cfg (
ConfigDict
or dict, optional) – Testing config of CenterNet.
- init_cfg (ConfigDict or dict or list[ConfigDict or dict], optional) – Initialization config dict.
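A minimal forward sketch, assuming a single-level 64-channel input feature map (CenterNet does not use FPN); the expected channel counts follow the heatmap, wh and offset heads described above:
>>> from mmdet.models.dense_heads import CenterNetHead
>>> import torch
>>> head = CenterNetHead(in_channels=64, feat_channels=64, num_classes=80)
>>> feats = (torch.rand(1, 64, 128, 128), )
>>> center_heatmap_preds, wh_preds, offset_preds = head(feats)
>>> tuple(center_heatmap_preds[0].shape), tuple(wh_preds[0].shape)
((1, 80, 128, 128), (1, 2, 128, 128))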
- forward(x: Tuple[torch.Tensor, ...]) → Tuple[List[torch.Tensor]][source]¶
Forward features. Notice CenterNet head does not use FPN.
- Parameters
x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
- center_heatmap_preds (list[Tensor]): Center predict heatmaps for all levels, the channel number is num_classes.
- wh_preds (list[Tensor]): WH predicts for all levels, the channel number is 2.
- offset_preds (list[Tensor]): Offset predicts for all levels, the channel number is 2.
- Return type
tuple
- forward_single(x: torch.Tensor) → Tuple[torch.Tensor, ...][source]¶
Forward feature of a single level.
- Parameters
x (Tensor) – Feature of a single level.
- Returns
- center_heatmap_pred (Tensor): Center predict heatmap, the channel number is num_classes.
- wh_pred (Tensor): WH predict, the channel number is 2.
- offset_pred (Tensor): Offset predict, the channel number is 2.
- Return type
tuple
- get_targets(gt_bboxes: List[torch.Tensor], gt_labels: List[torch.Tensor], feat_shape: tuple, img_shape: tuple) → Tuple[dict, int][source]¶
Compute regression and classification targets in multiple images.
- Parameters
gt_bboxes (list[Tensor]) – Ground truth bboxes for each image with shape (num_gts, 4) in [tl_x, tl_y, br_x, br_y] format.
gt_labels (list[Tensor]) – class indices corresponding to each box.
feat_shape (tuple) – feature map shape with value [B, _, H, W]
img_shape (tuple) – image shape.
- Returns
The float value is mean avg_factor, the dict has components below:
center_heatmap_target (Tensor): targets of center heatmap, shape (B, num_classes, H, W).
wh_target (Tensor): targets of wh predict, shape (B, 2, H, W).
offset_target (Tensor): targets of offset predict, shape (B, 2, H, W).
wh_offset_target_weight (Tensor): weights of wh and offset predict, shape (B, 2, H, W).
- Return type
tuple[dict, float]
- loss_by_feat(center_heatmap_preds: List[torch.Tensor], wh_preds: List[torch.Tensor], offset_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Compute losses of the head.
- Parameters
center_heatmap_preds (list[Tensor]) – center predict heatmaps for all levels with shape (B, num_classes, H, W).
wh_preds (list[Tensor]) – wh predicts for all levels with shape (B, 2, H, W).
offset_preds (list[Tensor]) – offset predicts for all levels with shape (B, 2, H, W).
batch_gt_instances (list[
InstanceData
]) – Batch of gt_instance. It usually includesbboxes
andlabels
attributes.batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[
InstanceData
], optional) – Batch of gt_instances_ignore. It includesbboxes
attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of loss components, which has the components below:
- loss_center_heatmap (Tensor): loss of center heatmap.
- loss_wh (Tensor): loss of wh heatmap.
- loss_offset (Tensor): loss of offset heatmap.
- Return type
dict[str, Tensor]
- predict_by_feat(center_heatmap_preds: List[torch.Tensor], wh_preds: List[torch.Tensor], offset_preds: List[torch.Tensor], batch_img_metas: Optional[List[dict]] = None, rescale: bool = True, with_nms: bool = False) → List[mmengine.structures.instance_data.InstanceData][source]¶
Transform network output for a batch into bbox predictions.
- Parameters
center_heatmap_preds (list[Tensor]) – Center predict heatmaps for all levels with shape (B, num_classes, H, W).
wh_preds (list[Tensor]) – WH predicts for all levels with shape (B, 2, H, W).
offset_preds (list[Tensor]) – Offset predicts for all levels with shape (B, 2, H, W).
batch_img_metas (list[dict], optional) – Batch image meta info. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, do nms before return boxes. Defaults to False.
- Returns
Instance segmentation results of each image after the post process. Each item usually contains following keys.
scores (Tensor): Classification scores, has a shape (num_instance, )
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).
- Return type
list[
InstanceData
]
- class mmdet.models.dense_heads.CenterNetUpdateHead(num_classes: int, in_channels: int, regress_ranges: Sequence[Tuple[int, int]] = ((0, 80), (64, 160), (128, 320), (256, 640), (512, 1000000000)), hm_min_radius: int = 4, hm_min_overlap: float = 0.8, more_pos_thresh: float = 0.2, more_pos_topk: int = 9, soft_weight_on_reg: bool = False, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'neg_weight': 0.75, 'pos_weight': 0.25, 'type': 'GaussianFocalLoss'}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'GIoULoss'}, norm_cfg: Optional