Weight initialization¶
During training, a proper initialization strategy helps speed up training or reach higher performance. MMCV provides some commonly used methods for initializing modules such as nn.Conv2d. Model initialization in MMDetection mainly uses init_cfg. Users can initialize models in the following two steps:
1. Define init_cfg for a model or its components in model_cfg. The init_cfg of child components has higher priority and overrides the init_cfg of parent modules.
2. Build the model as usual, then call the model.init_weights() method explicitly; the model parameters will be initialized as configured.
The high-level workflow of initialization in MMDetection is:
model_cfg(init_cfg) -> build_from_cfg -> model -> init_weights() -> initialize(self, self.init_cfg) -> children's init_weights()
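The call chain above can be illustrated with a toy sketch. It is not MMCV's implementation: ToyModule, its applied attribute, and the log list are hypothetical stand-ins, and "applying" a config here only records its type. The sketch shows why a child's init_cfg takes priority: the parent's config is applied to the whole subtree first, and each child's init_weights() runs afterwards.

```python
# Toy stand-in for the init_weights() call chain; NOT the MMCV implementation.
class ToyModule:
    def __init__(self, name, init_cfg=None, children=()):
        self.name = name
        self.init_cfg = init_cfg
        self.children = list(children)
        self.applied = None  # type of the cfg that last initialized this module

    def init_weights(self, log=None):
        log = [] if log is None else log
        if self.init_cfg is not None:
            # The module's own cfg is applied to its whole subtree first ...
            self._apply(self.init_cfg)
            log.append(f"{self.name}: initialize with {self.init_cfg['type']}")
        # ... then each child runs its own init_weights(), so a child's
        # init_cfg overwrites whatever the parent applied.
        for child in self.children:
            child.init_weights(log)
        return log

    def _apply(self, cfg):
        self.applied = cfg['type']
        for child in self.children:
            child._apply(cfg)


head = ToyModule('head', init_cfg=dict(type='Normal'))
neck = ToyModule('neck')  # no init_cfg of its own
model = ToyModule('model', init_cfg=dict(type='Xavier'), children=[neck, head])

call_order = model.init_weights()
print(call_order)
print(neck.applied, head.applied)
```

In this toy run, neck ends up with the parent's config, while head ends up with its own.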
Description¶
init_cfg is a dict or list[dict] and contains the following keys and values:
- type (str): the initializer name in INITIALIZERS, followed by the arguments of the initializer.
- layer (str or list[str]): the names of basic layers in PyTorch or MMCV with learnable parameters that will be initialized, e.g. 'Conv2d', 'DeformConv2d'.
- override (dict or list[dict]): the sub-modules that do not inherit from BaseModule and whose initialization configuration differs from that of the layers listed under the 'layer' key. The initializer defined in type applies to all layers defined in layer, so if sub-modules are not derived classes of BaseModule but can be initialized in the same way as the layers in layer, override is not needed. override contains:
  - type, followed by the arguments of the initializer;
  - name, to indicate the sub-module that will be initialized.
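As an illustration, the three keys can be combined in a single init_cfg. This is only a sketch: the sub-module name cls_branch is hypothetical, the std and bias_prob values are arbitrary, and Normal is assumed as the initializer type together with its std and bias_prob arguments.

```python
# Illustrative init_cfg combining type, layer and override.
# 'cls_branch' is a hypothetical attribute name of the module being configured.
init_cfg = dict(
    type='Normal',      # initializer name, followed by its arguments
    layer='Conv2d',     # layer type(s) the initializer applies to
    std=0.01,
    override=dict(      # a named sub-module that needs different settings
        type='Normal',
        name='cls_branch',
        std=0.01,
        bias_prob=0.01))
```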
Initialize parameters¶
Inherit a new model from mmcv.runner.BaseModule or from a model in mmdet.models. Here we show an example with FooModel.
import torch.nn as nn
from mmcv.runner import BaseModule

class FooModel(BaseModule):
    def __init__(self,
                 arg1,
                 arg2,
                 init_cfg=None):
        super(FooModel, self).__init__(init_cfg)
        ...
Initialize the model by using init_cfg directly in code:

import torch.nn as nn
from mmcv.runner import BaseModule
# or directly inherit mmdet models

class FooModel(BaseModule):
    def __init__(self, arg1, arg2, init_cfg=XXX):
        super(FooModel, self).__init__(init_cfg)
        ...
Initialize the model by using init_cfg directly in mmcv.Sequential or mmcv.ModuleList code:

from mmcv.runner import BaseModule, ModuleList

class FooModel(BaseModule):
    def __init__(self, arg1, arg2, init_cfg=None):
        super(FooModel, self).__init__(init_cfg)
        ...
        self.conv1 = ModuleList(init_cfg=XXX)
Initialize the model by using init_cfg in the config file:

model = dict(
    ...
    model = dict(
        type='FooModel',
        arg1=XXX,
        arg2=XXX,
        init_cfg=XXX),
    ...
Usage of init_cfg¶
Initialize model by layer key

If we only define layer, only the layer types listed under the layer key are initialized.

NOTE: The value of the layer key is the name of a PyTorch class that has weight and bias attributes, so a layer such as MultiheadAttention is not supported.
Define the layer key to initialize modules with the same configuration:

init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d', 'Linear'], val=1)
# initialize whole module with same configuration
Define the layer key to initialize layers with different configurations:

init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
            dict(type='Constant', layer='Conv2d', val=2),
            dict(type='Constant', layer='Linear', val=3)]
# nn.Conv1d will be initialized with dict(type='Constant', val=1)
# nn.Conv2d will be initialized with dict(type='Constant', val=2)
# nn.Linear will be initialized with dict(type='Constant', val=3)
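The mapping from layer types to initializer arguments can be sketched in plain Python. This is a simplified illustration, not MMCV's code: resolve_layer_cfgs is a hypothetical helper that matches on the exact class name only and lets a later entry in the list overwrite an earlier one for the same layer type.

```python
def resolve_layer_cfgs(init_cfg, layer_types):
    """Map each layer type name to the initializer arguments that target it.

    Hypothetical helper for illustration; matching is by exact class name.
    """
    cfgs = [init_cfg] if isinstance(init_cfg, dict) else init_cfg
    resolved = {}
    for cfg in cfgs:
        layers = cfg.get('layer') or []
        if isinstance(layers, str):
            layers = [layers]
        # Everything except 'layer' and 'override' describes the initializer.
        args = {k: v for k, v in cfg.items() if k not in ('layer', 'override')}
        for layer_type in layer_types:
            if layer_type in layers:
                resolved[layer_type] = args  # later entries overwrite earlier ones
    return resolved


init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
            dict(type='Constant', layer='Conv2d', val=2),
            dict(type='Constant', layer='Linear', val=3)]

resolved = resolve_layer_cfgs(
    init_cfg, ['Conv1d', 'Conv2d', 'Linear', 'BatchNorm2d'])
for layer_type, args in resolved.items():
    print(layer_type, args)
```

A layer type that no entry lists (BatchNorm2d in this call) is absent from the result, which mirrors the rule that only the layers named under the layer key are initialized.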
Initialize model by override key
When initializing a specific part of the model by its attribute name, we can use the override key; the value in override takes precedence over the value in init_cfg.

# layers:
# self.feat = nn.Conv1d(3, 1, 3)
# self.reg = nn.Conv2d(3, 3, 3)
# self.cls = nn.Linear(1,2)

init_cfg = dict(type='Constant',
                layer=['Conv1d','Conv2d'], val=1, bias=2,
                override=dict(type='Constant', name='reg', val=3, bias=4))
# self.feat will be initialized with dict(type='Constant', val=1, bias=2)
# self.cls (nn.Linear) is not listed in layer, so it keeps PyTorch's default initialization
# The module called 'reg' will be initialized with dict(type='Constant', val=3, bias=4)
If layer is None in init_cfg, only the sub-module named in override will be initialized, and type and the other arguments in override can be omitted.

# layers:
# self.feat = nn.Conv1d(3, 1, 3)
# self.reg = nn.Conv2d(3, 3, 3)
# self.cls = nn.Linear(1,2)

init_cfg = dict(type='Constant', val=1, bias=2, override=dict(name='reg'))
# self.feat and self.cls will be initialized by PyTorch
# The module called 'reg' will be initialized with dict(type='Constant', val=1, bias=2)
If we define neither the layer key nor the override key, nothing will be initialized.

Invalid usage

# It is invalid for override to lack the name key
init_cfg = dict(type='Constant',
                layer=['Conv1d','Conv2d'], val=1, bias=2,
                override=dict(type='Constant', val=3, bias=4))

# It is also invalid for override to have name and other args but no type
init_cfg = dict(type='Constant',
                layer=['Conv1d','Conv2d'], val=1, bias=2,
                override=dict(name='reg', val=3, bias=4))
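The override rules described above (name is required; an override with only name inherits the parent's settings; extra arguments without type are rejected) can be sketched as a small validator. This is a plain-Python illustration under those stated rules, not the library's code; resolve_override is a hypothetical helper.

```python
import copy


def resolve_override(init_cfg):
    """Return {sub-module name: initializer cfg} for the override entries.

    Hypothetical helper that mirrors the rules stated in the text.
    """
    cfg = copy.deepcopy(init_cfg)
    overrides = cfg.pop('override', None)
    cfg.pop('layer', None)  # an override targets a named sub-module, not layer types
    if overrides is None:
        return {}
    if isinstance(overrides, dict):
        overrides = [overrides]

    resolved = {}
    for override in overrides:
        override = dict(override)
        name = override.pop('name', None)
        if name is None:
            raise ValueError('override must contain the key "name"')
        if not override:
            # Only "name" was given: reuse type and arguments from init_cfg.
            override = dict(cfg)
        elif 'type' not in override:
            raise ValueError('override with extra arguments must also define "type"')
        resolved[name] = override
    return resolved


base = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=1, bias=2)

# Valid: override carries its own type and arguments.
explicit = resolve_override(
    dict(base, override=dict(type='Constant', name='reg', val=3, bias=4)))

# Valid: override has only a name, so it inherits type/val/bias from init_cfg.
inherited = resolve_override(
    dict(type='Constant', val=1, bias=2, override=dict(name='reg')))

print(explicit)
print(inherited)
```

Passing either of the two invalid configs shown above to this helper raises ValueError.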
Initialize model with the pretrained model
init_cfg = dict(type='Pretrained', checkpoint='torchvision://resnet50')
For more details, refer to the documentation in MMEngine.