MMDetection 系列之（自定义运行设置）

Pytorch支持的自定义优化器

我们已经支持使用所有由PyTorch实现的优化器，唯一的修改就是更改配置文件的优化器字段。例如，如果您想要使用ADAM(注意性能可能会下降很多)，修改可以如下所示。

optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)

要修改模型的学习率，用户只需修改optimizer配置中的1r即可。用户可以直接在PyTorch的API文档添加链接描述后面设置参数。

定制self-implemented优化器 1、Define a new optimizer

一个定制的优化器可以定义如下。假设您想添加一个名为MyOptimizer的优化器，它有参数a、b和c。您需要创建一个名为mmdet/core/optimizer的新目录。然后在文件中实现新的优化器，例如在mmdet/core/optimizer/my_optimizer.py中:

from .registry import OPTIMIZERS
from torch.optim import Optimizer


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c)

2. 将优化器添加到注册表

要找到上面定义的模块，首先应该将该模块导入主命名空间。实现这一目标有两种选择。
修改mmdet/core/optimizer/init.py来导入它。新定义的模块应该导入到mmdet/core/optimizer/init.py中，这样注册表就会找到新模块并添加它:

from .my_optimizer import MyOptimizer

Use custom_imports in the config to manually import it

custom_imports = dict(imports=['mmdet.core.optimizer.my_optimizer'], allow_failed_imports=False)

The module mmdet.core.optimizer.my_optimizer will be imported at the beginning of the program and the class MyOptimizer is then automatically registered. Note that only the package containing the class MyOptimizer should be imported. mmdet.core.optimizer.my_optimizer.MyOptimizer cannot be imported directly.
实际上，用户可以使用这种导入方法使用完全不同的文件目录结构，只要模块根可以位于PYTHONPATH。

3.在配置文件中指定优化器

然后你可以在配置文件的优化器字段中使用MyOptimizer。在配置文件中，优化器由字段优化器定义，如下所示:

optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)

To use your own optimizer, the field can be changed to

optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)

Customize optimizer constructor

一些模型可能有一些参数特定的设置来优化，例如BatchNorm层的权重衰减。用户可以通过定制优化器构造函数来实现这些细粒度的参数调整。

from mmcv.utils import build_from_cfg

from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
from mmdet.utils import get_root_logger
from .my_optimizer import MyOptimizer


@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor(object):

    def __init__(self, optimizer_cfg, paramwise_cfg=None):

    def __call__(self, model):

        return my_optimizer

这里实现了默认的优化器构造函数，它也可以作为新的优化器构造函数的模板。

import warnings

import torch
from torch.nn import GroupNorm, LayerNorm

from mmcv.utils import _BatchNorm, _InstanceNorm, build_from_cfg, is_list_of
from .builder import OPTIMIZER_BUILDERS, OPTIMIZERS


@OPTIMIZER_BUILDERS.register_module()
class DefaultOptimizerConstructor:
    """Default constructor for optimizers.
    By default each parameter share the same optimizer settings, and we
    provide an argument ``paramwise_cfg`` to specify parameter-wise settings.
    It is a dict and may contain the following fields:
    - ``custom_keys`` (dict): Specified parameters-wise settings by keys. If
      one of the keys in ``custom_keys`` is a substring of the name of one
      parameter, then the setting of the parameter will be specified by
      ``custom_keys[key]`` and other setting like ``bias_lr_mult`` etc. will
      be ignored. It should be noted that the aforementioned ``key`` is the
      longest key that is a substring of the name of the parameter. If there
      are multiple matched keys with the same length, then the key with lower
      alphabet order will be chosen.
      ``custom_keys[key]`` should be a dict and may contain fields ``lr_mult``
      and ``decay_mult``. See Example 2 below.
    - ``bias_lr_mult`` (float): It will be multiplied to the learning
      rate for all bias parameters (except for those in normalization
      layers).
    - ``bias_decay_mult`` (float): It will be multiplied to the weight
      decay for all bias parameters (except for those in
      normalization layers and depthwise conv layers).
    - ``norm_decay_mult`` (float): It will be multiplied to the weight
      decay for all weight and bias parameters of normalization
      layers.
    - ``dwconv_decay_mult`` (float): It will be multiplied to the weight
      decay for all weight and bias parameters of depthwise conv
      layers.
    - ``bypass_duplicate`` (bool): If true, the duplicate parameters
      would not be added into optimizer. Default: False.
    Args:
        model (:obj:`nn.Module`): The model with parameters to be optimized.
        optimizer_cfg (dict): The config dict of the optimizer.
            Positional fields are
                - `type`: class name of the optimizer.
            Optional fields are
                - any arguments of the corresponding optimizer type, e.g.,
                  lr, weight_decay, momentum, etc.
        paramwise_cfg (dict, optional): Parameter-wise options.
    Example 1:
        >>> model = torch.nn.modules.Conv1d(1, 1, 1)
        >>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
        >>>                      weight_decay=0.0001)
        >>> paramwise_cfg = dict(norm_decay_mult=0.)
        >>> optim_builder = DefaultOptimizerConstructor(
        >>>     optimizer_cfg, paramwise_cfg)
        >>> optimizer = optim_builder(model)
    Example 2:
        >>> # assume model have attribute model.backbone and model.cls_head
        >>> optimizer_cfg = dict(type='SGD', lr=0.01, weight_decay=0.95)
        >>> paramwise_cfg = dict(custom_keys={
                '.backbone': dict(lr_mult=0.1, decay_mult=0.9)})
        >>> optim_builder = DefaultOptimizerConstructor(
        >>>     optimizer_cfg, paramwise_cfg)
        >>> optimizer = optim_builder(model)
        >>> # Then the `lr` and `weight_decay` for model.backbone is
        >>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for
        >>> # model.cls_head is (0.01, 0.95).
    """

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        if not isinstance(optimizer_cfg, dict):
            raise TypeError('optimizer_cfg should be a dict',
                            f'but got {type(optimizer_cfg)}')
        self.optimizer_cfg = optimizer_cfg
        self.paramwise_cfg = {} if paramwise_cfg is None else paramwise_cfg
        self.base_lr = optimizer_cfg.get('lr', None)
        self.base_wd = optimizer_cfg.get('weight_decay', None)
        self._validate_cfg()

    def _validate_cfg(self):
        if not isinstance(self.paramwise_cfg, dict):
            raise TypeError('paramwise_cfg should be None or a dict, '
                            f'but got {type(self.paramwise_cfg)}')

        if 'custom_keys' in self.paramwise_cfg:
            if not isinstance(self.paramwise_cfg['custom_keys'], dict):
                raise TypeError(
                    'If specified, custom_keys must be a dict, '
                    f'but got {type(self.paramwise_cfg["custom_keys"])}')
            if self.base_wd is None:
                for key in self.paramwise_cfg['custom_keys']:
                    if 'decay_mult' in self.paramwise_cfg['custom_keys'][key]:
                        raise ValueError('base_wd should not be None')

        # get base lr and weight decay
        # weight_decay must be explicitly specified if mult is specified
        if ('bias_decay_mult' in self.paramwise_cfg
                or 'norm_decay_mult' in self.paramwise_cfg
                or 'dwconv_decay_mult' in self.paramwise_cfg):
            if self.base_wd is None:
                raise ValueError('base_wd should not be None')

    def _is_in(self, param_group, param_group_list):
        assert is_list_of(param_group_list, dict)
        param = set(param_group['params'])
        param_set = set()
        for group in param_group_list:
            param_set.update(set(group['params']))

        return not param.isdisjoint(param_set)

    def add_params(self, params, module, prefix=''):
        """Add all parameters of module to the params list.
        The parameters of the given module will be added to the list of param
        groups, with specific rules defined by paramwise_cfg.
        Args:
            params (list[dict]): A list of param groups, it will be modified
                in place.
            module (nn.Module): The module to be added.
            prefix (str): The prefix of the module
        """
        # get param-wise options
        custom_keys = self.paramwise_cfg.get('custom_keys', {})
        # first sort with alphabet order and then sort with reversed len of str
        sorted_keys = sorted(sorted(custom_keys.keys()), key=len, reverse=True)

        bias_lr_mult = self.paramwise_cfg.get('bias_lr_mult', 1.)
        bias_decay_mult = self.paramwise_cfg.get('bias_decay_mult', 1.)
        norm_decay_mult = self.paramwise_cfg.get('norm_decay_mult', 1.)
        dwconv_decay_mult = self.paramwise_cfg.get('dwconv_decay_mult', 1.)
        bypass_duplicate = self.paramwise_cfg.get('bypass_duplicate', False)

        # special rules for norm layers and depth-wise conv layers
        is_norm = isinstance(module,
                             (_BatchNorm, _InstanceNorm, GroupNorm, LayerNorm))
        is_dwconv = (
            isinstance(module, torch.nn.Conv2d)
            and module.in_channels == module.groups)

        for name, param in module.named_parameters(recurse=False):
            param_group = {'params': [param]}
            if not param.requires_grad:
                params.append(param_group)
                continue
            if bypass_duplicate and self._is_in(param_group, params):
                warnings.warn(f'{prefix} is duplicate. It is skipped since '
                              f'bypass_duplicate={bypass_duplicate}')
                continue
            # if the parameter match one of the custom keys, ignore other rules
            is_custom = False
            for key in sorted_keys:
                if key in f'{prefix}.{name}':
                    is_custom = True
                    lr_mult = custom_keys[key].get('lr_mult', 1.)
                    param_group['lr'] = self.base_lr * lr_mult
                    if self.base_wd is not None:
                        decay_mult = custom_keys[key].get('decay_mult', 1.)
                        param_group['weight_decay'] = self.base_wd * decay_mult
                    break
            if not is_custom:
                # bias_lr_mult affects all bias parameters except for norm.bias
                if name == 'bias' and not is_norm:
                    param_group['lr'] = self.base_lr * bias_lr_mult
                # apply weight decay policies
                if self.base_wd is not None:
                    # norm decay
                    if is_norm:
                        param_group[
                            'weight_decay'] = self.base_wd * norm_decay_mult
                    # depth-wise conv
                    elif is_dwconv:
                        param_group[
                            'weight_decay'] = self.base_wd * dwconv_decay_mult
                    # bias lr and decay
                    elif name == 'bias':
                        param_group[
                            'weight_decay'] = self.base_wd * bias_decay_mult
            params.append(param_group)

        for child_name, child_mod in module.named_children():
            child_prefix = f'{prefix}.{child_name}' if prefix else child_name
            self.add_params(params, child_mod, prefix=child_prefix)

    def __call__(self, model):
        if hasattr(model, 'module'):
            model = model.module

        optimizer_cfg = self.optimizer_cfg.copy()
        # if no paramwise option is specified, just use the global setting
        if not self.paramwise_cfg:
            optimizer_cfg['params'] = model.parameters()
            return build_from_cfg(optimizer_cfg, OPTIMIZERS)

        # set param-wise lr and weight decay recursively
        params = []
        self.add_params(params, model)
        optimizer_cfg['params'] = params

        return build_from_cfg(optimizer_cfg, OPTIMIZERS)

Additional settings 使用梯度剪辑来稳定训练:有些模型需要梯度剪辑来剪辑梯度来稳定训练过程。

optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))

如果您的配置继承了已经设置了optimizer_config的基本配置，您可能需要_delete_=True来覆盖不必要的设置。有关更多细节，请参阅配置文档。

使用动量计划来加速模型收敛:

我们支持动量调度，根据学习速率修改模型的动量，使模型更快地收敛。动量调度器通常与LR调度器一起使用，例如，在三维检测中使用以下配置来加速收敛。更多细节，请参考CyclicLrUpdater和CyclicMomentumUpdater的实现。

lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 1e-4),
    cyclic_times=1,
    step_ratio_up=0.4,
)
momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.85 / 0.95, 1),
    cyclic_times=1,
    step_ratio_up=0.4,
)

自定义训练学习率

默认情况下，我们使用1xschedule的步进学习速率，这在MMCV中调用了StepLRHook。我们支持许多其他的学习速率计划，如余弦退火和Poly计划。下面是一些例子
Poly schedule:

lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)

ConsineAnnealing schedule:

lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=1.0 / 10,
    min_lr_ratio=1e-5)

自定义工作流

工作流是指定运行顺序和时间的(阶段，时间)列表。默认情况下，它被设置为

workflow = [('train', 1)]

这意味着要进行1个阶段的训练。有时，用户可能想要检查验证集中模型的一些指标(例如损失、准确性)。在这种情况下，我们可以将工作流设置为

[('train', 1), ('val', 1)]

因此，1个训练阶段和1个验证阶段将反复运行。
1、The parameters of model will not be updated during val epoch.
2、配置中的关键字total_epoch只控制培训epoch的数量，不会影响验证工作流。
3.工作流[(‘train’， 1)， (‘val’， 1)]和[(‘train’， 1)]不会改变EvalHook的行为，因为EvalHook是由after_train_epoch调用的，并且验证工作流只影响通过after_val_epoch调用的钩子。因此，[(‘train’， 1)， (‘val’， 1)]和[((‘train’， 1)]之间的唯一区别是，跑步者将在每个训练周期后计算验证集上的损失。

Customize hooks 1、Implement a new hook

在某些情况下，用户可能需要实现一个新的钩子。MMDetection从2.3.0版本开始支持在training(#3395)中定制钩子。因此，用户可以直接在mmdet或基于mmdet的代码库中实现钩子，并且只需要在培训中修改配置即可使用钩子。在v2.3.0之前，用户需要在培训开始之前修改代码来注册钩子。这里我们给出一个在mmdet中创建一个新钩子并在培训中使用它的例子。

from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class MyHook(Hook):

    def __init__(self, a, b):
        pass

    def before_run(self, runner):
        pass

    def after_run(self, runner):
        pass

    def before_epoch(self, runner):
        pass

    def after_epoch(self, runner):
        pass

    def before_iter(self, runner):
        pass

    def after_iter(self, runner):
        pass

根据钩子的功能，用户需要在before_run、after_run、before_epoch、after_epoch、before_iter和after_iter中指定钩子在训练的每个阶段将做什么。

2. Register the new hook

Then we need to make MyHook imported. Assuming the file is in mmdet/core/utils/my_hook.py there are two ways to do that:

Modify mmdet/core/utils/init.py to import it.

The newly defined module should be imported in mmdet/core/utils/init.py so that the registry will find the new module and add it:

from .my_hook import MyHook

Use custom_imports in the config to manually import it

custom_imports = dict(imports=['mmdet.core.utils.my_hook'], allow_failed_imports=False)

3. Modify the config

custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value)
]

你也可以设置钩子的优先级，将键的优先级设置为“NORMAL”或“HIGHEST”，如下所示

custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]

By default the hook’s priority is set as NORMAL during registration.

在MMCV中使用hook

如果该钩子已经在MMCV中实现，你可以直接修改配置来使用该钩子，如下所示
修改默认运行时hook
Example: NumClassCheckHook
我们实现了一个名为NumClassCheckHook的自定义钩子来检查head中的num_classes是否与dataset中的CLASSSES长度匹配。
We set it in default_runtime.py.

custom_hooks = [dict(type='NumClassCheckHook')]

Modify default runtime hooks

log_config

checkpoint_config

evaluation

lr_config

optimizer_config

momentum_config
在这些钩子中，只有记录器钩子具有VERY_LOw优先级，其他钩子的优先级为NORMAL。上述教程已经介绍了如何修改optimizer_config、momentum_config和lr_config。在这里，我们揭示了如何使用log_config、checkpoint_config和evaluate来做什么。

Checkpoint config

The MMCV runner will use checkpoint_config to initialize CheckpointHook.

checkpoint_config = dict(interval=1)

用户可以将max_keep_ckpts设置为只保存少量的检查点，或者通过save_optimizer来决定是否存储优化器的状态字典。更多争论的细节在这里

Log config

log_config封装了多个日志记录器钩子，并允许设置间隔。现在MMCV支持WandbLoggerHook, MlflowLoggerHook和TensorboardLoggerHook。详细用法可以在文档中找到。

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])

Evaluation config

评估的配置将用于初始化EvalHook。除键间隔外，其他参数(如metric)将被传递给数据集.evaluate ()

evaluation = dict(interval=1, metric='bbox')

MMDetection 系列之（自定义运行设置）

Python相关栏目本月热门文章