Autogeneration transformation parameters

Short summary: To automatically generate a new model with search variant containers, use the enot.autogeneration.TransformationParameters class. Three autogeneration properties are available to date:

  1. Hidden width multiplier, width_mult

  2. Convolutional layer kernel size, kernel_size

  3. Bottlenecks hidden depth multiplier, depth_mult

To use these options with our autogeneration mechanism, simply do the following:

import torch.nn as nn
from enot.autogeneration import generate_pruned_search_variants_model
from enot.autogeneration import TransformationParameters
from enot.models import SearchSpaceModel  # used below; module path assumed

...

your_model: nn.Module = ...

generated_model = generate_pruned_search_variants_model(
    your_model,
    search_variant_descriptors=(
        TransformationParameters(width_mult=1.0, kernel_size=3),
        TransformationParameters(width_mult=1.0, kernel_size=5),
        TransformationParameters(width_mult=0.5, kernel_size=3),
        TransformationParameters(width_mult=0.5, kernel_size=5),
        TransformationParameters(width_mult=0.0),
    ),
)

# move model with search variants to the search space
search_space = SearchSpaceModel(generated_model).cuda()

# pretrain, search, extract sub-model, ...
...

To simplify search space creation, we provide a tool to automatically generate a new model with search variants from a user-defined model; see more in Search space auto generation.

To define which search variants will be inserted into the user-defined model, create a tuple of enot.autogeneration.TransformationParameters instances and pass it to the autogeneration function. Each TransformationParameters instance specifies a single transformation configuration for matched patterns in the user model. Each transformation is applied once to each matched pattern, and the transformed sub-model is inserted into a SearchVariantsContainer.

This procedure ensures that each SearchVariantsContainer instance in the model will have the same number of choice options.

TransformationParameters class supports three auto-generated properties for the time being:

  1. Hidden width multiplier, width_mult

  2. Convolutional layer kernel size, kernel_size

  3. Bottlenecks hidden depth multiplier, depth_mult

class TransformationParameters(width_mult=1.0, kernel_size=None, depth_mult=1.0)

Container with a single configuration for graph transformation.

For extended description of class usage, see Autogeneration transformation parameters.

__init__(width_mult=1.0, kernel_size=None, depth_mult=1.0)
Parameters:
  • width_mult (float or numpy.float32, optional) – Regulates the number of channels in single search variant. \(C_{new} = C_{source} * {width_mult}\), where \(C_{new}\) is the number of channels in the generated search variant, and \(C_{source}\) is the number of channels in source sub-model. Default value is 1.0, which keeps channels in the matched pattern unchanged.

  • kernel_size (positive int or None, optional) – Defines the convolution kernel size in a single search variant. Convolution kernels in search variants are initialized with weights from the original convolution kernel and padded with zeros if necessary. If the kernel size in the search variant is smaller than that of the original convolution, the sub-kernel with the maximal L2 norm is used to initialize the convolution weights. Kernel size selection is not implemented for YOLOv5-like models. Default value is None, which keeps the kernel size in the matched pattern unchanged.

  • depth_mult (float or numpy.float32, optional) – Regulates the number of blocks in modules where depth (the number of sequentially repeated blocks) can be chosen. Used only for YOLO bottlenecks, not for ResNet bottlenecks. Default value is 1.0, which keeps the number of blocks in the matched bottleneck patterns unchanged.

property depth_mult: float
Returns:

Number-of-blocks squeeze factor of the search variant.

Return type:

float

property kernel_size: int | None
Returns:

Kernel size to use in the newly generated search variant, or None if the kernel size is inherited from the source sub-model.

Return type:

int or None

property width_mult: float
Returns:

Search variant inner channel squeeze factor.

Return type:

float
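The channel arithmetic behind width_mult (\(C_{new} = C_{source} * {width\_mult}\)) can be sketched in plain Python. The exact rounding rule the library applies is not specified here, so round-to-nearest is an assumption and scaled_channels is a hypothetical helper:

```python
# Illustration of the width_mult arithmetic: C_new = C_source * width_mult.
# The rounding rule is an assumption; the library may round differently.

def scaled_channels(c_source: int, width_mult: float) -> int:
    """Hypothetical helper: channel count in a generated search variant."""
    return max(int(round(c_source * width_mult)), 0)

# A block with 256 hidden channels under the default multipliers:
print([scaled_channels(256, m) for m in (1.0, 0.75, 0.5, 0.25, 0.0)])
# → [256, 192, 128, 64, 0]
```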

Let’s see the most basic example:

search_variant_descriptors = (
    TransformationParameters(width_mult=1.00),
    TransformationParameters(width_mult=0.75),
    TransformationParameters(width_mult=0.50),
    TransformationParameters(width_mult=0.25),
)

For example, if your model contains ResNet blocks, they will be replaced with SearchVariantsContainer instances, each holding 4 operations created by reducing the hidden (inner) channel count of the replaced block. Channel counts are reduced with factors 1.0, 0.75, 0.5, and 0.25 (a factor of 1.0 means no channel count reduction). Weights in the search variants are initialized from the weights of the original block.

The same applies to EfficientNet blocks, MobileNetV2 blocks, and others.

Let’s look at another interesting case:

search_variant_descriptors = (
    TransformationParameters(width_mult=1.0),
    TransformationParameters(width_mult=0.5),
    TransformationParameters(width_mult=0.0),
)

The main difference from the previous example is the zero width multiplier, which replaces the original layer with a search variant consisting of a single Conv1x1 of the appropriate stride and channel sizes, followed by a BatchNorm2d layer.
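A rough parameter count shows why this 1x1 variant acts as a cheap "skip". The layer shapes below are illustrative, not taken from a real model, and bias terms are ignored:

```python
# Back-of-the-envelope weight counts comparing a dense 3x3 convolution with the
# 1x1 replacement produced by width_mult=0.0. Shapes are illustrative only.

def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Number of weights in a dense k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

c_in, c_out = 64, 64
full = conv_params(c_in, c_out, 3)  # 64 * 64 * 9 = 36864 weights
skip = conv_params(c_in, c_out, 1)  # 64 * 64     =  4096 weights
print(full, skip, full // skip)     # the 1x1 variant is 9x smaller
```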

Our default autogeneration parameters are set as follows:

DEFAULT_SPEED_UP_SEARCH_SPACE = (
    TransformationParameters(width_mult=1.00),
    TransformationParameters(width_mult=0.75),
    TransformationParameters(width_mult=0.50),
    TransformationParameters(width_mult=0.25),
    TransformationParameters(width_mult=0.00),
)

We can also search for kernel size in convolutional layer:

search_variant_descriptors = (
    TransformationParameters(kernel_size=3),
    TransformationParameters(kernel_size=5),
    TransformationParameters(kernel_size=7),
)

In known patterns, we replace convolutions with kernels larger than 1 by a convolution whose kernel size equals the kernel_size argument.
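The "sub-kernel with the maximal L2 norm" initialization mentioned for kernel_size can be sketched in pure Python. This is a single-channel, 2-D illustration written from the description above; the library operates on full 4-D weight tensors:

```python
import math

def best_subkernel(kernel, k):
    """Pick the k x k sub-window of a 2-D kernel with the largest L2 norm.

    Single-channel sketch of the initialization rule for shrinking kernels;
    the library applies the same idea to full convolution weight tensors.
    """
    size = len(kernel)
    best, best_norm = None, -1.0
    for top in range(size - k + 1):
        for left in range(size - k + 1):
            window = [row[left:left + k] for row in kernel[top:top + k]]
            norm = math.sqrt(sum(v * v for row in window for v in row))
            if norm > best_norm:
                best, best_norm = window, norm
    return best

# A 5x5 kernel whose energy sits in the bottom-right corner:
kernel = [[0.0] * 5 for _ in range(5)]
kernel[3][3] = kernel[4][4] = 2.0
print(best_subkernel(kernel, 3))  # the bottom-right 3x3 window is selected
```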

We can also combine these transformations to search patterns with different inner channel counts and kernel sizes:

search_variant_descriptors = (
    TransformationParameters(width_mult=1.0, kernel_size=3),
    TransformationParameters(width_mult=1.0, kernel_size=5),
    TransformationParameters(width_mult=1.0, kernel_size=7),
    TransformationParameters(width_mult=0.5, kernel_size=3),
    TransformationParameters(width_mult=0.5, kernel_size=5),
    TransformationParameters(width_mult=0.5, kernel_size=7),
    TransformationParameters(width_mult=0.0),
)

This is the default configuration to search MobileNetV2-like nets, as it allows searching over different kernel sizes (depthwise convolutions with large kernels are quite fast on CPUs), different expansion ratios (the inner MobileNetV2 block channel multiplier), and even "skipping" unnecessary operations (by replacing them with a cheap 1x1 convolution).

Since a Conv1x1 is used when width_mult is equal to 0, the kernel_size argument becomes redundant and should be left as None.

Another interesting example concerns YOLOv5-like models. For this type of model you cannot choose the kernel size, but you can choose the number of bottlenecks in the YOLO C3 block ("depth") with the depth_mult argument. The default value of 1.0 keeps the number of blocks in the created model unchanged (as in the original model).

search_variant_descriptors = (
    TransformationParameters(width_mult=1.0, depth_mult=1.0),
    TransformationParameters(width_mult=1.0, depth_mult=0.5),
    TransformationParameters(width_mult=1.0, depth_mult=0.33),
    TransformationParameters(width_mult=0.5, depth_mult=1.0),
    TransformationParameters(width_mult=0.5, depth_mult=0.5),
    TransformationParameters(width_mult=0.5, depth_mult=0.66),
)