Autogeneration transformation parameters
Short summary: to automatically generate a new model with search variant
containers, use the enot.autogeneration.TransformationParameters class.
Three autogeneration properties are available to date:
Hidden width multiplier, width_mult
Convolutional layer kernel size, kernel_size
Bottleneck hidden depth multiplier, depth_mult
To use these options with our autogeneration mechanism, simply do the following:
import torch.nn as nn

from enot.autogeneration import generate_pruned_search_variants_model
from enot.autogeneration import TransformationParameters
from enot.models import SearchSpaceModel

...
your_model: nn.Module = ...

generated_model = generate_pruned_search_variants_model(
    your_model,
    search_variant_descriptors=(
        TransformationParameters(width_mult=1.0, kernel_size=3),
        TransformationParameters(width_mult=1.0, kernel_size=5),
        TransformationParameters(width_mult=0.5, kernel_size=3),
        TransformationParameters(width_mult=0.5, kernel_size=5),
        TransformationParameters(width_mult=0.0),
    ),
)

# move model with search variants to the search space
search_space = SearchSpaceModel(generated_model).cuda()

# pretrain, search, extract sub-model, ...
...
To simplify search space creation, we provide a tool that automatically generates a new model with search variants from the user-defined model; see more in Search space auto generation.
To define which search variants will be inserted into the user-defined model,
create a tuple of enot.autogeneration.TransformationParameters instances and
pass it to the autogeneration function. Each TransformationParameters instance
specifies a single transformation configuration for matched patterns in the
user model. Each transformation is applied once to each matched pattern, and
the transformed sub-model is inserted into a SearchVariantsContainer.
This procedure ensures that every SearchVariantsContainer instance in
the model has the same number of choice options.
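Because every container exposes the same number of choice options, the total number of sub-models encoded by the search space is easy to estimate. A pure-Python sketch (the container and option counts below are illustrative, not taken from a real model):

```python
# Each SearchVariantsContainer holds the same number of choice options,
# so a model with N containers and K options per container encodes
# K ** N distinct sub-models.

def search_space_size(num_containers: int, options_per_container: int) -> int:
    """Number of distinct sub-models encoded by the search space."""
    return options_per_container ** num_containers

# E.g. 8 replaced blocks with 5 TransformationParameters each:
print(search_space_size(8, 5))  # -> 390625
```

This is why even a handful of descriptors yields a very large search space.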
The TransformationParameters class supports three auto-generated
properties for the time being:
Hidden width multiplier, width_mult
Convolutional layer kernel size, kernel_size
Bottleneck hidden depth multiplier, depth_mult
- class TransformationParameters(width_mult=1.0, kernel_size=None, depth_mult=1.0)
Container with a single configuration for graph transformation.
For extended description of class usage, see Autogeneration transformation parameters.
- __init__(width_mult=1.0, kernel_size=None, depth_mult=1.0)
- Parameters:
width_mult (float or numpy.float32, optional) – Regulates the number of channels in a single search variant. \(C_{new} = C_{source} * {width\_mult}\), where \(C_{new}\) is the number of channels in the generated search variant, and \(C_{source}\) is the number of channels in the source sub-model. Default value is 1.0, which keeps channels in the matched pattern unchanged.
kernel_size (positive int or None, optional) – Defines the size of the convolution kernel in a single search variant. Convolution kernels in search variants are initialized with weights from the original convolution kernel and padded with zeros if necessary. If the kernel size in the search variant is smaller than the kernel size in the original convolution, then the sub-kernel with the maximal L2 norm is used to initialize the convolution weights. For YOLOv5-like models, choosing kernel_size is not implemented. Default value is None, which keeps the kernel size in the matched pattern unchanged.
depth_mult (float or numpy.float32, optional) – Regulates the number of blocks in modules where we can choose depth, i.e. the number of repeated blocks following each other. Used only for YOLO bottlenecks, not for ResNet bottlenecks. Default value is 1.0, which keeps the number of blocks in the matched bottleneck patterns unchanged.
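As a sanity check of the width_mult formula above, the channel arithmetic can be sketched in plain Python (the rounding rule is an assumption for illustration; the library's exact rounding may differ):

```python
def scaled_channels(source_channels: int, width_mult: float) -> int:
    # C_new = C_source * width_mult, rounded to the nearest integer
    # (illustrative rounding; the actual implementation may differ).
    return round(source_channels * width_mult)

for wm in (1.0, 0.75, 0.5, 0.25):
    print(wm, scaled_channels(64, wm))
```

For a 64-channel pattern this yields 64, 48, 32 and 16 hidden channels respectively.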
Let’s see the most basic example:
search_variant_descriptors = (
    TransformationParameters(width_mult=1.00),
    TransformationParameters(width_mult=0.75),
    TransformationParameters(width_mult=0.50),
    TransformationParameters(width_mult=0.25),
)
For example, if your model contains ResNet blocks, they will be replaced
with SearchVariantsContainer instances, each with 4 operations created by
reducing the hidden (inner) channel count of the replaced block. Channel
counts are reduced with factors 1.0, 0.75, 0.5 and 0.25 (a factor of 1.0
means no channel count reduction). Weights in the search variants are
initialized with the weights of the original block.
The same applies to EfficientNet blocks, MobileNetV2 blocks, and others.
Let’s look at another interesting case:
search_variant_descriptors = (
    TransformationParameters(width_mult=1.0),
    TransformationParameters(width_mult=0.5),
    TransformationParameters(width_mult=0.0),
)
The main difference from the previous example is the zero width multiplier, which replaces the original layer with a search variant consisting of a single Conv1x1 of appropriate stride and channel sizes, followed by a BatchNorm2d layer.
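To see why this Conv1x1 variant acts as a cheap "skip"-like option, compare multiply-accumulate counts of a 3x3 and a 1x1 convolution at the same spatial size. A back-of-the-envelope sketch, not library code (the layer sizes are made up for illustration):

```python
def conv_macs(c_in: int, c_out: int, kernel: int, h: int, w: int) -> int:
    # Multiply-accumulates of a dense 2D convolution with a square
    # kernel, stride 1 and "same" padding.
    return c_in * c_out * kernel * kernel * h * w

macs_3x3 = conv_macs(64, 64, 3, 56, 56)
macs_1x1 = conv_macs(64, 64, 1, 56, 56)
print(macs_3x3 // macs_1x1)  # -> 9
```

A k x k convolution costs k^2 times the MACs of a 1x1 convolution with the same channel counts, so the zero-width variant is the cheapest choice in the container.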
Our default autogeneration parameters are set as follows:
DEFAULT_SPEED_UP_SEARCH_SPACE = (
    TransformationParameters(width_mult=1.00),
    TransformationParameters(width_mult=0.75),
    TransformationParameters(width_mult=0.50),
    TransformationParameters(width_mult=0.25),
    TransformationParameters(width_mult=0.00),
)
We can also search for the kernel size in convolutional layers:
search_variant_descriptors = (
    TransformationParameters(kernel_size=3),
    TransformationParameters(kernel_size=5),
    TransformationParameters(kernel_size=7),
)
In known patterns, we replace convolutions with kernels larger than 1 with a
convolution whose kernel size equals the kernel_size argument.
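The kernel initialization described earlier (when shrinking a kernel, the sub-kernel with the maximal L2 norm is kept) can be sketched in one dimension with plain Python; the real implementation operates on 2D torch tensors, so this is only an illustration:

```python
import math

def best_subkernel(kernel: list, target_size: int) -> list:
    # Slide a window of target_size over the kernel and keep the
    # contiguous slice with the largest L2 norm.
    best, best_norm = None, -1.0
    for start in range(len(kernel) - target_size + 1):
        window = kernel[start:start + target_size]
        norm = math.sqrt(sum(x * x for x in window))
        if norm > best_norm:
            best, best_norm = window, norm
    return best

print(best_subkernel([0.1, 0.9, 1.2, 0.8, 0.05], 3))  # -> [0.9, 1.2, 0.8]
```

Growing a kernel works the other way around: the original weights are centered and the border is zero-padded, so the enlarged convolution initially computes the same function.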
We can also combine these transformations to search patterns with different inner channel counts and kernel sizes:
search_variant_descriptors = (
    TransformationParameters(width_mult=1.0, kernel_size=3),
    TransformationParameters(width_mult=1.0, kernel_size=5),
    TransformationParameters(width_mult=1.0, kernel_size=7),
    TransformationParameters(width_mult=0.5, kernel_size=3),
    TransformationParameters(width_mult=0.5, kernel_size=5),
    TransformationParameters(width_mult=0.5, kernel_size=7),
    TransformationParameters(width_mult=0.0),
)
This is the default configuration for searching MobileNetV2-like nets, as it allows searching over different kernel sizes (depthwise convolutions with large kernels are quite fast on CPUs), different expansion ratios (the inner MobileNetV2 block channel multiplier), and even "skipping" unnecessary operations (by replacing them with a cheap 1x1 convolution).
Since a Conv1x1 is used when width_mult is equal to 0, the kernel_size
argument becomes redundant and should be left as None.
Another interesting example concerns YOLOv5-like models. For this type of
model you cannot choose the kernel size, but you can choose the number of
bottlenecks in the YOLO C3 block ("depth") with the depth_mult argument.
By default, depth_mult equals 1.0, which keeps the number of blocks in the
created model unchanged (as in the original model).
search_variant_descriptors = (
    TransformationParameters(width_mult=1.0, depth_mult=1.0),
    TransformationParameters(width_mult=1.0, depth_mult=0.5),
    TransformationParameters(width_mult=1.0, depth_mult=0.33),
    TransformationParameters(width_mult=0.5, depth_mult=1.0),
    TransformationParameters(width_mult=0.5, depth_mult=0.5),
    TransformationParameters(width_mult=0.5, depth_mult=0.66),
)
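How depth_mult translates into block counts can be sketched as simple arithmetic (the rounding rule below, nearest integer with a floor of one block, is an assumption for illustration; the library's exact rule may differ):

```python
def scaled_depth(num_blocks: int, depth_mult: float) -> int:
    # Keep at least one repeated block; illustrative rounding only.
    return max(1, round(num_blocks * depth_mult))

for dm in (1.0, 0.5, 0.33):
    print(dm, scaled_depth(6, dm))
```

For a C3 block with 6 repeated bottlenecks, the descriptors above would roughly correspond to variants with 6, 3 and 2 bottlenecks.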