Automatic Quantization¶
The enot.quantization package provides functionality for automatic quantization
of user models. It is best suited for preparing user models for enot-lite int8
engines.
With the enot.quantization package, you can automatically convert your PyTorch
model to our intermediate representation, which allows you to perform multiple
kinds of quantization, including vector quantization for TensorRT and OpenVINO.
This package features automatic distillation for weight fine-tuning, automatic quantization threshold search as described in the Fast Adjustable Threshold paper, several methods of layer selection for distillation, and a number of fake-quantization algorithms.
enot-lite quantization¶
- class TrtFakeQuantizedModel(model, leaf_modules=None)¶
Bases:
enot.quantization.fake_quantized_model.FakeQuantizedModel
Quantized TensorRT model class, which uses int8 convolutions and fully-connected layers.
This class is used for quantization aware training.
- __init__(model, leaf_modules=None)¶
- Parameters
model (nn.Module) – Model from which TrtFakeQuantizedModel will be constructed.
leaf_modules (list with types of modules or instances of torch.nn.Module, optional) – Types of modules or module instances that must be interpreted as leaf modules while tracing.
- enable_calibration_mode(mode=True)¶
Enables or disables calibration mode.
In calibration mode, the quantized model collects input data statistics, which are used for quantization parameter initialization.
- Parameters
mode (bool, optional) – Whether to enable calibration mode. Default value is True.
- Returns
self
- Return type
- enable_quantization_mode(mode=True)¶
Enables or disables fake quantization.
Fake quantization mode is enabled for all quantized layers. In this mode, these layers use fake quantization nodes to produce quantized weights and activations during the forward pass.
- Parameters
mode (bool, optional) – Whether to use fake quantization. Default value is True.
- Returns
self
- Return type
- quantization_parameters()¶
Returns an iterator over model quantization parameters (quantization thresholds).
- Returns
An iterator over model quantization parameters.
- Return type
iterator over torch.nn.Parameter
Notes
Weights of quantized modules (like convolution weight tensor or linear layer weight matrix) are not quantization parameters.
- regular_parameters()¶
Returns an iterator over model parameters excluding quantization parameters.
- Returns
An iterator over regular model parameters.
- Return type
iterator over torch.nn.Parameter
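As a rough illustration of what a fake quantization node does during the forward pass, the pure-Python sketch below quantizes a value to a signed 8-bit grid and immediately maps it back to float. The function name fake_quantize, the symmetric scheme, and the half-up rounding are assumptions made for illustration; this is not the enot implementation.

```python
import math


def fake_quantize(x: float, threshold: float, bitness: int = 8) -> float:
    """Quantize x to a signed integer grid and map it back to float.

    `threshold` plays the role of a quantization parameter: values outside
    [-threshold, threshold] are clipped (saturated).
    """
    q_max = 2 ** (bitness - 1) - 1      # 127 for int8
    scale = threshold / q_max           # float distance between grid points
    q = math.floor(x / scale + 0.5)     # round half up (an assumption)
    q = max(-q_max, min(q_max, q))      # clip to the symmetric integer range
    return q * scale                    # dequantize back to float


values = [0.0, 0.3, 1.0, 5.0]
print([fake_quantize(v, threshold=1.0) for v in values])
```

The output stays in floating point but can only take values on the int8 grid, which is why training with such nodes lets the weights adapt to the quantization error.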
- class OpenvinoFakeQuantizedModel(model, leaf_modules=None)¶
Bases:
enot.quantization.fake_quantized_model.FakeQuantizedModel
Quantized OpenVINO model class, which uses int8 convolutions and fully-connected layers.
This class is used for quantization aware training.
- __init__(model, leaf_modules=None)¶
- Parameters
model (nn.Module) – Model from which OpenvinoFakeQuantizedModel will be constructed.
leaf_modules (list with types of modules or instances of torch.nn.Module, optional) – Types of modules or module instances that must be interpreted as leaf modules while tracing.
- enable_calibration_mode(mode=True)¶
Enables or disables calibration mode.
In calibration mode, the quantized model collects input data statistics, which are used for quantization parameter initialization.
- Parameters
mode (bool, optional) – Whether to enable calibration mode. Default value is True.
- Returns
self
- Return type
- enable_quantization_mode(mode=True)¶
Enables or disables fake quantization.
Fake quantization mode is enabled for all quantized layers. In this mode, these layers use fake quantization nodes to produce quantized weights and activations during the forward pass.
- Parameters
mode (bool, optional) – Whether to use fake quantization. Default value is True.
- Returns
self
- Return type
- quantization_parameters()¶
Returns an iterator over model quantization parameters (quantization thresholds).
- Returns
An iterator over model quantization parameters.
- Return type
iterator over torch.nn.Parameter
Notes
Weights of quantized modules (like convolution weight tensor or linear layer weight matrix) are not quantization parameters.
- regular_parameters()¶
Returns an iterator over model parameters excluding quantization parameters.
- Returns
An iterator over regular model parameters.
- Return type
iterator over torch.nn.Parameter
- class FakeQuantizedModel(model, transform_patterns, activations_quantization_type, leaf_modules=None, **kwargs)¶
Bases:
torch.nn.modules.module.Module
Base FakeQuantized model class.
Inserts fake quantization nodes into the model and provides an interface for calibration and quantization aware training.
Use this class if you want to implement your own quantization scheme.
- __init__(model, transform_patterns, activations_quantization_type, leaf_modules=None, **kwargs)¶
- Parameters
model (torch.nn.Module) – Model from which FakeQuantizedModel will be constructed.
transform_patterns (Sequence[Tuple[SubgraphTransformPattern, ...]]) – Sequence of groups of transformation patterns. Each group is applied separately.
activations_quantization_type (QuantizationType) – Type of activations quantization. Activations quantization supports only layerwise (scalar) QuantizationGranularity and any QuantizationStrategy.
leaf_modules (list with types of modules or instances of torch.nn.Module, optional) – Types of modules or module instances that must be interpreted as leaf modules while tracing.
kwargs (kwargs) – Additional options.
- enable_calibration_mode(mode=True)¶
Enables or disables calibration mode.
In calibration mode, the quantized model collects input data statistics, which are used for quantization parameter initialization.
- Parameters
mode (bool, optional) – Whether to enable calibration mode. Default value is True.
- Returns
self
- Return type
- enable_quantization_mode(mode=True)¶
Enables or disables fake quantization.
Fake quantization mode is enabled for all quantized layers. In this mode, these layers use fake quantization nodes to produce quantized weights and activations during the forward pass.
- Parameters
mode (bool, optional) – Whether to use fake quantization. Default value is True.
- Returns
self
- Return type
- quantization_parameters()¶
Returns an iterator over model quantization parameters (quantization thresholds).
- Returns
An iterator over model quantization parameters.
- Return type
iterator over torch.nn.Parameter
Notes
Weights of quantized modules (like convolution weight tensor or linear layer weight matrix) are not quantization parameters.
- regular_parameters()¶
Returns an iterator over model parameters excluding quantization parameters.
- Returns
An iterator over regular model parameters.
- Return type
iterator over torch.nn.Parameter
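The interplay of the two mode switches can be sketched with a toy layer in plain Python. ToyFakeQuantLayer is a hypothetical stand-in, not an enot class; the max-absolute-value statistic and the symmetric int8 grid are assumptions chosen to keep the example short.

```python
class ToyFakeQuantLayer:
    """Hypothetical stand-in for one fake-quantized layer (not an enot class)."""

    def __init__(self) -> None:
        self.calibrating = False
        self.quantizing = False
        self.threshold = None  # quantization parameter, initialized by calibration

    def enable_calibration_mode(self, mode: bool = True) -> "ToyFakeQuantLayer":
        self.calibrating = mode
        return self  # returning self allows call chaining

    def enable_quantization_mode(self, mode: bool = True) -> "ToyFakeQuantLayer":
        self.quantizing = mode
        return self

    def forward(self, xs: list) -> list:
        if self.calibrating:
            # Collect input statistics: track the largest absolute value seen.
            seen = max(abs(x) for x in xs)
            self.threshold = seen if self.threshold is None else max(self.threshold, seen)
        if self.quantizing and self.threshold is not None:
            scale = self.threshold / 127
            # Quantize, clip to int8, dequantize (round() rounds halves to even).
            return [max(-127, min(127, round(x / scale))) * scale for x in xs]
        return list(xs)


layer = ToyFakeQuantLayer().enable_calibration_mode()
layer.forward([0.1, -2.0, 1.5])  # statistics pass, output is unchanged
layer.enable_calibration_mode(False).enable_quantization_mode()
print(layer.threshold, layer.forward([0.5, 3.0]))
```

In this sketch the threshold is the only quantization parameter; the layer's ordinary weights, if it had any, would belong to the regular parameters.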
- class QuantizationGranularity(value)¶
Quantization granularity: layerwise or channelwise.
- CHANNELWISE = 'CHANNELWISE'¶
- LAYERWISE = 'LAYERWISE'¶
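The difference between the two granularities can be shown with a small pure-Python sketch. The helper names and the max-absolute-value rule are assumptions for illustration only; weights are given as a list of output channels.

```python
def layerwise_threshold(weights: list) -> float:
    """One threshold (a scalar) shared by every channel of the layer."""
    return max(abs(w) for channel in weights for w in channel)


def channelwise_thresholds(weights: list) -> list:
    """One threshold per output channel (a vector)."""
    return [max(abs(w) for w in channel) for channel in weights]


# Two output channels with very different value ranges.
weights = [[0.01, -0.02, 0.015], [1.0, -3.0, 2.0]]
print(layerwise_threshold(weights))      # 3.0
print(channelwise_thresholds(weights))   # [0.02, 3.0]
```

With the single layerwise threshold of 3.0, one int8 step is about 0.024, so every value in the first channel is smaller than a single step. Per-channel thresholds avoid this loss at the cost of storing one threshold per channel.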
- class QuantizationStrategy(value)¶
Quantization strategy: symmetric or asymmetric.
- ASYMMETRIC = 'ASYMMETRIC'¶
- SYMMETRIC = 'SYMMETRIC'¶
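A minimal sketch of how the two strategies typically differ, assuming the common scale and zero-point formulation. The function names and the exact integer ranges are illustrative assumptions, not taken from enot.

```python
def symmetric_params(x_min: float, x_max: float, bitness: int = 8):
    """Range is centered at zero, so the zero-point is fixed at 0."""
    q_max = 2 ** (bitness - 1) - 1
    scale = max(abs(x_min), abs(x_max)) / q_max
    return scale, 0


def asymmetric_params(x_min: float, x_max: float, bitness: int = 8):
    """Range [x_min, x_max] is mapped onto the full unsigned integer grid."""
    levels = 2 ** bitness - 1
    scale = (x_max - x_min) / levels
    zero_point = round(-x_min / scale)  # integer that represents float 0.0
    return scale, zero_point


# Non-negative activations (e.g. after ReLU): a symmetric range leaves the
# negative half of the grid unused, so its step is roughly twice as large.
print(symmetric_params(0.0, 6.0))
print(asymmetric_params(0.0, 6.0))
```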
- class CalibrationMethod(value)¶
Calibration method: how to calculate calibration thresholds.
- MIN_MAX = 'MIN_MAX'¶
- class RoundingFunction(value)¶
Function used to round values in quantization/dequantization transformations.
- HALF_TO_EVEN = ('HALF_TO_EVEN', <built-in method apply of FunctionMeta object>)¶
- HALF_UP = ('HALF_UP', <built-in method apply of FunctionMeta object>)¶
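The two rounding modes differ only in how they break ties. The sketch below assumes that half-up means ties go toward positive infinity; the source does not state how enot treats negative ties, so treat that detail as an assumption.

```python
import math


def round_half_up(x: float) -> int:
    """Ties go toward positive infinity: 2.5 -> 3, -2.5 -> -2."""
    return math.floor(x + 0.5)


def round_half_to_even(x: float) -> int:
    """Ties go to the nearest even integer: 2.5 -> 2, 3.5 -> 4."""
    return round(x)  # Python's built-in round() uses this rule


for value in (0.5, 1.5, 2.5, -2.5):
    print(value, round_half_up(value), round_half_to_even(value))
```

Half-to-even avoids the small systematic upward bias that half-up introduces when many values fall exactly between two grid points.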
- class QuantizationType(granularity, strategy, calibration_method=CalibrationMethod.MIN_MAX, rounding_function=RoundingFunction.HALF_UP, bitness=8, calibration_options=None)¶
Defines the type of quantization.
- Parameters
granularity (QuantizationGranularity) –
strategy (QuantizationStrategy) –
calibration_method (CalibrationMethod) –
rounding_function (RoundingFunction) –
bitness (int) –
- __init__(granularity, strategy, calibration_method=CalibrationMethod.MIN_MAX, rounding_function=RoundingFunction.HALF_UP, bitness=8, calibration_options=None)¶
- Parameters
granularity (QuantizationGranularity) – Quantization granularity: layerwise or channelwise.
strategy (QuantizationStrategy) – Quantization strategy: symmetric or asymmetric.
calibration_method (CalibrationMethod) – The method that is used to calculate thresholds.
rounding_function (RoundingFunction) – The function that is used to round values in quantization procedure.
bitness (int) – Bitness of quantization.
calibration¶
Functions and classes from the calibration module provide a simple way to
calibrate quantization thresholds in a fake-quantized model.
- calibrate_quantized_model(quantized_model, dataloader, n_steps=None, epochs=1, sample_to_model_inputs=<function default_sample_to_model_inputs>, verbose=0)¶
Calibrates all quantization thresholds in a quantized model.
- Parameters
quantized_model (Any) – Model whose quantization thresholds are calibrated.
dataloader (torch.utils.data.DataLoader) – Dataloader which generates data that will be used to update the model’s quantization thresholds.
n_steps (int or None, optional) – Total number of threshold calibration steps. Default value is None, which runs calibration on all dataloader images for the number of epochs given in the epochs argument.
epochs (int, optional) – Total number of threshold calibration epochs. Not used when the n_steps argument is not None. Default value is 1.
sample_to_model_inputs (Callable, optional) – Function to map dataloader samples to model input format. Default value is default_sample_to_model_inputs(). See more in Converting dataloader items to PyTorch model inputs.
verbose (int, optional) – Procedure verbosity level. 0 disables all messages, 1 enables tqdm progress bar logging, 2 gives additional information about calibration. Default value is 0.
Notes
Before calling this function, your model should be prepared to be as close to practical inference usage as possible. For example, it is your responsibility to call the eval method of your model if your inference requires calling this method (e.g. when the model contains dropout layers).
Typically, it is better to calibrate quantization thresholds on validation-like data without augmentations (but with inference input preprocessing).
- Return type
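The relationship between n_steps and epochs described above can be sketched as a plain loop. run_calibration and its step callback are hypothetical names; in particular, re-iterating the dataloader when n_steps exceeds its length is an assumption, since the source does not say what happens in that case.

```python
from typing import Callable, Iterable, Optional


def run_calibration(
    step: Callable[[object], None],
    dataloader: Iterable,
    n_steps: Optional[int] = None,
    epochs: int = 1,
) -> int:
    """Call `step` on batches and return how many batches were processed."""
    done = 0
    if n_steps is None:
        # No step budget: make `epochs` full passes over the data.
        for _ in range(epochs):
            for batch in dataloader:
                step(batch)
                done += 1
        return done
    # Fixed step budget: `epochs` is ignored, data is re-iterated if needed.
    while done < n_steps:
        before = done
        for batch in dataloader:
            if done == n_steps:
                break
            step(batch)
            done += 1
        if done == before:
            break  # empty dataloader: avoid looping forever
    return done


batches = [[1.0, -2.0], [0.5, 4.0], [3.0, -1.0]]
seen = []
print(run_calibration(seen.append, batches))             # 3
print(run_calibration(seen.append, batches, n_steps=2))  # 2
```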
- class calibration_context(quantized_model)¶
Bases:
contextlib.ContextDecorator
Context manager which enables and disables quantization threshold calibration procedure.
Within this context manager, calibration procedure is enabled in all FakeQuantization models. Exiting this context manager resets calibration and quantization flags in all layers to their initial values.
- Parameters
quantized_model (Any) –
- __init__(quantized_model)¶
- Parameters
quantized_model (Any) – Fake-quantized model instance to calibrate.
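The save-and-restore behaviour of such a context manager can be sketched with contextlib.ContextDecorator from the standard library. toy_calibration_context and the dictionary-based layers are hypothetical; switching fake quantization off while calibrating is an assumption of this sketch.

```python
from contextlib import ContextDecorator


class toy_calibration_context(ContextDecorator):
    """Hypothetical sketch: enable calibration on enter, restore flags on exit."""

    def __init__(self, layers: list) -> None:
        self._layers = layers
        self._saved = []

    def __enter__(self):
        # Remember each layer's flags so they can be restored later.
        self._saved = [
            (layer["calibrating"], layer["quantizing"]) for layer in self._layers
        ]
        for layer in self._layers:
            layer["calibrating"] = True
            layer["quantizing"] = False  # assumption: no fake quantization here
        return self

    def __exit__(self, *exc) -> bool:
        for layer, (calibrating, quantizing) in zip(self._layers, self._saved):
            layer["calibrating"] = calibrating
            layer["quantizing"] = quantizing
        return False  # do not swallow exceptions


layers = [{"calibrating": False, "quantizing": True}]
with toy_calibration_context(layers):
    print(layers[0])  # flags switched for the duration of the block
print(layers[0])      # original flags are back
```

Because the class derives from ContextDecorator, the same object can also be used as a function decorator, so a whole calibration function runs inside the context.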
distillation¶
Functions and classes from the distillation module provide utilities and
procedures for fine-tuning quantized models using knowledge distillation.
Helper functions for distillation:
- class DistillationLayer¶
Bases:
torch.nn.modules.module.Module
Base class for all distillation marker layers.
The main purpose of this class is to mark tensors which one wants to distill. This layer’s forward function takes exactly one input tensor and returns it.
Examples
>>> import torch
>>> from torch import nn
>>> from enot.distillation.distillation_layer import DistillationLayer
>>>
>>> student_model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)
>>> features = student_model.features
>>> # Mark output of the first convolutional layer to distill.
>>> features[0] = nn.Sequential(*features[0], DistillationLayer())
>>> student_model.features = nn.Sequential(*features)
- class QuantDistillationModule¶
Bases:
enot.distillation.distillation_layer.DistillationLayer
Marker layer for quantization distillation.
To mark a tensor for distillation (i.e. tell enot quantization procedures that they should use this tensor for distillation), you should insert this module into your model, and wrap all tensors which you want to distill with this module’s call.
Notes
You can call this module multiple times for different tensors, and your model can have multiple QuantDistillationModule nested class instances.
Examples
>>> from torch import nn
>>> from enot.quantization.distillation.modules import QuantDistillationModule
>>>
>>> class MyModelLayer(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.conv1 = nn.Conv2d(4, 8, (3, 3))
...         self.conv2 = nn.Conv2d(8, 4, (3, 3))
...         self.distillation_module = QuantDistillationModule()
...     def forward(self, x):
...         conv1_out = self.conv1(x)
...         conv1_out = self.distillation_module(conv1_out)  # Marks conv1 output as distillation target.
...         conv2_out = self.conv2(conv1_out)  # conv2 expects the 8-channel conv1 output, not x.
...         return conv2_out
...
>>> my_module = MyModelLayer()
- class DistillationLayerSelectionStrategy(value)¶
Bases:
enum.Enum
Layer selection strategy for quantization distillation procedure.
This strategy tells which layers will be used for distillation process during enot quantization.
Layer selection for distillation has been proven to be important, with some strategies being more robust in general while the others can provide better results in specific cases. For example, it is not a good idea to make distillation over detection network outputs as it produces unstable gradients. However, distillation over classification network outputs is widely used and provides good results in multiple scenarios.
The four available regimes are the following:
DistillationLayerSelectionStrategy.DISTILL_LAST_QUANT_LAYERS
Finds the last quantized layers and distills over their outputs. Default option, robust to different scenarios including classification, segmentation, and detection.
DistillationLayerSelectionStrategy.DISTILL_OUTPUTS
Finds all PyTorch tensors in the user model’s outputs and distills over them. Useful for classification problems with cross entropy loss (torch.nn.CrossEntropyLoss).
DistillationLayerSelectionStrategy.DISTILL_ALL_QUANT_LAYERS
Finds all quantized layers and distills over their outputs. This is generally more robust to overfitting, but in practice converges worse for a small number of distillation epochs.
DistillationLayerSelectionStrategy.DISTILL_USER_DEFINED_LAYERS
Distillation over user-defined tensors in the model. The user should wrap such tensors with a QuantDistillationModule module call. For more information, see its documentation.
- add_distillation_nodes_to_onnx_converted_model(traced_model, onnx_tensor_names)¶
Inserts distillation nodes (instances of QuantDistillationModule) after nodes defined by the user.
- Parameters
traced_model (torch.fx.GraphModule) – Model converted from ONNX to PyTorch using the onnx2torch package. This model has to be converted by the convert() function with attach_onnx_mapping set to True.
onnx_tensor_names (list with str) – List of ONNX tensor names from the original ONNX model.
- Return type
- class distillation_context(quantized_model, layer_selection_strategy=DistillationLayerSelectionStrategy.DISTILL_LAST_QUANT_LAYERS)¶
Bases:
contextlib.ContextDecorator
Context manager which enables and disables distillation procedure.
- Parameters
quantized_model (FakeQuantizedModel) –
layer_selection_strategy (DistillationLayerSelectionStrategy) –
We are also open-sourcing our distillation procedures so that users can
customize them. All classes below are public, and you can view their source
code by clicking on the [source] links.
- class QuantizationDistiller(quantized_model, dataloader, optimizer=None, scheduler=None, distillation_layer_selection_strategy=DistillationLayerSelectionStrategy.DISTILL_LAST_QUANT_LAYERS, distillation_criterion='RMSELoss', n_epochs=1, device='cuda:0', sample_to_model_inputs=<function default_sample_to_model_inputs>, logdir=None, save_every=None, verbose=0)[source]¶
Bases:
enot.quantization.distillation.quantization_distiller.DistillerInterface
Quantized model distillation class with a simple distillation implementation.
- Parameters
quantized_model (FakeQuantizedModel) –
dataloader (DataLoader) –
optimizer (Optional[Optimizer]) –
scheduler (Optional[_LRScheduler]) –
distillation_layer_selection_strategy (DistillationLayerSelectionStrategy) –
n_epochs (int) –
sample_to_model_inputs (Callable[[Any], Tuple[Tuple, Dict[str, Any]]]) –
verbose (int) –
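The default distillation_criterion is named 'RMSELoss'. As a sketch of what a root-mean-square error criterion computes, the function below compares two output vectors in plain Python. Treating the original float model as the teacher and the fake-quantized model as the student is an assumption here, as is the exact reduction the library uses.

```python
import math


def rmse_loss(student: list, teacher: list) -> float:
    """Root-mean-square error between two equally sized output vectors."""
    if len(student) != len(teacher):
        raise ValueError("outputs must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student, teacher)]
    return math.sqrt(sum(squared) / len(squared))


teacher_out = [1.0, 2.0, 3.0]  # output of the float model (assumed teacher)
student_out = [1.0, 2.0, 5.0]  # output of the fake-quantized model (assumed student)
print(rmse_loss(student_out, teacher_out))
```

Minimizing this value over the selected layers pulls the quantized model's outputs back toward those of the float model.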
- class SequentialDistiller(*distillers)[source]¶
Bases:
enot.quantization.distillation.quantization_distiller.DistillerInterface
Compound distillation class which performs sequential distillation with multiple strategies.
- Parameters
distillers (DistillerInterface) –
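Sequential composition of distillers can be sketched as follows. Both toy classes and the distill() method name are hypothetical; the source does not show the DistillerInterface methods, so this only illustrates the idea of running stages in the order they were passed.

```python
class ToyDistiller:
    """Hypothetical stage with a single `distill()` method."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def distill(self) -> None:
        self.log.append(self.name)  # a real stage would run a training loop


class ToySequentialDistiller:
    """Runs the given stages one after another, in the order passed."""

    def __init__(self, *distillers) -> None:
        self.distillers = distillers

    def distill(self) -> None:
        for distiller in self.distillers:
            distiller.distill()


log = []
ToySequentialDistiller(
    ToyDistiller("thresholds", log),
    ToyDistiller("weights", log),
).distill()
print(log)  # ['thresholds', 'weights']
```

Because the compound object exposes the same method as its stages, it can itself be nested inside another sequential distiller.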
- class DefaultQuantizationDistiller(quantized_model, dataloader, device='cuda:0', sample_to_model_inputs=<function default_sample_to_model_inputs>, logdir=None, save_every=None, n_batches_calibrate=10, verbose=0)[source]¶
Bases:
enot.quantization.distillation.quantization_distiller.SequentialDistiller
Wrapper over SequentialDistiller with a default well-performing distillation configuration.