Automatic Quantization¶

The enot.quantization package contains functionality for automatic quantization of user models. It is best suited for preparing user models for enot-lite int8 engines.

With the enot.quantization package, you can automatically convert your PyTorch model to our intermediate representation, which allows you to perform multiple kinds of quantization, including vector quantization for TensorRT and OpenVINO.

This package features automatic distillation for weight fine-tuning, automatic quantization threshold search as described in the Fast Adjustable Threshold paper, several methods of selecting layers for distillation, and a number of fake-quantization algorithms.

enot-lite quantization¶

class TrtFakeQuantizedModel(model, leaf_modules=None)¶

Bases: enot.quantization.fake_quantized_model.FakeQuantizedModel

Quantized TensorRT model class, which uses int8 convolutions and fully-connected layers.

This class is used for quantization aware training.

Parameters

model (Module) –
leaf_modules (Optional[List[Union[Type[Module], Module]]]) –

__init__(model, leaf_modules=None)¶

Parameters

model (nn.Module) – Model from which TrtFakeQuantizedModel will be constructed.
leaf_modules (list with types of modules or instances of torch.nn.Module, optional) – Types of modules or module instances that must be interpreted as leaf modules while tracing.

enable_calibration_mode(mode=True)¶

Enables or disables calibration mode.

In calibration mode, the quantized model collects input data statistics, which are then used to initialize quantization parameters.

Parameters: mode (bool, optional) – Whether to enable calibration mode. Default value is True.
Returns: self
Return type: FakeQuantizedModel

enable_quantization_mode(mode=True)¶

Enables or disables fake quantization.

Fake quantization mode is enabled for all quantized layers. In this mode, these layers use fake quantization nodes to produce quantized weights and activations during the forward pass.

Parameters: mode (bool, optional) – Whether to use fake quantization. Default value is True.
Returns: self
Return type: FakeQuantizedModel
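
As a rough illustration of what a fake quantization node computes, the sketch below rounds a float onto a signed integer grid and immediately maps it back to float. The helper name `fake_quantize`, the symmetric scheme, and the half-up rounding are assumptions made for this example; they are not taken from enot's implementation.

```python
import math


def fake_quantize(x: float, threshold: float, bitness: int = 8) -> float:
    """Quantize x to a signed integer grid and map it back to float.

    Hypothetical helper: symmetric scheme with half-up rounding.
    """
    q_max = 2 ** (bitness - 1) - 1      # 127 for 8 bits
    scale = threshold / q_max           # float distance between grid points
    q = math.floor(x / scale + 0.5)     # round to the nearest grid index
    q = max(-q_max, min(q_max, q))      # clip to the representable range
    return q * scale


print(fake_quantize(0.5, threshold=1.27))   # lands on a grid point near 0.5
print(fake_quantize(5.0, threshold=1.27))   # saturates at the threshold
```

The output stays in floating point, which is why such nodes can be trained through with ordinary PyTorch optimizers while still exposing the model to quantization error.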

quantization_parameters()¶

Returns an iterator over model quantization parameters (quantization thresholds).

Returns: An iterator over model quantization parameters.
Return type: iterator over torch.nn.Parameter

Notes

Weights of quantized modules (like convolution weight tensor or linear layer weight matrix) are not quantization parameters.

regular_parameters()¶

Returns an iterator over model parameters excluding quantization parameters.

Returns: An iterator over regular model parameters.
Return type: iterator over torch.nn.Parameter
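
Because thresholds and regular weights are exposed through separate iterators, they can be given different optimizer settings. The sketch below builds two parameter groups from the two documented methods; the helper name `build_param_groups` and both learning rates are hypothetical values chosen for illustration.

```python
from typing import Any, Dict, List


def build_param_groups(
    quantized_model: Any,
    weights_lr: float = 1e-4,      # assumed value, tune for your model
    thresholds_lr: float = 1e-3,   # assumed value, tune for your model
) -> List[Dict[str, Any]]:
    """Split parameters into two optimizer groups with separate learning rates."""
    return [
        {"params": list(quantized_model.regular_parameters()), "lr": weights_lr},
        {"params": list(quantized_model.quantization_parameters()), "lr": thresholds_lr},
    ]
```

The returned list follows the per-parameter options format accepted by PyTorch optimizers, so it can be passed directly as the first argument of, for example, `torch.optim.Adam`.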

class OpenvinoFakeQuantizedModel(model, leaf_modules=None)¶

Bases: enot.quantization.fake_quantized_model.FakeQuantizedModel

Quantized OpenVINO model class, which uses int8 convolutions and fully-connected layers.

This class is used for quantization aware training.

Parameters

model (Module) –
leaf_modules (Optional[List[Union[Type[Module], Module]]]) –

__init__(model, leaf_modules=None)¶

Parameters

model (nn.Module) – Model from which OpenvinoFakeQuantizedModel will be constructed.
leaf_modules (list with types of modules or instances of torch.nn.Module, optional) – Types of modules or module instances that must be interpreted as leaf modules while tracing.

enable_calibration_mode(mode=True)¶

Enables or disables calibration mode.

In calibration mode, the quantized model collects input data statistics, which are then used to initialize quantization parameters.

Parameters: mode (bool, optional) – Whether to enable calibration mode. Default value is True.
Returns: self
Return type: FakeQuantizedModel

enable_quantization_mode(mode=True)¶

Enables or disables fake quantization.

Fake quantization mode is enabled for all quantized layers. In this mode, these layers use fake quantization nodes to produce quantized weights and activations during the forward pass.

Parameters: mode (bool, optional) – Whether to use fake quantization. Default value is True.
Returns: self
Return type: FakeQuantizedModel

quantization_parameters()¶

Returns an iterator over model quantization parameters (quantization thresholds).

Returns: An iterator over model quantization parameters.
Return type: iterator over torch.nn.Parameter

Notes

Weights of quantized modules (like convolution weight tensor or linear layer weight matrix) are not quantization parameters.

regular_parameters()¶

Returns an iterator over model parameters excluding quantization parameters.

Returns: An iterator over regular model parameters.
Return type: iterator over torch.nn.Parameter

class FakeQuantizedModel(model, transform_patterns, activations_quantization_type, leaf_modules=None, **kwargs)¶

Bases: torch.nn.modules.module.Module

Base FakeQuantized model class.

Inserts fake quantization nodes into the model and provides interface for calibration and quantization aware training.

Use this class if you want to implement your own quantization scheme.

Parameters

model (Module) –
transform_patterns (Sequence[Tuple[SubgraphTransformPattern, …]]) –
activations_quantization_type (QuantizationType) –
leaf_modules (Optional[List[Union[Type[Module], Module]]]) –

__init__(model, transform_patterns, activations_quantization_type, leaf_modules=None, **kwargs)¶

Parameters

model (torch.nn.Module) – Model from which FakeQuantizedModel will be constructed.
transform_patterns (Sequence[Tuple[SubgraphTransformPattern, ...]]) – Sequence of groups of transformation patterns. Each group will be applied separately.
activations_quantization_type (QuantizationType) – Type of activation quantization. Activation quantization supports only layerwise (scalar) granularity and any QuantizationStrategy.
leaf_modules (list with types of modules or instances of torch.nn.Module, optional) – Types of modules or module instances that must be interpreted as leaf modules while tracing.
kwargs (kwargs) – Additional options.

enable_calibration_mode(mode=True)¶

Enables or disables calibration mode.

In calibration mode, the quantized model collects input data statistics, which are then used to initialize quantization parameters.

Parameters: mode (bool, optional) – Whether to enable calibration mode. Default value is True.
Returns: self
Return type: FakeQuantizedModel

enable_quantization_mode(mode=True)¶

Enables or disables fake quantization.

Fake quantization mode is enabled for all quantized layers. In this mode, these layers use fake quantization nodes to produce quantized weights and activations during the forward pass.

Parameters: mode (bool, optional) – Whether to use fake quantization. Default value is True.
Returns: self
Return type: FakeQuantizedModel

quantization_parameters()¶

Returns an iterator over model quantization parameters (quantization thresholds).

Returns: An iterator over model quantization parameters.
Return type: iterator over torch.nn.Parameter

Notes

Weights of quantized modules (like convolution weight tensor or linear layer weight matrix) are not quantization parameters.

regular_parameters()¶

Returns an iterator over model parameters excluding quantization parameters.

Returns: An iterator over regular model parameters.
Return type: iterator over torch.nn.Parameter

class QuantizationGranularity(value)¶

Quantization granularity: layerwise or channelwise.

CHANNELWISE = 'CHANNELWISE'¶

LAYERWISE = 'LAYERWISE'¶
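
The difference between the two granularities can be shown on a toy weight tensor. This is a plain-Python sketch that assumes max-absolute-value thresholds; the variable names are illustrative and do not come from enot.

```python
from typing import List

# Toy weight with 2 output channels, each holding 3 values.
weight: List[List[float]] = [
    [0.1, -0.4, 0.2],
    [2.0, -1.0, 0.5],
]

# Layerwise: a single scalar threshold shared by the whole tensor.
layerwise_threshold = max(abs(v) for channel in weight for v in channel)

# Channelwise: one threshold per output channel.
channelwise_thresholds = [max(abs(v) for v in channel) for channel in weight]

print(layerwise_threshold)      # 2.0
print(channelwise_thresholds)   # [0.4, 2.0]
```

With the layerwise threshold, the first channel uses only a fifth of the available range; the channelwise thresholds give each channel its own grid.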

class QuantizationStrategy(value)¶

Quantization strategy: symmetric or asymmetric.

ASYMMETRIC = 'ASYMMETRIC'¶

SYMMETRIC = 'SYMMETRIC'¶
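
A minimal sketch of how the two strategies could derive quantization parameters from an observed value range. The function names, the signed grid for the symmetric case, and the unsigned grid for the asymmetric case are assumptions for illustration and may differ from what enot does internally. Both functions assume `x_max > x_min`.

```python
from typing import Tuple


def symmetric_params(x_min: float, x_max: float, bitness: int = 8) -> Tuple[float, int]:
    """Scale for a range centered at zero; the zero point is always 0."""
    q_max = 2 ** (bitness - 1) - 1
    scale = max(abs(x_min), abs(x_max)) / q_max
    return scale, 0


def asymmetric_params(x_min: float, x_max: float, bitness: int = 8) -> Tuple[float, int]:
    """Scale and zero point that map [x_min, x_max] onto [0, 2**bitness - 1]."""
    levels = 2 ** bitness - 1
    scale = (x_max - x_min) / levels
    zero_point = round(-x_min / scale)
    return scale, zero_point


# Non-negative activations (e.g. after ReLU): the asymmetric grid is finer.
print(symmetric_params(0.0, 2.55))
print(asymmetric_params(0.0, 2.55))
```

For a range that is far from centered at zero, the symmetric scheme leaves part of the integer grid unused, which is the usual argument for asymmetric quantization of activations.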

class CalibrationMethod(value)¶

Calibration method: how to calculate calibration thresholds.

MIN_MAX = 'MIN_MAX'¶
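
The name MIN_MAX suggests that thresholds are derived from the smallest and largest values seen during calibration. The sketch below tracks a running minimum and maximum over several batches; `MinMaxObserver` is a hypothetical class written for this example, not part of enot.

```python
from typing import Iterable, Optional


class MinMaxObserver:
    """Hypothetical running min/max tracker, not an enot class."""

    def __init__(self) -> None:
        self.min: Optional[float] = None
        self.max: Optional[float] = None

    def update(self, batch: Iterable[float]) -> None:
        values = list(batch)
        if not values:
            return  # nothing to observe in an empty batch
        lo, hi = min(values), max(values)
        self.min = lo if self.min is None else min(self.min, lo)
        self.max = hi if self.max is None else max(self.max, hi)


observer = MinMaxObserver()
for batch in ([0.1, 0.9, -0.3], [1.7, 0.2], [-0.8, 0.4]):
    observer.update(batch)

print(observer.min, observer.max)  # -0.8 1.7
```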

class RoundingFunction(value)¶

Function used to round values in quantization/dequantization transformations.

HALF_TO_EVEN = ('HALF_TO_EVEN', <built-in method apply of FunctionMeta object>)¶

HALF_UP = ('HALF_UP', <built-in method apply of FunctionMeta object>)¶
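
The two rounding modes differ only on exact ties such as 2.5. The plain-Python sketch below shows one common interpretation of each; whether enot's HALF_UP sends ties toward positive infinity (as here) or away from zero is not stated in this reference, so treat that detail as an assumption.

```python
import math


def round_half_up(x: float) -> int:
    """Ties go toward positive infinity: 2.5 -> 3, -2.5 -> -2."""
    return math.floor(x + 0.5)


def round_half_to_even(x: float) -> int:
    """Ties go to the nearest even integer: 2.5 -> 2, 3.5 -> 4."""
    return round(x)  # Python's built-in round() rounds ties to even


for value in (0.5, 1.5, 2.5, -2.5):
    print(value, round_half_up(value), round_half_to_even(value))
```

Half-to-even avoids the small upward bias that half-up introduces when many ties occur, while half-up is cheaper to express on some integer backends.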

class QuantizationType(granularity, strategy, calibration_method=CalibrationMethod.MIN_MAX, rounding_function=RoundingFunction.HALF_UP, bitness=8, calibration_options=None)¶

Defines the type of quantization.

Parameters

granularity (QuantizationGranularity) –
strategy (QuantizationStrategy) –
calibration_method (CalibrationMethod) –
rounding_function (RoundingFunction) –
bitness (int) –
calibration_options (Optional[Dict[str, Any]]) –

__init__(granularity, strategy, calibration_method=CalibrationMethod.MIN_MAX, rounding_function=RoundingFunction.HALF_UP, bitness=8, calibration_options=None)¶

Parameters

granularity (QuantizationGranularity) – Quantization granularity: layerwise or channelwise.
strategy (QuantizationStrategy) – QuantizationStrategy: symmetric or asymmetric.
calibration_method (CalibrationMethod) – The method that is used to calculate thresholds.
rounding_function (RoundingFunction) – The function that is used to round values in quantization procedure.
bitness (int) – Bitness of quantization.
calibration_options (Optional[Dict[str, Any]]) –

float_model_from_quantized_model(quantized_model)¶

Creates a copy of the quantized model with fake quantization disabled.

Parameters: quantized_model (Any) –
Return type: Any

calibration¶

Functions and classes from the calibration module provide an easy way to calibrate quantization thresholds in a fake-quantized model.

calibrate_quantized_model(quantized_model, dataloader, n_steps=None, epochs=1, sample_to_model_inputs=<function default_sample_to_model_inputs>, verbose=0)¶

Calibrates all quantization thresholds in a quantized model.

Parameters

quantized_model (Any) – Model to calibrate quantization thresholds.
dataloader (torch.utils.data.DataLoader) – Dataloader which generates data that will be used to update model’s quantization thresholds.
n_steps (int or None, optional) – Number of total threshold calibration steps. Default value is None, which runs calibration on all dataloader images for the number of epochs in epochs argument.
epochs (int, optional) – Number of total threshold calibration epochs. Not used when n_steps argument is not None. Default value is 1.
sample_to_model_inputs (Callable, optional) – Function to map dataloader samples to model input format. Default value is default_sample_to_model_inputs(). See more in Converting dataloader items to PyTorch model inputs.
verbose (int, optional) – Procedure verbosity level. 0 disables all messages, 1 enables tqdm progress bar logging, 2 gives additional information about calibration. Default value is 0.

Notes

Before calling this function, your model should be prepared to be as close to practical inference usage as possible. For example, it is your responsibility to call the eval method of your model if your inference requires it (e.g. when the model contains dropout layers).

Typically, it is better to calibrate quantization thresholds on validation-like data without augmentations (but with inference input preprocessing).

Return type: None
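
The interaction between n_steps and epochs described above can be sketched as a small generator that yields the batches a calibration run would consume. The function name `calibration_batches` is hypothetical, and cycling over the data when n_steps exceeds one pass is an assumption; the reference only states that epochs is ignored when n_steps is given.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def calibration_batches(
    dataloader: Iterable[T],
    n_steps: Optional[int] = None,
    epochs: int = 1,
) -> Iterator[T]:
    """Yield the batches a calibration run would consume."""
    if n_steps is not None:
        # Fixed number of steps; `epochs` is ignored in this branch.
        # Assumption: data is replayed if n_steps exceeds one pass.
        # cycle() replays cached batches, which is acceptable for a sketch.
        yield from islice(cycle(dataloader), n_steps)
        return
    for _ in range(epochs):
        yield from dataloader


print(list(calibration_batches(["b0", "b1", "b2"], n_steps=2, epochs=5)))
print(list(calibration_batches(["b0", "b1", "b2"], epochs=2)))
```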

class calibration_context(quantized_model)¶

Bases: contextlib.ContextDecorator

Context manager which enables and disables quantization threshold calibration procedure.

Within this context manager, calibration procedure is enabled in all FakeQuantization models. Exiting this context manager resets calibration and quantization flags in all layers to their initial values.

Parameters: quantized_model (Any) –

__init__(quantized_model)¶

Parameters: quantized_model (Any) – Fake-quantized model instance to calibrate.
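
The enter/exit behavior described for this context manager (enable calibration inside, restore the initial flags afterwards) can be illustrated with a toy stand-in. Everything below is hypothetical: `toy_calibration_context`, `ToyModel`, and the `calibration` and `quantization` attribute names are invented for the sketch and say nothing about how enot stores these flags.

```python
from contextlib import ContextDecorator


class toy_calibration_context(ContextDecorator):
    """Hypothetical stand-in showing enter/exit behavior, not enot's code."""

    def __init__(self, model):
        self.model = model
        self._saved = None

    def __enter__(self):
        # Remember the flags so they can be restored on exit.
        self._saved = (self.model.calibration, self.model.quantization)
        self.model.calibration = True
        return self

    def __exit__(self, *exc):
        self.model.calibration, self.model.quantization = self._saved
        return False  # do not swallow exceptions


class ToyModel:
    """Minimal object with the two flags used above (assumed attribute names)."""

    def __init__(self):
        self.calibration = False
        self.quantization = True


model = ToyModel()
with toy_calibration_context(model):
    inside = model.calibration  # forward passes would collect statistics here
print(inside, model.calibration)  # True False
```

Because the base class is ContextDecorator, the same object can also be used as a function decorator around a calibration routine.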

distillation¶

Functions and classes from the distillation module provide utilities and procedures for fine-tuning quantized models using knowledge distillation.

Helper functions for distillation:

class DistillationLayer¶

Bases: torch.nn.modules.module.Module

Base class for all distillation marker layers.

The main purpose of this class is to mark tensors which one wants to distill. This layer’s forward function takes exactly one input tensor and returns it.

Examples

>>> import torch
>>> from torch import nn
>>> from enot.distillation.distillation_layer import DistillationLayer
>>>
>>> student_model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)
>>> features = student_model.features
>>> # Mark output of the first convolutional layer to distill.
>>> features[0] = nn.Sequential(*features[0], DistillationLayer())
>>> student_model.features = nn.Sequential(*features)

class QuantDistillationModule¶

Bases: enot.distillation.distillation_layer.DistillationLayer

Marker layer for quantization distillation.

To mark a tensor for distillation (i.e. tell enot quantization procedures that they should use this tensor for distillation), you should insert this module into your model, and wrap all tensors which you want to distill with this module’s call.

Notes

You can call this module multiple times for different tensors, and your model can have multiple QuantDistillationModule nested class instances.

Examples

>>> from torch import nn
>>> from enot.quantization.distillation.modules import QuantDistillationModule
>>>
>>> class MyModelLayer(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.conv1 = nn.Conv2d(4, 8, (3, 3))
...         self.conv2 = nn.Conv2d(8, 4, (3, 3))
...         self.distillation_module = QuantDistillationModule()
...     def forward(self, x):
...         conv1_out = self.conv1(x)
...         conv1_out = self.distillation_module(conv1_out)  # Marks conv1 output as distillation target.
...         conv2_out = self.conv2(conv1_out)  # conv2 expects the 8-channel conv1 output.
...         return conv2_out
...
>>> my_module = MyModelLayer()

class DistillationLayerSelectionStrategy(value)¶

Bases: enum.Enum

Layer selection strategy for quantization distillation procedure.

This strategy tells which layers will be used for distillation process during enot quantization.

Layer selection for distillation has been proven to be important, with some strategies being more robust in general while the others can provide better results in specific cases. For example, it is not a good idea to make distillation over detection network outputs as it produces unstable gradients. However, distillation over classification network outputs is widely used and provides good results in multiple scenarios.

The four available regimes are the following:

DistillationLayerSelectionStrategy.DISTILL_LAST_QUANT_LAYERS
- Finds the last quantized layers and distills over their outputs. This is the default option; it is robust across scenarios including classification, segmentation, and detection.
DistillationLayerSelectionStrategy.DISTILL_OUTPUTS
- Finds all PyTorch tensors in the user model’s outputs and distills over them. Useful for classification problems with cross entropy loss (torch.nn.CrossEntropyLoss).
DistillationLayerSelectionStrategy.DISTILL_ALL_QUANT_LAYERS
- Finds all quantized layers and distills over their outputs. This is generally more robust to overfitting, but in practice converges worse for a small number of distillation epochs.
DistillationLayerSelectionStrategy.DISTILL_USER_DEFINED_LAYERS
- Distills over user-defined tensors in the model. The user should wrap such tensors with a QuantDistillationModule call. For more information, see its documentation.

add_distillation_nodes_to_onnx_converted_model(traced_model, onnx_tensor_names)¶

Inserts distillation nodes (instances of QuantDistillationModule) after nodes defined by user.

Parameters

traced_model (torch.fx.GraphModule) – Model converted from ONNX to PyTorch using onnx2torch package. This model has to be converted by the convert() function with attach_onnx_mapping set to True.
onnx_tensor_names (list with str) – List of ONNX tensor names from the original ONNX model.

Return type: None

class distillation_context(quantized_model, layer_selection_strategy=DistillationLayerSelectionStrategy.DISTILL_LAST_QUANT_LAYERS)¶

Bases: contextlib.ContextDecorator

Context manager which enables and disables distillation procedure.

Parameters

quantized_model (FakeQuantizedModel) –
layer_selection_strategy (DistillationLayerSelectionStrategy) –

We are also open-sourcing our distillation procedures so that users can customize them. All classes below are public, and you can view their source code by clicking the [source] links.

class RMSELoss(eps=1e-06)[source]¶

Bases: torch.nn.modules.module.Module

Parameters: eps (float) –
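
This reference does not describe how eps enters the loss. A common formulation, assumed here, adds it under the square root so that the gradient stays finite when student and teacher outputs coincide. The plain-Python function `rmse_loss` below is a sketch of that formulation, not the source of RMSELoss.

```python
import math
from typing import Sequence


def rmse_loss(student: Sequence[float], teacher: Sequence[float], eps: float = 1e-6) -> float:
    """Root mean squared error with eps added under the square root (assumed)."""
    if len(student) != len(teacher):
        raise ValueError("inputs must have the same length")
    mse = sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)
    return math.sqrt(mse + eps)


print(rmse_loss([1.0, 2.0], [1.0, 2.0]))   # small but non-zero because of eps
print(rmse_loss([3.0, 0.0], [0.0, 4.0]))
```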

class DistillerInterface[source]¶

Distiller base interface.

class QuantizationDistiller(quantized_model, dataloader, optimizer=None, scheduler=None, distillation_layer_selection_strategy=DistillationLayerSelectionStrategy.DISTILL_LAST_QUANT_LAYERS, distillation_criterion='RMSELoss', n_epochs=1, device='cuda:0', sample_to_model_inputs=<function default_sample_to_model_inputs>, logdir=None, save_every=None, verbose=0)[source]¶

Bases: enot.quantization.distillation.quantization_distiller.DistillerInterface

Quantized model distillation class with a simple distillation implementation.

Parameters

quantized_model (FakeQuantizedModel) –
dataloader (DataLoader) –
optimizer (Optional[Optimizer]) –
scheduler (Optional[_LRScheduler]) –
distillation_layer_selection_strategy (DistillationLayerSelectionStrategy) –
distillation_criterion (Union[Module, str]) –
n_epochs (int) –
device (Union[str, device]) –
sample_to_model_inputs (Callable[[Any], Tuple[Tuple, Dict[str, Any]]]) –
logdir (Union[str, Path, None]) –
save_every (Optional[int]) –
verbose (int) –

class SequentialDistiller(*distillers)[source]¶

Bases: enot.quantization.distillation.quantization_distiller.DistillerInterface

Compound distillation class which performs sequential distillation with multiple strategies.

Parameters: distillers (DistillerInterface) –

class DefaultQuantizationDistiller(quantized_model, dataloader, device='cuda:0', sample_to_model_inputs=<function default_sample_to_model_inputs>, logdir=None, save_every=None, n_batches_calibrate=10, verbose=0)[source]¶

Bases: enot.quantization.distillation.quantization_distiller.SequentialDistiller

Wrapper over SequentialDistiller with a default well-performing distillation configuration.

Parameters

quantized_model (FakeQuantizedModel) –
dataloader (DataLoader) –
device (Union[str, device]) –
sample_to_model_inputs (Callable[[Any], Tuple[Tuple, Dict[str, Any]]]) –
logdir (Union[str, Path, None]) –
save_every (Optional[int]) –
n_batches_calibrate (int) –
verbose (int) –
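
The type of sample_to_model_inputs, Callable[[Any], Tuple[Tuple, Dict[str, Any]]], indicates that it turns one dataloader sample into positional and keyword arguments for the model. A minimal sketch of a custom adapter follows; the function name and the (images, labels) sample layout are assumptions for illustration.

```python
from typing import Any, Dict, Tuple


def images_only_sample_to_model_inputs(sample: Any) -> Tuple[Tuple, Dict[str, Any]]:
    """Map an (images, labels) sample to positional and keyword model inputs.

    Hypothetical adapter: labels are dropped because the model only needs images.
    """
    images, _labels = sample
    return (images,), {}


args, kwargs = images_only_sample_to_model_inputs(("image_batch", "label_batch"))
print(args, kwargs)  # ('image_batch',) {}
```

Given the return type, the model is presumably invoked as `model(*args, **kwargs)`, so a model with named inputs could instead receive them through the dictionary.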