ENOT optimization package

The enot.optimize package contains optimizer definitions for the three main neural architecture search stages.

To optimize a neural network architecture, you should do the following:

  • Train the search space (pretrain)

  • Search for the optimal architecture (search)

  • Tune the obtained model (tune)

build_optimizer(phase_name, model, optimizer, **options)

Builds and returns an ENOT optimizer according to the given phase name (a usage sketch follows this entry).

Parameters:
  • phase_name ({"pretrain", "search", "train"}) – Name of the phase; must be one of "pretrain", "search", or "train".

  • model (torch.nn.Module) – PyTorch model to optimize.

  • optimizer (torch.optim.Optimizer) – PyTorch optimizer to be replaced with the ENOT optimizer.

  • options – ENOT optimizer options.

Returns:

Suitable ENOT optimizer instance.

Return type:

BaseOptimizer

Raises:
  • ValueError – If an unknown phase name or unknown optimizer options are given.

  • TypeError – If the model type is not suitable for the selected phase, or if the optimizer is not an instance of torch.optim.Optimizer.
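
A minimal usage sketch, assuming the function is importable from enot.optimize as the package description above suggests, and that search_space (a SearchSpaceModel built elsewhere) behaves like a regular torch.nn.Module; the base optimizer and learning rate are placeholders.

    import torch
    from enot.optimize import build_optimizer  # import path assumed from the package description

    # Wrap a plain PyTorch optimizer for the pretrain phase.
    # For the "pretrain" and "search" phases the model argument is presumably a SearchSpaceModel.
    base_optimizer = torch.optim.SGD(search_space.parameters(), lr=0.01)
    enot_optimizer = build_optimizer("pretrain", model=search_space, optimizer=base_optimizer)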

Search space pre-training

Neural architecture search starts with creating the search space and pretraining it for the subsequent selection of the best operations.

class PretrainOptimizer(search_space, optimizer, check_recommended_optimizations=True, **options)

Optimizer for the pretrain phase.

__init__(search_space, optimizer, check_recommended_optimizations=True, **options)
Parameters:
  • search_space (SearchSpaceModel) – Search space model to optimize.

  • optimizer (torch.optim.Optimizer) – PyTorch optimizer to be replaced with the Pretrain optimizer.

  • check_recommended_optimizations (bool, optional) – Whether to use recommended optimizations. Default value is True.

  • options – Other experimental options (should be ignored by the user).

add_param_group(param_group)

Call add_param_group of the wrapped optimizer.

Parameters:

param_group (dict with str keys) – Parameter group description to add to the user optimizer.

Return type:

None

load_state_dict(state_dict)

Call load_state_dict of the wrapped optimizer.

Parameters:

state_dict (dict with str keys) – State dict to be loaded into the user optimizer instance.

Return type:

None

property model: Module

Model passed to the constructor.

Returns:

PyTorch model passed to the optimizer constructor.

Return type:

torch.nn.Module

model_step(closure)

Perform gradient accumulation step.

Besides gradient computation, this method runs internal ENOT algorithms and utility configuration.

To accumulate gradients, this method must perform a complete gradient computation cycle, which consists of a forward step followed by a backward step. To achieve this, it requires a user-defined closure that encapsulates both of these steps.

It is usually enough to calculate model predictions, compute the loss from these predictions, and then backpropagate by calling loss.backward() to obtain the gradients of the model parameters.

In more sophisticated situations, contact the ENOT team to make sure that the current API covers your use case.

This function must be used in conjunction with the step function.

Usually, you only need to call this function when you accumulate gradients over multiple data batches. In this case, call model_step for each data batch within your larger “ghost batch”, and after accumulating gradients, call the step function without arguments (see the sketch below).

Parameters:

closure (Callable) – A closure (a nested function with access to free variables of the enclosing function) that performs the complete gradient accumulation procedure.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None
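
A hedged sketch of the gradient accumulation pattern described above, assuming enot_optimizer is a PretrainOptimizer wrapping search_space, that the search space is callable like a regular torch.nn.Module, and that criterion and the micro_batches iterable (the batches forming one “ghost batch”) exist elsewhere.

    # Accumulate gradients over several smaller batches ("ghost batch"),
    # then perform a single parameter update.
    enot_optimizer.zero_grad()
    for inputs, labels in micro_batches:  # hypothetical iterable of (inputs, labels) pairs
        def closure():
            predictions = search_space(inputs)
            loss = criterion(predictions, labels)
            loss.backward()  # accumulate gradients; no parameter update here
            return loss

        enot_optimizer.model_step(closure)  # gradient accumulation step
    enot_optimizer.step()  # single parameter update for the whole ghost batch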

property search_space: SearchSpaceModel

Search space model passed to the constructor.

Returns:

Search space passed to the ENOT optimizer constructor.

Return type:

SearchSpaceModel

state_dict()

Call state_dict of the wrapped optimizer and return the result.

Returns:

User optimizer state dict.

Return type:

dict with str keys

step(closure=None)

Performs a single optimization step (parameter update).

The optimization step includes gradient computation (forward and backward passes, only when closure is not None) and a parameter update. The parameter update is performed by the base optimizer provided by the user.

Besides gradient computation, this method runs internal ENOT algorithms and utility configuration.

Calling this function with a non-None closure argument is equivalent to calling model_step with this closure followed by a step call without arguments.

A more detailed description of the gradient computation and the closure structure can be found in the model_step documentation. A usage sketch follows this entry.

Parameters:

closure (Callable or None, optional) – A closure (a nested function with access to free variables of the enclosing function) that performs the complete gradient accumulation procedure. Must be None if you have already accumulated gradients using model_step. Default value is None.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None
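
A minimal pretrain loop sketch using step with a closure. It assumes the class is importable from enot.optimize, that search_space is a SearchSpaceModel callable like a regular torch.nn.Module, and that dataloader and criterion are defined elsewhere; the base optimizer and learning rate are placeholders.

    import torch
    from enot.optimize import PretrainOptimizer  # import path assumed from the package description

    base_optimizer = torch.optim.Adam(search_space.parameters(), lr=1e-3)
    enot_optimizer = PretrainOptimizer(search_space=search_space, optimizer=base_optimizer)

    for inputs, labels in dataloader:
        enot_optimizer.zero_grad()

        def closure():
            loss = criterion(search_space(inputs), labels)
            loss.backward()
            return loss

        enot_optimizer.step(closure)  # forward + backward via the closure, then parameter update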

zero_grad(set_to_none=True)

Call zero_grad of the wrapped optimizer.

Parameters:

set_to_none (bool, optional) – Whether to set gradients to None instead of zeroing them (passed to the wrapped optimizer's zero_grad). Default value is True.

Return type:

None

Optimal model selection

The pretrained search space is used to select the best combination of operations.

class SearchOptimizer(search_space, optimizer, bn_tune_dataloader=None, bn_tune_batches=10, bn_validation_tune_batches=50, sample_to_model_inputs=<function default_sample_to_model_inputs>, **options)

Optimizer for search phase with batch norm tuning.

__init__(search_space, optimizer, bn_tune_dataloader=None, bn_tune_batches=10, bn_validation_tune_batches=50, sample_to_model_inputs=<function default_sample_to_model_inputs>, **options)

Warning

If you use batch norm tuning, make sure to use the same batch norm tuning dataloader as you used during regular training. This ensures that your model will not suffer an input data distribution shift, which may cause performance degradation or a misaligned search process.

Parameters:
  • search_space (SearchSpaceModel) – Search space model to optimize.

  • optimizer (torch.optim.Optimizer) – PyTorch optimizer which will be wrapped by our optimizer.

  • bn_tune_dataloader (torch.utils.data.DataLoader or None, optional) – Dataloader used to tune batch norms during search and validation. It is important to use the same dataloader as you used during regular training; this ensures that your model will not suffer an input data distribution shift, which may cause performance degradation. Default value is None, which disables batch norm tuning during the search procedure. See the construction sketch after this parameter list.

  • bn_tune_batches (int, optional) – Number of steps (batches) to tune batch norm layers at each search step. Default value is 10.

  • bn_validation_tune_batches (int, optional) – Number of steps (batches) to tune batch norm layers before each validation run. Default value is 50.

  • sample_to_model_inputs (Callable, optional) – Function to map dataloader samples to model input format. Default value is default_sample_to_model_inputs(). See more here.

  • options – Other experimental options (should be ignored by the user).
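
A hedged construction sketch for the search phase, assuming the class is importable from enot.optimize, that search_space and train_dataloader come from the preceding pretrain setup, and that passing all search space parameters to the base optimizer is acceptable for a sketch; the base optimizer and learning rate are placeholders.

    import torch
    from enot.optimize import SearchOptimizer  # import path assumed from the package description

    search_optimizer = SearchOptimizer(
        search_space=search_space,
        optimizer=torch.optim.Adam(search_space.parameters(), lr=0.01),
        bn_tune_dataloader=train_dataloader,  # same dataloader as used during regular training
        bn_tune_batches=10,                   # batch norm tuning batches per search step
        bn_validation_tune_batches=50,        # batch norm tuning batches before each validation
    )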

add_param_group(param_group)

Call add_param_group of the wrapped optimizer.

Parameters:

param_group (dict with str keys) – Parameter group description to add to the user optimizer.

Return type:

None

property bn_tune_batches: int

The number of batch norm tuning batches for each search step.

Returns:

The number of batch norm tuning batches for each search step.

Return type:

int

property bn_validation_tune_batches: int

The number of batch norm tuning batches before validation.

Returns:

The number of batch norm tuning batches before validation.

Return type:

int

load_state_dict(state_dict)

Call load_state_dict of the wrapped optimizer.

Parameters:

state_dict (dict with str keys) – State dict to be loaded into the user optimizer instance.

Return type:

None

property model: Module

Model passed to the constructor.

Returns:

PyTorch model passed to the optimizer constructor.

Return type:

torch.nn.Module

model_step(closure)

Perform gradient accumulation step.

Besides gradient computation, this method runs internal ENOT algorithms and utility configuration.

To accumulate gradients, this method must perform a complete gradient computation cycle, which consists of a forward step followed by a backward step. To achieve this, it requires a user-defined closure that encapsulates both of these steps.

It is usually enough to calculate model predictions, compute the loss from these predictions, and then backpropagate by calling loss.backward() to obtain the gradients of the model parameters.

In more sophisticated situations, contact the ENOT team to make sure that the current API covers your use case.

This function must be used in conjunction with the step function.

Usually, you only need to call this function when you accumulate gradients over multiple data batches. In this case, call model_step for each data batch within your larger “ghost batch”, and after accumulating gradients, call the step function without arguments.

Parameters:

closure (Callable) – A closure (a nested function with access to free variables of the enclosing function) that performs the complete gradient accumulation procedure.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None

prepare_validation_model(architecture_indices=None)

Prepares the search space for validation.

This function prepares the search space for validation. Specifically, it does the following:

  1. Samples the current best architecture or a user-defined architecture.

  2. Optionally optimizes it.

Warning

It is your responsibility to call this function before running the search space evaluation procedure (see the sketch below).

Warning

Do not change the sampled architecture in the search space until your validation process is finished.

Parameters:

architecture_indices (list of int or None, optional) – Custom architecture to use in validation. Default value is None, in which case the current best architecture is sampled for validation.

Returns:

Architecture selected for validation.

Return type:

list of int
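
A hedged validation sketch around prepare_validation_model. Only the prepare_validation_model call comes from this reference; val_dataloader, the accuracy bookkeeping, and calling the search space like a regular torch.nn.Module are assumptions.

    import torch

    # Sample the current best architecture and prepare the search space for validation.
    architecture = search_optimizer.prepare_validation_model()

    search_space.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in val_dataloader:  # hypothetical validation dataloader
            predictions = search_space(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()

    print(f"architecture {architecture}: accuracy {correct / total:.4f}")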

property search_space: SearchSpaceModel

Search space model passed to the constructor.

Returns:

Search space passed to the ENOT optimizer constructor.

Return type:

SearchSpaceModel

state_dict()

Call state_dict of the wrapped optimizer and return the result.

Returns:

User optimizer state dict.

Return type:

dict with str keys

step(closure=None)

Performs a single optimization step (parameter update).

The optimization step includes gradient computation (forward and backward passes, only when closure is not None) and a parameter update. The parameter update is performed by the base optimizer provided by the user.

Besides gradient computation, this method runs internal ENOT algorithms and utility configuration.

Calling this function with a non-None closure argument is equivalent to calling model_step with this closure followed by a step call without arguments.

A more detailed description of the gradient computation and the closure structure can be found in the model_step documentation.

Parameters:

closure (Callable or None, optional) – A closure (a nested function with access to free variables of the enclosing function) that performs the complete gradient accumulation procedure. Must be None if you have already accumulated gradients using model_step. Default value is None.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None

zero_grad(set_to_none=True)

Call zero_grad of the wrapped optimizer.

Parameters:

set_to_none (bool, optional) – Whether to set gradients to None instead of zeroing them (passed to the wrapped optimizer's zero_grad). Default value is True.

Return type:

None

class FixedLatencySearchOptimizer(search_space, optimizer, max_latency_value, bn_tune_dataloader=None, bn_tune_batches=10, bn_validation_tune_batches=50, sample_to_model_inputs=<function default_sample_to_model_inputs>, **options)

Optimizer for search phase with latency upper boundary.

__init__(search_space, optimizer, max_latency_value, bn_tune_dataloader=None, bn_tune_batches=10, bn_validation_tune_batches=50, sample_to_model_inputs=<function default_sample_to_model_inputs>, **options)

Warning

If you use batch norm tuning, make sure to use the same batch norm tuning dataloader as you used during regular training. This ensures that your model will not suffer an input data distribution shift, which may cause performance degradation or a misaligned search process.

Parameters:
  • search_space (SearchSpaceModel) – Search space model to optimize.

  • optimizer (Optimizer) – PyTorch optimizer to be replaced with our optimizer.

  • max_latency_value (float) – Maximum latency for the search phase. The search process is constrained by this value, so all architectures sampled for validation will have a latency less than or equal to it. See the construction sketch after this parameter list.

  • bn_tune_dataloader (torch.utils.data.DataLoader or None, optional) – Dataloader used to tune batch norms during search and validation. It is important to use the same dataloader as you used during regular training; this ensures that your model will not suffer an input data distribution shift, which may cause performance degradation. Default value is None, which disables batch norm tuning during the search procedure.

  • bn_tune_batches (int, optional) – Number of steps (batches) to tune batch norm layers at each search step. Default value is 10.

  • bn_validation_tune_batches (int, optional) – Number of steps (batches) to tune batch norm layers before each validation run. Default value is 50.

  • sample_to_model_inputs (Callable, optional) – Function to map dataloader samples to model input format. Default value is default_sample_to_model_inputs(). See more here.

  • options – Other experimental options (should be ignored by the user).
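
A hedged construction sketch with a latency budget, assuming the class is importable from enot.optimize and that search_space and train_dataloader come from the pretrain setup. The concrete max_latency_value and its units depend on how latency information was attached to the search space, which is outside the scope of this page.

    import torch
    from enot.optimize import FixedLatencySearchOptimizer  # import path assumed from the package description

    search_optimizer = FixedLatencySearchOptimizer(
        search_space=search_space,
        optimizer=torch.optim.Adam(search_space.parameters(), lr=0.01),
        max_latency_value=50.0,               # placeholder upper latency bound for sampled architectures
        bn_tune_dataloader=train_dataloader,  # same dataloader as used during regular training
    )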

add_param_group(param_group)

Call add_param_group of the wrapped optimizer.

Parameters:

param_group (dict with str keys) – Parameter group description to add to the user optimizer.

Return type:

None

property bn_tune_batches: int

The number of batch norm tuning batches for each search step.

Returns:

The number of batch norm tuning batches for each search step.

Return type:

int

property bn_validation_tune_batches: int

The number of batch norm tuning batches before validation.

Returns:

The number of batch norm tuning batches before validation.

Return type:

int

load_state_dict(state_dict)

Call load_state_dict of the wrapped optimizer.

Parameters:

state_dict (dict with str keys) – State dict to be loaded into the user optimizer instance.

Return type:

None

property max_latency_value: float

Maximum latency value (upper bound) for the search process.

Returns:

Maximum latency value (upper bound) for the search process.

Return type:

float

property model: Module

Model passed to the constructor.

Returns:

PyTorch model passed to the optimizer constructor.

Return type:

torch.nn.Module

model_step(closure)

Perform gradient accumulation step.

Besides gradient computation, this method runs internal ENOT algorithms and utility configuration.

To accumulate gradients, this method must perform a complete gradient computation cycle, which consists of a forward step followed by a backward step. To achieve this, it requires a user-defined closure that encapsulates both of these steps.

It is usually enough to calculate model predictions, compute the loss from these predictions, and then backpropagate by calling loss.backward() to obtain the gradients of the model parameters.

In more sophisticated situations, contact the ENOT team to make sure that the current API covers your use case.

This function must be used in conjunction with the step function.

Usually, you only need to call this function when you accumulate gradients over multiple data batches. In this case, call model_step for each data batch within your larger “ghost batch”, and after accumulating gradients, call the step function without arguments.

Parameters:

closure (Callable) – A closure (a nested function with access to free variables of the enclosing function) that performs the complete gradient accumulation procedure.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None

prepare_validation_model(architecture_indices=None)

Prepares the search space for validation.

This function prepares the search space for validation. Specifically, it does the following:

  1. Samples the current best architecture or a user-defined architecture.

  2. Optionally optimizes it.

Warning

It is your responsibility to call this function before running the search space evaluation procedure.

Warning

Do not change the sampled architecture in the search space until your validation process is finished.

Parameters:

architecture_indices (list of int or None, optional) – Custom architecture to use in validation. Default value is None, in which case the current best architecture is sampled for validation.

Returns:

Architecture selected for validation.

Return type:

list of int

property search_space: SearchSpaceModel

Search space model passed to the constructor.

Returns:

Search space passed to the ENOT optimizer constructor.

Return type:

SearchSpaceModel

state_dict()

Call state_dict of the wrapped optimizer and return the result.

Returns:

User optimizer state dict.

Return type:

dict with str keys

step(closure=None)

Performs a single optimization step (parameter update).

The optimization step includes gradient computation (forward and backward passes, only when closure is not None) and a parameter update. The parameter update is performed by the base optimizer provided by the user.

Besides gradient computation, this method runs internal ENOT algorithms and utility configuration.

Calling this function with a non-None closure argument is equivalent to calling model_step with this closure followed by a step call without arguments.

A more detailed description of the gradient computation and the closure structure can be found in the model_step documentation.

Parameters:

closure (Callable or None, optional) – A closure (a nested function with access to free variables of the enclosing function) that performs the complete gradient accumulation procedure. Must be None if you have already accumulated gradients using model_step. Default value is None.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None

zero_grad(set_to_none=True)

Call zero_grad of the wrapped optimizer.

Parameters:

set_to_none (bool, optional) – Whether to set gradients to None instead of zeroing them (passed to the wrapped optimizer's zero_grad). Default value is True.

Return type:

None

Optimal model tuning

The final model is tuned for several epochs to match its standalone accuracy. Sometimes, however, it is necessary to re-train the obtained architecture from scratch.

class TrainOptimizer(model, optimizer, **options)

ENOT optimizer for the train and tune phases.

__init__(model, optimizer, **options)
Parameters:
  • model (torch.nn.Module) – PyTorch model to optimize.

  • optimizer (torch.optim.Optimizer) – PyTorch optimizer which will be wrapped by the ENOT optimizer (see the construction sketch after this parameter list).

  • options – Experimental options (should be ignored by the user).
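
A minimal construction sketch for the tune phase, assuming the class is importable from enot.optimize. Here tuned_model stands for the regular PyTorch model obtained from the search results; how it is extracted from the search space is not covered by this page, and the base optimizer settings are placeholders.

    import torch
    from enot.optimize import TrainOptimizer  # import path assumed from the package description

    base_optimizer = torch.optim.SGD(tuned_model.parameters(), lr=1e-3, momentum=0.9)
    enot_optimizer = TrainOptimizer(model=tuned_model, optimizer=base_optimizer)

    # The training loop itself follows the same zero_grad / step(closure) pattern
    # shown in the pretrain sketch above.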

add_param_group(param_group)

Call add_param_group of the wrapped optimizer.

Parameters:

param_group (dict with str keys) – Parameter group description to add to the user optimizer.

Return type:

None

load_state_dict(state_dict)

Call load_state_dict of the wrapped optimizer.

Parameters:

state_dict (dict with str keys) – State dict to be loaded into the user optimizer instance.

Return type:

None

property model: Module

Model passed to the constructor.

Returns:

PyTorch model passed to the optimizer constructor.

Return type:

torch.nn.Module

model_step(closure)

Perform gradient accumulation step.

Besides gradient computation, this method runs internal ENOT algorithms and utility configuration.

To accumulate gradients, this method must perform a complete gradient computation cycle, which consists of a forward step followed by a backward step. To achieve this, it requires a user-defined closure that encapsulates both of these steps.

It is usually enough to calculate model predictions, compute the loss from these predictions, and then backpropagate by calling loss.backward() to obtain the gradients of the model parameters.

In more sophisticated situations, contact the ENOT team to make sure that the current API covers your use case.

This function must be used in conjunction with the step function.

Usually, you only need to call this function when you accumulate gradients over multiple data batches. In this case, call model_step for each data batch within your larger “ghost batch”, and after accumulating gradients, call the step function without arguments.

Parameters:

closure (Callable) – A closure (a nested function with access to free variables of the enclosing function) that performs the complete gradient accumulation procedure.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None

state_dict()

Call state_dict of the wrapped optimizer and return the result.

Returns:

User optimizer state dict.

Return type:

dict with str keys

step(closure=None)

Performs a single optimization step (parameter update).

The optimization step includes gradient computation (forward and backward passes, only when closure is not None) and a parameter update. The parameter update is performed by the base optimizer provided by the user.

Besides gradient computation, this method runs internal ENOT algorithms and utility configuration.

Calling this function with a non-None closure argument is equivalent to calling model_step with this closure followed by a step call without arguments.

A more detailed description of the gradient computation and the closure structure can be found in the model_step documentation.

Parameters:

closure (Callable or None, optional) – A closure (a nested function with access to free variables of the enclosing function) that performs the complete gradient accumulation procedure. Must be None if you have already accumulated gradients using model_step. Default value is None.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None

zero_grad(set_to_none=True)

Call zero_grad of the wrapped optimizer.

Parameters:

set_to_none (bool, optional) – Whether to set gradients to None instead of zeroing them (passed to the wrapped optimizer's zero_grad). Default value is True.

Return type:

None