Optimization

class GTBaselineOptimizer(model, optimizer, *, allow_non_optimizer=False, **kwargs)

Recommended optimizer for baseline training and for tuning after pruning. In most cases GTBaselineOptimizer helps achieve better metrics than the base optimizer alone.

During training, the closure may be executed several times per step, so any metrics computed inside it may be measured more than once; the example below shows how to avoid double counting.

Examples

>>> from enot.optimization import GTBaselineOptimizer

Create GTBaselineOptimizer and pass it a model and a base optimizer.

>>> from torch.optim import SGD
>>> opt = SGD(model.parameters(), lr=0.01)  # lr value is illustrative
>>> optimizer = GTBaselineOptimizer(model, opt)

Use GTBaselineOptimizer to perform the optimizer step and collect training statistics. Because the closure may be executed more than once per step, compute metrics only on the first execution, as shown below.

>>> epoch_accuracy = 0
>>> for inputs, labels in dataloader:
>>>     step_accuracy = []
>>>
>>>     def closure():
>>>         optimizer.zero_grad()
>>>         outputs = model(inputs)
>>>         loss = criterion(outputs, labels)
>>>         loss.backward()
>>>
>>>         if len(step_accuracy) == 0:
>>>             accuracy = calculate_accuracy(outputs, labels)
>>>             step_accuracy.append(accuracy)
>>>
>>>         return loss
>>>
>>>     optimizer.step(closure)
>>>     epoch_accuracy += step_accuracy[0]
>>> epoch_accuracy /= len(dataloader)

Notes

Use this optimizer in both the train and tune procedures, or do not use it at all. A step of this optimizer takes roughly twice as long as a step of the base optimizer, and its memory consumption is about 1.5 times higher.

__init__(model, optimizer, *, allow_non_optimizer=False, **kwargs)

Parameters:

  • model (torch.nn.Module) – Model which is trained with this optimizer.

  • optimizer (torch.optim.Optimizer) – Base (user) optimizer that performs the parameter updates.

  • allow_non_optimizer (bool, optional) – Default value is False.
add_param_group(param_group)

Call add_param_group of the wrapped optimizer.

Parameters:

param_group (dict with str keys) – Parameter group description to add to user optimizer.

Return type:

None
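
For example (new_head is a hypothetical extra module whose parameters are not yet covered by the existing groups), a parameter group with its own learning rate can be added exactly as with a plain PyTorch optimizer:

>>> optimizer.add_param_group({'params': new_head.parameters(), 'lr': 1e-4})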

load_state_dict(state_dict)

Call load_state_dict of the wrapped optimizer.

Parameters:

state_dict (dict with str keys) – State dict to be loaded to user optimizer instance.

Return type:

None

property model: Module

Model passed to the constructor.

Returns:

PyTorch model passed to the optimizer constructor.

Return type:

torch.nn.Module
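
For example, the property simply gives access to the model instance passed to the constructor:

>>> assert optimizer.model is model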

model_step(closure)

Perform gradient accumulation step.

Besides gradient computation, this method performs internal ENOT algorithms and utility configurations.

To accumulate gradients, this method must perform a complete gradient computation cycle, which consists of a forward step followed by a backward step. To achieve this, it requires a user-defined closure which encapsulates both of these steps.

It is usually enough to calculate model predictions, compute the loss from those predictions, and then compute the gradients of the model parameters by calling loss.backward().

In more sophisticated situations, contact the ENOT team to make sure that the current API can be used in your case and that nothing will go wrong.

This function must be used in conjunction with the step() function.

Usually, you only need to call this function when you need gradient accumulation over multiple data batches. In this case, call model_step() for each data batch within your larger “ghost batch”. After accumulating gradients, call the step() function without arguments, as sketched below.
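
A minimal sketch of gradient accumulation over several smaller batches (micro_batches is a hypothetical iterable of such batches; model and criterion are assumed to be defined as in the Examples section). The sketch assumes gradients accumulate in the parameters’ .grad fields, so zero_grad() is called once before the loop instead of inside the closure:

>>> optimizer.zero_grad()
>>> for micro_inputs, micro_labels in micro_batches:
>>>     def closure():
>>>         outputs = model(micro_inputs)
>>>         loss = criterion(outputs, micro_labels)
>>>         loss.backward()
>>>         return loss
>>>
>>>     optimizer.model_step(closure)
>>> optimizer.step()  # parameter update with the accumulated gradients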

Parameters:

closure (Callable[[], Union[float, torch.Tensor]]) – A closure (a nested function which has access to free variables from an enclosing function) that performs the complete gradient accumulation procedure.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None

property param_groups: List[Dict[str, Any]]

Returns the param_groups variable of the wrapped optimizer.

Returns:

User optimizer parameter groups.

Return type:

list with dict with str keys
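
Because the property exposes the wrapped optimizer’s parameter groups, hyperparameters such as the learning rate can be adjusted in the usual PyTorch way, e.g. as a simple hand-written schedule:

>>> for param_group in optimizer.param_groups:
>>>     param_group['lr'] *= 0.1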

state_dict()

Call state_dict of the wrapped optimizer and return the result.

Returns:

User optimizer state dict.

Return type:

dict with str keys
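
Since state_dict() and load_state_dict() delegate to the wrapped optimizer, checkpointing works as with a plain PyTorch optimizer (the file name below is illustrative):

>>> import torch
>>> torch.save({'model': model.state_dict(), 'optimizer': optimizer.state_dict()}, 'checkpoint.pth')
>>> checkpoint = torch.load('checkpoint.pth')
>>> model.load_state_dict(checkpoint['model'])
>>> optimizer.load_state_dict(checkpoint['optimizer'])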

step(closure=None, scaler=None)

Performs a single optimization step (parameter update).

The optimization step includes gradient computation (forward and backward passes, only when closure is not None) and a parameter update. The parameter update is performed by the base optimizer provided by the user.

Besides gradient computation, this method performs internal ENOT algorithms and utility configurations.

Calling this function with a non-None closure argument is equivalent to calling model_step() with this closure, followed by a step() call without any arguments.

A more detailed description of the gradient computation and the closure structure can be found in the model_step() documentation.

Parameters:
  • closure (Callable[[], Union[float, torch.Tensor]] or None, optional) – A closure (a nested function which has access to free variables from an enclosing function) that performs the complete gradient accumulation procedure. Must be None if you accumulated gradients using model_step(). Default value is None.

  • scaler (torch.cuda.amp.GradScaler or None, optional) – If specified, the GradScaler is used for AMP (automatic mixed precision) optimization. Does not work with multi-step optimizers. Default value is None.

Returns:

The result of the closure execution: a loss value stored in a torch.Tensor or a float, or None.

Return type:

float or torch.Tensor or None
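
For instance, the equivalence described above means that the step(closure) call from the Examples section could also be written as a model_step() call followed by an argument-free step():

>>> optimizer.step(closure)
>>> # is equivalent to
>>> optimizer.model_step(closure)
>>> optimizer.step()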

zero_grad(set_to_none=True)

Call zero_grad of the wrapped optimizer.

Parameters:

set_to_none (bool) – If True, set the gradients to None instead of zeroing them. Default value is True.

Return type:

None