Enot distributed package¶
The enot.distributed package contains distributed training utilities. If you use distributed training with the ENOT framework, it is recommended to use this package's functions.
Initialization¶
- init_process_group(backend='nccl', init_method='env://', timeout=datetime.timedelta(seconds=1800))¶
Creates the default distributed process group if necessary and initializes the enot.distributed package.
Process group initialization relies on environment variables. Specifically, two environment variables must be set: “WORLD_SIZE” and “LOCAL_RANK”. They are set automatically by PyTorch launching scripts such as torch.distributed.launch and torch.distributed.run. If you are using your own distributed training launch script, set these environment variables in the same way as the aforementioned scripts do.
The distributed process group is initialized only when the “WORLD_SIZE” environment variable is larger than 1. Before that, a number of checks are performed, and eventually torch.distributed.init_process_group is called to initialize the distributed process group.
In a distributed setting with the nccl backend, this function additionally sets the default gpu device index to the process local rank and initializes cuda.
- Parameters
backend (str, optional) – “backend” argument of torch.distributed.init_process_group function. Default value is “nccl”.
init_method (str, optional) – “init_method” argument of torch.distributed.init_process_group function. Default value is “env://”.
timeout (timedelta, optional) – “timeout” argument of torch.distributed.init_process_group function. Default value depends on the torch.distributed package implementation (equal to 30 minutes in most cases).
- Raises
DistributedRuntimeException – On process group initialization errors.
- Return type
None
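The environment-variable gating described above can be sketched in plain Python. This is a simplified, hypothetical approximation for illustration only; the real init_process_group performs additional checks and then calls torch.distributed.init_process_group:

```python
import os

def should_init_process_group() -> bool:
    """Sketch of the gating logic: the distributed process group is
    created only when WORLD_SIZE > 1 (illustrative approximation,
    not ENOT's actual implementation)."""
    world_size = os.environ.get("WORLD_SIZE")
    local_rank = os.environ.get("LOCAL_RANK")
    if world_size is None or local_rank is None:
        # Both variables must be set by the launcher
        # (torch.distributed.run sets them automatically).
        raise RuntimeError("WORLD_SIZE and LOCAL_RANK must be set")
    return int(world_size) > 1
```

With "WORLD_SIZE=1" this returns False and no process group is needed; with "WORLD_SIZE=4" it returns True and initialization proceeds.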
- init_torch(seed=None, cuda_optimize_for_speed=False, cuda_reproducible=False, backend='nccl', init_method='env://', timeout=datetime.timedelta(seconds=1800))¶
Initializes PyTorch framework for training in both single- and multi-process setup.
This function does the following:
- prepares the seed for the experiment, if seed is specified;
- sets certain cuda flags when cuda_optimize_for_speed or cuda_reproducible is set to True;
- initializes the distributed process group for multi-gpu (multi-process) training.
When specified, seed is set in Python’s random library, in NumPy’s random library, and in PyTorch.
Process group initialization relies on environment variables. Specifically, two environment variables must be set: “WORLD_SIZE” and “LOCAL_RANK”. They are set automatically by PyTorch launching scripts such as torch.distributed.launch and torch.distributed.run. If you are using your own distributed training launch script, set these environment variables in the same way as the aforementioned scripts do.
The distributed process group is initialized only when the “WORLD_SIZE” environment variable is larger than 1. Before that, a number of checks are performed, and eventually torch.distributed.init_process_group is called to initialize the distributed process group.
In a distributed setting with the nccl backend, this function additionally sets the default gpu device index to the process local rank and initializes cuda.
- Parameters
seed (int or None, optional) – Random seed for all randomizers. Defaults to None, which disables manual seeding.
cuda_optimize_for_speed (bool, optional) – Whether to set cuda-related optimization flags to enhance training speed. Mutually exclusive with the cuda_reproducible flag. Makes training non-deterministic (non-reproducible runs even when the same seed is specified). Default value is False.
cuda_reproducible (bool, optional) – Whether to set cuda-related flags to make training runs reproducible. Mutually exclusive with the cuda_optimize_for_speed flag. May slightly decrease training speed. Default value is False.
backend (str, optional) – “backend” argument of torch.distributed.init_process_group function. Default value is “nccl”.
init_method (str, optional) – “init_method” argument of torch.distributed.init_process_group function. Default value is “env://”.
timeout (timedelta, optional) – “timeout” argument of torch.distributed.init_process_group function. Default value depends on the torch.distributed package implementation (equal to 30 minutes in most cases).
- Raises
RuntimeError – When both cuda_optimize_for_speed and cuda_reproducible flags are set.
DistributedRuntimeException – On process group initialization errors.
- Return type
None
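A minimal sketch of the argument checks and seeding behaviour described above (illustrative only; the real init_torch also seeds NumPy and PyTorch, sets cuda flags, and initializes the distributed process group):

```python
import random

def init_torch_sketch(seed=None,
                      cuda_optimize_for_speed=False,
                      cuda_reproducible=False):
    # The two cuda flags are mutually exclusive, as documented above.
    if cuda_optimize_for_speed and cuda_reproducible:
        raise RuntimeError(
            "cuda_optimize_for_speed and cuda_reproducible "
            "are mutually exclusive"
        )
    if seed is not None:
        # Seed Python's random module; the real function also seeds
        # NumPy's random library and PyTorch.
        random.seed(seed)
```

Calling it twice with the same seed makes subsequent draws from Python's random module reproducible, which is the behaviour the seed parameter promises.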
Common distributed utilities¶
- get_world_size()¶
Returns the number of parallel PyTorch processes in multi-process training.
- Returns
Number of processes.
- Return type
int
- get_local_rank()¶
Returns the process rank within a single node.
- Returns
Local process rank.
- Return type
int
- is_dist()¶
Checks if distributed mode is enabled.
- Returns
Whether torch.distributed is initialized and the world size is greater than 1.
- Return type
bool
- is_master()¶
Checks whether the current process is the global master in the torch.distributed process group.
- Returns
Whether the current process is the global master process.
- Return type
bool
- is_local_master()¶
Checks whether the current process is a local node master in the torch.distributed process group.
- Returns
Whether the current process is a local master process.
- Return type
bool
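The helpers above could be approximated with an environment-variable-based sketch. Everything below is hypothetical: the actual implementations consult torch.distributed state, and the function names merely mirror the documented API:

```python
import os

def get_world_size() -> int:
    # Single-process runs typically have no WORLD_SIZE set; default to 1.
    return int(os.environ.get("WORLD_SIZE", "1"))

def get_local_rank() -> int:
    # LOCAL_RANK is the process rank within one node.
    return int(os.environ.get("LOCAL_RANK", "0"))

def is_dist() -> bool:
    # Distributed mode is only meaningful with more than one process.
    return get_world_size() > 1

def is_master(global_rank: int) -> bool:
    # By convention, global rank 0 is the master process.
    return global_rank == 0
```

For example, under a launcher that exports "WORLD_SIZE=4" and "LOCAL_RANK=2", these sketches report a world size of 4, a local rank of 2, and a distributed run.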
- master_only(fn)¶
Decorator which makes a procedure run only in the global master process.
- Parameters
fn (callable which returns None) – Function to call only in master process.
- Returns
Function wrapped with an is_master() check.
- Return type
callable
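A sketch of how such a decorator might be implemented (illustrative; _is_master here is a simplified stand-in that reads a hypothetical RANK environment variable rather than querying torch.distributed):

```python
import functools
import os

def _is_master() -> bool:
    # Simplified stand-in: treat global rank 0 as the master process.
    return int(os.environ.get("RANK", "0")) == 0

def master_only(fn):
    # Wrap fn so that it executes only in the master process;
    # in all other processes the call is a no-op returning None.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if _is_master():
            fn(*args, **kwargs)
    return wrapper

@master_only
def log_progress(message):
    print(message)
```

This pattern is commonly used for logging and checkpoint saving, which should happen once per training run rather than once per worker process.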
Model synchronization¶
- sync_model(model, sync_parameters=True, sync_buffers=True, reduce_parameters=True, reduce_buffers=True)¶
Synchronizes model parameters and buffers across distributed processes.
After executing this function, you should have identical copies of your model on all distributed worker processes.
- Parameters
model (torch.nn.Module) – Model whose parameters will be synchronized.
sync_parameters (bool, optional) – Whether to synchronize model parameters or not. Default value is True.
sync_buffers (bool, optional) – Whether to synchronize model buffers or not. Default value is True.
reduce_parameters (bool, optional) – Whether to use reduction (cross-process averaging) to synchronize model parameters or not. When set to False, broadcasts parameters from master process to all other processes instead. Default value is True.
reduce_buffers (bool, optional) – Whether to use reduction (cross-process averaging) to synchronize model buffers or not. When set to False, broadcasts buffers from the master process to all other processes instead. Default value is True.
- Return type
None
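The difference between the two synchronization modes can be illustrated with plain floats standing in for tensors (a conceptual sketch of the semantics only, not the actual collective-communication implementation):

```python
def sync_values(per_process_values, reduce=True):
    # per_process_values holds one value per worker process
    # (index 0 corresponds to the master, rank 0).
    if reduce:
        # Reduction: every process ends up with the cross-process average.
        synced = sum(per_process_values) / len(per_process_values)
    else:
        # Broadcast: every process receives the master's value.
        synced = per_process_values[0]
    return [synced] * len(per_process_values)
```

With reduce=True, two workers holding 1.0 and 3.0 both end up with 2.0 (averaging, as with reduce_parameters=True); with reduce=False, both end up with the master's 1.0 (broadcast).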