"""Given a directory to store the checkpoints, saves all the training components' parameters or buffers, such as the model, optimizer, lr_scheduler, etc., into a checkpoint dictionary.
This method can be used for both the colossalai nn.BaseModel and a normal PyTorch nn.Module.
:param checkpoint_path: Set up a directory for saving checkpoints
:type checkpoint_path: str
:param epoch: Epoch number (indicates how many epochs the model has been trained for)
:type epoch: int
:param model: Model to be registered
:type model: torch.nn.Module
:param optimizer: Optimizer to be registered
:type optimizer: torch.optim.Optimizer
:param lr_scheduler: lr_scheduler to be registered, defaults to None
"""
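As a rough illustration of what this routine does (a minimal sketch, not the library's actual implementation; the name save_checkpoint_sketch is hypothetical), the checkpoint dictionary described above could be assembled with plain PyTorch:

import torch

def save_checkpoint_sketch(checkpoint_path, epoch, model, optimizer, lr_scheduler=None):
    # Gather each registered component's state into one checkpoint dictionary.
    checkpoint = {
        'epoch': epoch,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
    }
    # lr_scheduler defaults to None, so store it only when one was registered.
    if lr_scheduler is not None:
        checkpoint['lr_scheduler'] = lr_scheduler.state_dict()
    torch.save(checkpoint, checkpoint_path)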
"""A data sampler for distributed data parallelism
"""A data sampler for distributed data parallelism
:param dataset: a Dataset instance
:param dataset: A Dataset instance
:type dataset: torch.utils.data.Dataset
:type dataset: torch.utils.data.Dataset
:param shuffle: whether to shuffle data, defaults to False
:param shuffle: Whether to shuffle data, defaults to False
:type shuffle: bool, optional
:type shuffle: bool, optional
:param seed: the random seed, defaults to 0
:param seed: The random seed, defaults to 0
:type seed: int, optional
:type seed: int, optional
:param drop_last: set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller, defaults to False
:param drop_last: Set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch
size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller,
defaults to False
:type drop_last: bool, optional
:type drop_last: bool, optional
"""
"""
...
@@ -116,19 +118,18 @@ def get_dataloader(dataset,
                   pin_memory=False,
                   num_workers=0,
                   **kwargs):
"""Set up a deterministic dataloader (also configures seed workers, samplers and whether or not to shuffle)
.. note:: When pipeline parallel is enabled, shuffle cannot be True as it will result in a mismatch between the input
    data on the 1st stage and the labels on the last stage
:param dataset: A :class:`torch.utils.data.Dataset` object
:param shuffle: Whether to shuffle the dataset
:param seed: Random worker seed, defaults to 1024
:param add_sampler: Add DistributedDataParallelSampler to the dataset
:param drop_last: Drop the last incomplete batch of data
:param pin_memory: Whether to pin memory address in CPU memory
:param num_workers: Number of worker threads for this dataloader
:type dataset: :class:`torch.utils.data.Dataset`
:type shuffle: bool, optional. Default is False
...
@@ -138,9 +139,9 @@ def get_dataloader(dataset,
:type pin_memory: bool, optional. Default is False
:type num_workers: int, optional. Default is 0
:return: An object of :class:`torch.utils.data.DataLoader`
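For reference, the deterministic-worker behaviour documented above follows the standard PyTorch seeding recipe; a self-contained sketch (the internals of get_dataloader may differ):

import random
import numpy
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Derive each worker's seed from the dataloader's base seed so runs are reproducible.
    worker_seed = torch.initial_seed() % 2**32
    numpy.random.seed(worker_seed)
    random.seed(worker_seed)

# A seeded generator makes the shuffle order itself reproducible.
generator = torch.Generator()
generator.manual_seed(1024)

dataset = TensorDataset(torch.arange(8).float())
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0,
                    worker_init_fn=seed_worker, generator=generator)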
...
@@ -95,8 +95,11 @@ class DynamicLossScaler(LossScalerBase):
always using the highest loss scale possible without incurring overflow.
Args:
    init_scale (float, optional, default=2**32): Initial loss scale attempted by :class:`DynamicLossScaler`.
    scale_factor (float, optional, default=2.0): Factor used when adjusting the loss scale. If an overflow is
        encountered, the loss scale is readjusted to loss_scale/``scale_factor``. If ``scale_window`` consecutive
        iterations take place without an overflow, the loss scale is readjusted to loss_scale*``scale_factor``.
    scale_window (int, optional, default=1000): Number of consecutive iterations without an overflow to wait before
        increasing the loss scale.
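The update rule these arguments describe can be sketched as follows; this is an illustrative reimplementation of the documented behaviour, not the class's actual code:

class DynamicLossScalerSketch:
    def __init__(self, init_scale=2**32, scale_factor=2.0, scale_window=1000):
        self.loss_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self._good_steps = 0  # consecutive overflow-free iterations

    def update(self, overflow):
        if overflow:
            # Overflow: back the scale off and restart the window.
            self.loss_scale /= self.scale_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.scale_window:
                # scale_window clean iterations in a row: probe a higher scale.
                self.loss_scale *= self.scale_factor
                self._good_steps = 0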