-        placement_policy (str, optional): "cpu", "cuda", "auto". Defaults to "cpu".
+        chunk_init_device (torch.device, optional): device to initialize the chunk.
+        placement_policy (str, optional): "static" and "auto". Defaults to "static".
+        shard_param_frac (float, optional): fraction of parameters to be sharded. Only for "static" placement.
+            If `shard_param_frac` is 1.0, it's equivalent to zero-3. If `shard_param_frac` is 0.0, it's equivalent to zero-2. Defaults to 1.0.
+        offload_optim_frac (float, optional): fraction of optimizer states to be offloaded. Only for "static" placement.
+            If `shard_param_frac` is 1.0 and `offload_optim_frac` is 0.0, it's equivalent to the old "cuda" placement. Defaults to 0.0.
+        offload_param_frac (float, optional): fraction of parameters to be offloaded. Only for "static" placement.
+            For efficiency, this argument is useful only when `shard_param_frac` is 1.0 and `offload_optim_frac` is 1.0.
+            If `shard_param_frac` is 1.0, `offload_optim_frac` is 1.0 and `offload_param_frac` is 1.0, it's equivalent to the old "cpu" placement.
+            When using static placement, we recommend tuning `shard_param_frac` first and then `offload_optim_frac`.
+            Defaults to 0.0.
+        warmup_non_model_data_ratio (float, optional): ratio of expected non-model data memory during warmup. Only for "auto" placement. Defaults to 0.8.
+        steady_cuda_cap_ratio (float, optional): ratio of allowed cuda capacity for model data during steady state. Only for "auto" placement. Defaults to 0.9.
         precision (str, optional): precision. Supports 'fp16' and 'bf16'. Defaults to 'fp16'.
         pin_memory (bool, optional): use pin memory on CPU. Defaults to False.
         force_outputs_fp32 (bool, optional): force outputs to be fp32. Defaults to False.
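
For illustration, a minimal sketch of the new static-placement knobs described above (the colossalai.booster.plugin import path is assumed; all values here are illustrative):

from colossalai.booster.plugin import GeminiPlugin

# shard_param_frac=1.0 fully shards parameters (zero-3 style); with
# offload_optim_frac=0.0 this matches the old "cuda" placement.
zero3_like = GeminiPlugin(placement_policy="static", shard_param_frac=1.0, offload_optim_frac=0.0)

# shard_param_frac=0.0 keeps parameters replicated (zero-2 style).
zero2_like = GeminiPlugin(placement_policy="static", shard_param_frac=0.0)

# All three fractions at 1.0 reproduce the old "cpu" placement.
cpu_like = GeminiPlugin(
    placement_policy="static",
    shard_param_frac=1.0,
    offload_optim_frac=1.0,
    offload_param_frac=1.0,
)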
...
@@ -312,8 +271,14 @@ class GeminiPlugin(DPPluginBase):
     def __init__(
         self,
-        device: Optional[torch.device] = None,
-        placement_policy: str = "cpu",
+        chunk_config_dict: Optional[dict] = None,
+        chunk_init_device: Optional[torch.device] = None,
+        placement_policy: str = "static",
+        shard_param_frac: float = 1.0,  # only for static placement
+        offload_optim_frac: float = 0.0,  # only for static placement
+        offload_param_frac: float = 0.0,  # only for static placement
+        warmup_non_model_data_ratio: float = 0.8,  # only for auto placement
+        steady_cuda_cap_ratio: float = 0.9,  # only for auto placement
         precision: str = "fp16",
         pin_memory: bool = False,
         force_outputs_fp32: bool = False,
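
As a migration sketch against the signature above (the pre-change call is reconstructed from the removed lines; the torch import and the chosen values are illustrative):

import torch
from colossalai.booster.plugin import GeminiPlugin

# before: plugin = GeminiPlugin(device=torch.device("cuda"), placement_policy="cpu")
# the closest post-change equivalent, per the docstring mapping:
plugin = GeminiPlugin(
    chunk_init_device=torch.device("cuda"),
    placement_policy="static",
    shard_param_frac=1.0,
    offload_optim_frac=1.0,
    offload_param_frac=1.0,  # the old "cpu" placement offloaded both optimizer states and params
)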
...
@@ -337,8 +302,14 @@ class GeminiPlugin(DPPluginBase):
         super().__init__()
         assert precision in SUPPORTED_PRECISION, f'precision {precision} is not supported'
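
Tying it together, a hedged end-to-end sketch (Booster and HybridAdam are taken from ColossalAI's public API; the model is a placeholder, and a prior colossalai.launch_from_torch call is assumed to have set up the distributed state):

import torch.nn as nn
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

model = nn.Linear(16, 16)  # placeholder model
optimizer = HybridAdam(model.parameters(), lr=1e-3)  # CPU/GPU hybrid optimizer suited to offloading

plugin = GeminiPlugin(placement_policy="auto", precision="bf16", pin_memory=True)
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)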