"""The DeepSpeed Autotuner automatically discovers the optimal DeepSpeed configuration that delivers good training speed. The Autotuner uses model information, system information, and heuristics to efficiently tune system knobs that affect compute and memory efficiencies, such as ZeRO optimization stages, micro-batch sizes, and many other ZeRO optimization configurations. It not only reduces the time and resources user spend on tuning, but also can discover configurations better than hand-tuned methods.
Autotuning with DeepSpeed requires no code change from DeepSpeed users. Please refer to the README for usage details.
f"Failed to create {self.results_dir}, please check that `results_dir` in the autotuning config file is accessible by all the nodes in the job."
...
...
@@ -92,7 +96,8 @@ class Autotuner:
assert self.exp_num_gpus <= self.rm.num_gpus_per_node, "num_gpus in the autotuning configuration must not be less than the --num_gpus value in the train script if any"
assert self.exp_num_nodes <= len(
    self.rm.nodes
), "num_nodes in the autotuning configuration must not be less than the --num_nodes value in the train script if any"
f"The model might be runnable with ZeRO 0 (which requires at least {memory_to_string(required_gpu_mem, postfix='B')} memory with mbs = 1), adding DEFAULT_TUNING_SPACE_ZERO_0 to the global tuning space"
f"The model might be runnable with ZeRO 1 (which requires at least {memory_to_string(required_gpu_mem, postfix='B')} memory), adding DEFAULT_TUNING_SPACE_ZERO_1 to the global tuning space"
f"The model might be runnable with ZeRO 2 (which requires at least {memory_to_string(required_gpu_mem, postfix='B')} memory), adding DEFAULT_TUNING_SPACE_ZERO_2 to the global tuning space"
f"The model is not runnable with ZeRO stage {ZeroStageEnum.gradients} (which requires at least {memory_to_string(required_gpu_mem, postfix='B')} memory with mbs = 1)"
f"The model might be runnable with ZeRO 3 (which requires at least {memory_to_string(required_gpu_mem, postfix='B')} memory), adding DEFAULT_TUNING_SPACE_ZERO_3 to the global tuning space"
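
The log messages above compare a per-GPU memory requirement against the available GPU memory for each ZeRO stage. The following is a minimal, self-contained sketch of how such a check could work. It is not the Autotuner's actual code: `ZeroStage`, `estimate_memory_per_gpu`, and `runnable_stages` are hypothetical names, and the byte counts (2 bytes per fp16 parameter, 2 per fp16 gradient, 12 for fp32 Adam states) are assumptions based on the usual mixed-precision ZeRO accounting. Activations and temporary buffers are ignored.

```python
from enum import IntEnum


class ZeroStage(IntEnum):
    """Hypothetical stand-in for DeepSpeed's ZeroStageEnum."""
    disabled = 0
    optimizer_states = 1
    gradients = 2
    weights = 3


def estimate_memory_per_gpu(num_params, num_gpus, stage,
                            param_bytes=2, grad_bytes=2, optimizer_bytes=12):
    """Estimate model-state bytes per GPU for a given ZeRO stage.

    The default byte counts are assumptions (fp16 params and gradients,
    fp32 Adam states); they are not read from any DeepSpeed config.
    """
    params_mem = num_params * param_bytes
    grads_mem = num_params * grad_bytes
    optimizer_mem = num_params * optimizer_bytes

    # Each successive stage partitions one more kind of state across GPUs.
    if stage >= ZeroStage.optimizer_states:
        optimizer_mem /= num_gpus
    if stage >= ZeroStage.gradients:
        grads_mem /= num_gpus
    if stage >= ZeroStage.weights:
        params_mem /= num_gpus

    return params_mem + grads_mem + optimizer_mem


def runnable_stages(num_params, num_gpus, gpu_mem_bytes):
    """Return the stages whose estimated requirement fits in GPU memory."""
    return [
        stage for stage in ZeroStage
        if estimate_memory_per_gpu(num_params, num_gpus, stage) < gpu_mem_bytes
    ]
```

Under these assumptions, a 1-billion-parameter model on 8 GPUs needs about 16 GB per GPU with ZeRO disabled and about 2 GB per GPU at stage 3, so a stage that fails this check would be left out of the tuning space.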