zero.Init() clarification (#880)

* zero.Init() clarification

  clarify that if `model.half()` can't fit into gpu memory `zero.Init()` is a must.

  this proposal is via @samyam's clarification shared elsewhere. Thank you.

* style
* add clarity
* style

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Commit 5d721e09, authored Apr 01, 2021 by Stas Bekman, committed by GitHub Apr 01, 2021

Showing 1 changed file with 3 additions and 0 deletions

deepspeed/runtime/zero/partition_parameters.py +3 -0

--- a/deepspeed/runtime/zero/partition_parameters.py
+++ b/deepspeed/runtime/zero/partition_parameters.py
@@ -279,6 +279,9 @@ class Init(InsertPostInitMethodToModuleSubClasses):
        For example, if a node has 1TB of memory and 8 GPUs, we could fit a trillion
        parameter model with 4 nodes and 32 GPUs.
+        Important: If the fp16 weights of the model can't fit onto a single GPU memory
+        this feature must be used.
        .. note::
            Initializes ``torch.distributed`` if it has not already been done so.
            See :meth:`deepseed.init_distributed` for more information.
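
The rule added to the docstring can be illustrated with a rough size check. This is only a sketch under stated assumptions: the helper names (`fp16_weight_bytes`, `needs_zero_init`, `build_model`), the 32 GiB GPU size, and the parameter counts are hypothetical and not part of DeepSpeed. It counts 2 bytes per fp16 parameter and ignores gradients, optimizer state, and activations, so a real GPU has less room for weights than its nominal size suggests. The only DeepSpeed call used is the `deepspeed.zero.Init()` context manager that this docstring documents.

```python
# Sketch: decide whether deepspeed.zero.Init() is required, following the rule
# in the docstring. Helper names and example sizes are illustrative assumptions.

BYTES_PER_FP16_PARAM = 2  # one fp16 value occupies 2 bytes


def fp16_weight_bytes(num_params: int) -> int:
    """Memory needed just to hold the fp16 weights (no grads, no optimizer state)."""
    return num_params * BYTES_PER_FP16_PARAM


def needs_zero_init(num_params: int, gpu_memory_bytes: int) -> bool:
    """True when the fp16 weights alone exceed the memory of a single GPU."""
    return fp16_weight_bytes(num_params) > gpu_memory_bytes


def build_model(model_cls, num_params: int, gpu_memory_bytes: int):
    """Construct model_cls(), wrapping construction in zero.Init() when required."""
    if needs_zero_init(num_params, gpu_memory_bytes):
        import deepspeed  # imported lazily so the size check runs without it

        # Parameters are partitioned across ranks as each submodule is built.
        with deepspeed.zero.Init():
            return model_cls()
    return model_cls()


GIB = 1024 ** 3
# 1B parameters: about 1.9 GiB of fp16 weights, fits on a 32 GiB GPU.
print(needs_zero_init(1_000_000_000, 32 * GIB))
# 100B parameters: about 186 GiB of fp16 weights, does not fit.
print(needs_zero_init(100_000_000_000, 32 * GIB))
```

`zero.Init()` may still be worth using when the weights do fit; the check above only marks the point past which the docstring says it must be used.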