Commit 5d721e09 authored by Stas Bekman, committed by GitHub (unverified)

zero.Init() clarification (#880)



* zero.Init() clarification

Clarify that if `model.half()` can't fit into GPU memory, `zero.Init()` is a must.

This proposal is based on @samyam's clarification shared elsewhere.

Thank you.

* style

* add clarity

* style
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
parent c814abda
@@ -279,6 +279,9 @@ class Init(InsertPostInitMethodToModuleSubClasses):
For example, if a node has 1TB of memory and 8 GPUs, we could fit a trillion-parameter
model with 4 nodes and 32 GPUs.

Important: If the fp16 weights of the model can't fit into the memory of a single GPU,
this feature must be used.

.. note::
    Initializes ``torch.distributed`` if it has not already been initialized.
    See :meth:`deepspeed.init_distributed` for more information.
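The two claims in this docstring (the fp16 single-GPU rule and the trillion-parameter example) can be sanity-checked with simple arithmetic. The sketch below is illustrative only and is not part of DeepSpeed: the helper names `fp16_weights_fit_on_gpu` and `nodes_needed` are hypothetical, and so is the 80 GB GPU size. It assumes 2 bytes per parameter in fp16, 4 bytes in fp32, and 1TB = 10**12 bytes. To reproduce the docstring's 4-node figure it further assumes the weights are first allocated in fp32 and partitioned across processes with no redundancy.

```python
"""Back-of-the-envelope memory checks for the zero.Init() guidance.

Hypothetical helpers, not DeepSpeed APIs. Assumes fp16 = 2 bytes/param,
fp32 = 4 bytes/param, and decimal units (1 TB = 10**12 bytes).
"""

FP16_BYTES = 2  # bytes per parameter in half precision
FP32_BYTES = 4  # bytes per parameter in full precision
GB = 10**9
TB = 10**12


def fp16_weights_fit_on_gpu(num_params: int, gpu_mem_bytes: int) -> bool:
    """Return True if the fp16 weights alone fit in one GPU's memory.

    Ignores activations, gradients, optimizer state and framework overhead,
    so this is only a lower bound on what training actually needs.
    """
    return num_params * FP16_BYTES <= gpu_mem_bytes


def nodes_needed(num_params: int, bytes_per_param: int, node_mem_bytes: int) -> int:
    """Nodes needed to hold the weights once, assuming perfect partitioning."""
    total_bytes = num_params * bytes_per_param
    return -(-total_bytes // node_mem_bytes)  # integer ceiling division


if __name__ == "__main__":
    one_trillion = 10**12
    # 2 TB of fp16 weights cannot fit on a (hypothetical) 80 GB GPU,
    # so per the docstring zero.Init() must be used to build this model.
    print(fp16_weights_fit_on_gpu(one_trillion, 80 * GB))  # False
    # Assumed fp32 initial allocation: 4 TB spread over 1 TB nodes -> 4 nodes.
    print(nodes_needed(one_trillion, FP32_BYTES, 1 * TB))  # 4
```

In DeepSpeed itself the context manager wraps model construction, as in `with deepspeed.zero.Init(): model = MyModel()`, where `MyModel` is a placeholder for your own `torch.nn.Module` subclass. A `True` result from the weight check does not by itself mean `zero.Init()` is unnecessary, since gradients, optimizer state and activations also consume GPU memory.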