OpenDAS / deepspeed · Commits

Commit 7925d0c3 (unverified), authored Mar 11, 2021 by Stas Bekman, committed by GitHub on Mar 11, 2021

small tweaks (#839)
parent e0f36ed5

Showing 2 changed files with 5 additions and 5 deletions:

  docs/_tutorials/zero.md          +1 −1
  docs/code-docs/source/zero3.rst  +4 −4
docs/_tutorials/zero.md
@@ -227,7 +227,7 @@ class ParallelTransformerLayer(MegatronModule):
 #### Allocating Massive Megatron-LM Models
 We make two further changes to model initalization in order to support models
-that exceed *local* system memory, but not not *total* system memory.
+that exceed *local* system memory, but not *total* system memory.
 1. Allocate the model in a memory-scalable fashion. The model parameters will
    be allocated and immediately partitioned across the data parallel group. If
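For context on the hunk above (not part of this commit): the memory-scalable allocation it refers to is done with the deepspeed.zero.Init context manager that the tutorial introduces. A minimal sketch follows, with a toy model standing in for the Megatron-LM layers; the exact arguments (remote_device, data_parallel_group, and so on) should be checked against the tutorial and API docs.

import torch
import deepspeed

class TinyModel(torch.nn.Module):
    # Stand-in for a model too large for any single rank's local memory.
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(1024, 1024)

    def forward(self, x):
        return self.proj(x)

# Parameters created inside deepspeed.zero.Init are partitioned across the
# data-parallel group as soon as they are allocated, so no rank ever holds
# the full model. remote_device="cpu" keeps each partition in CPU memory.
with deepspeed.zero.Init(remote_device="cpu", enabled=True):
    model = TinyModel()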
docs/code-docs/source/zero3.rst
@@ -21,13 +21,13 @@ Getting Started
 If you are new to DeepSpeed, check out our `Getting Started <https://www.deepspeed.ai/getting-started/>`_ page.
-Once you are training with DeepSpeed, enabling ZeRO-3 offload is as simple as enabling it
+Once you are training with DeepSpeed, enabling ZeRO-3 Offload is as simple as enabling it
 in your DeepSpeed configuration! Below are a few examples of ZeRO-3 configurations. Please see
 our `config guide <https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training>`_
 for a complete list of options for configuration and performance tuning.

 .. note::
-    ZeRO-Offload works best with our heavily optimized
+    ZeRO-3 Offload works best with our heavily optimized
     :class:`deepspeed.ops.adam.DeepSpeedCPUAdam` optimizer. We recommend using
     our `optimizer config <https://www.deepspeed.ai/docs/config-json/#optimizer-parameters>`_
     to instruct :meth:`deepspeed.initialize` to build the optimizer for you.
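As a rough illustration of the configuration the hunk above points to (not taken from this commit; key names and defaults vary across DeepSpeed versions, so treat them as assumptions to verify against the linked config guide), a stage-3 configuration with CPU offload and a DeepSpeed-built optimizer could look roughly like this, expressed as a Python dict:

# Sketch of a ZeRO-3 Offload configuration; it would normally live in the JSON
# file passed via --deepspeed_config, or be handed to deepspeed.initialize
# directly in newer releases.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    # Letting DeepSpeed build the optimizer allows it to substitute the
    # optimized DeepSpeedCPUAdam when optimizer state is offloaded to CPU.
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4},
    },
    "zero_optimization": {
        "stage": 3,
        # Offload keys assumed from later config guides; older releases used
        # different names, so check the guide for your version.
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}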
@@ -149,8 +149,8 @@ DeepSpeed provides mechanisms for collecting (or *gathering*) a partitioned para
 Some models partitioned with :class:`deepspeed.zero.Init` may need to access a
 module's weights outside of the class constructor or its ``forward()``
-method. We refer to these weights as **external parameters**, since they
-parameters are accessed outside of the module that created it. To do so, use
+method. We refer to these weights as **external parameters**, since these
+parameters are accessed outside of the module that created them. To do so, use
 :class:`deepspeed.zero.GatheredParameters` or :meth:`deepspeed.zero.register_external_parameter`.

 .. autoclass:: deepspeed.zero.GatheredParameters
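To make the external-parameter mechanisms named in the hunk above concrete, here is a hedged sketch (the toy module is invented for illustration; the two APIs are the ones the diff references, and their exact keyword arguments should be confirmed against the ZeRO-3 API docs):

import torch
import deepspeed

class Embedder(torch.nn.Module):
    # Toy module whose weight is read outside of its own forward() pass.
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(10, 4)

    def forward(self, ids):
        return self.embed(ids)

with deepspeed.zero.Init(enabled=True):
    module = Embedder()

# Temporarily gather the partitioned weight so this rank sees the full tensor;
# modifier_rank=0 means only rank 0's in-place modifications are persisted.
with deepspeed.zero.GatheredParameters(module.embed.weight, modifier_rank=0):
    print(module.embed.weight.shape)

# If another module consumes this weight in its own forward pass, register it
# so ZeRO-3 gathers it automatically for that module:
#   deepspeed.zero.register_external_parameter(consumer_module, module.embed.weight)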