"git@developer.sourcefind.cn:OpenDAS/vision.git" did not exist on "49468279d9070a5631b6e0198ee562c00ecedb10"
Unverified Commit 7925d0c3, authored by Stas Bekman, committed by GitHub

small tweaks (#839)

parent e0f36ed5
@@ -227,7 +227,7 @@ class ParallelTransformerLayer(MegatronModule):
 #### Allocating Massive Megatron-LM Models
 We make two further changes to model initalization in order to support models
-that exceed *local* system memory, but not not *total* system memory.
+that exceed *local* system memory, but not *total* system memory.
 1. Allocate the model in a memory-scalable fashion. The model parameters will
 be allocated and immediately partitioned across the data parallel group. If
...
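For context on the allocation step this hunk documents, here is a minimal sketch (not part of this commit) of constructing a model inside the `deepspeed.zero.Init` context so that parameters are partitioned across the data parallel group as soon as they are allocated. The toy `Block` module and its sizes are illustrative only.

```python
import torch
import deepspeed

class Block(torch.nn.Module):
    """Placeholder transformer-ish block, used only for illustration."""
    def __init__(self, hidden):
        super().__init__()
        self.proj = torch.nn.Linear(hidden, hidden)

    def forward(self, x):
        return torch.relu(self.proj(x))

# Parameters created inside zero.Init are partitioned across the data parallel
# group immediately after allocation, so no single rank ever has to hold the
# full, unpartitioned model in local memory.
with deepspeed.zero.Init(remote_device="cpu"):  # remote_device="cpu" keeps the partitions in host memory
    model = torch.nn.Sequential(*[Block(8192) for _ in range(48)])
```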
@@ -21,13 +21,13 @@ Getting Started
 If you are new to DeepSpeed, check out our `Getting Started <https://www.deepspeed.ai/getting-started/>`_ page.
-Once you are training with DeepSpeed, enabling ZeRO-3 offload is as simple as enabling it
+Once you are training with DeepSpeed, enabling ZeRO-3 Offload is as simple as enabling it
 in your DeepSpeed configuration! Below are a few examples of ZeRO-3 configurations. Please see
 our `config guide <https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training>`_
 for a complete list of options for configuration and performance tuning.
 .. note::
-    ZeRO-Offload works best with our heavily optimized
+    ZeRO-3 Offload works best with our heavily optimized
     :class:`deepspeed.ops.adam.DeepSpeedCPUAdam` optimizer. We recommend using
     our `optimizer config <https://www.deepspeed.ai/docs/config-json/#optimizer-parameters>`_
     to instruct :meth:`deepspeed.initialize` to build the optimizer for you.
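Since the hunk above points readers at the config guide without showing a config, here is a rough sketch of what a ZeRO-3 Offload configuration can look like. It is not part of this commit; the key names follow recent DeepSpeed releases (older versions spelled the offload switches differently), and all numeric values are placeholders.

```python
# Illustrative ZeRO-3 Offload configuration, written as the Python dict that
# would normally be saved to a ds_config.json file.
ds_config = {
    "train_batch_size": 32,                       # placeholder value
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                               # ZeRO stage 3: partition optimizer state, gradients, and parameters
        "offload_optimizer": {"device": "cpu"},   # keep optimizer state in host memory
        "offload_param": {"device": "cpu"},       # keep partitioned parameters in host memory
    },
    # Declaring the optimizer here lets deepspeed.initialize build it, so the
    # optimized DeepSpeedCPUAdam implementation can be used when offloading
    # (see the note in the hunk above).
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}
```

In recent DeepSpeed releases such a dict can also be passed to ``deepspeed.initialize`` directly instead of pointing at a JSON file on disk.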
@@ -149,8 +149,8 @@ DeepSpeed provides mechanisms for collecting (or *gathering*) a partitioned para
 Some models partitioned with :class:`deepspeed.zero.Init` may need to access
 a modules weights outside of the class constructor or its ``forward()``
-method. We refer to these weights as **external parameters**, since they
-parameters are accessed outside of the module that created it. To do so, use
+method. We refer to these weights as **external parameters**, since these
+parameters are accessed outside of the module that created them. To do so, use
 :class:`deepspeed.zero.GatheredParameters` or :meth:`deepspeed.zero.register_external_parameter`.
 .. autoclass:: deepspeed.zero.GatheredParameters
...
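As a rough illustration of the external-parameter pattern this hunk describes (the module and layer names below are invented for the sketch and are not from the commit):

```python
import torch
import deepspeed

class TiedProjection(torch.nn.Module):
    """Toy module whose forward() reads the embedding's weight directly,
    i.e. outside the submodule that created that parameter."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, hidden_size)
        self.mlp = torch.nn.Linear(hidden_size, hidden_size)
        # Register embed.weight as an external parameter so ZeRO-3 knows to
        # gather its partition before this module's forward() touches it.
        deepspeed.zero.register_external_parameter(self, self.embed.weight)

    def forward(self, tokens):
        x = self.mlp(self.embed(tokens))
        # embed.weight is used here without going through self.embed.forward(),
        # which is what makes it an "external parameter" for ZeRO-3.
        return torch.nn.functional.linear(x, self.embed.weight)
```

For one-off accesses outside the training loop (for example, initializing or inspecting a weight), wrapping the access in ``deepspeed.zero.GatheredParameters`` is the alternative the hunk mentions.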