OpenDAS / deepspeed · Commits

Commit 7925d0c3 (unverified), authored Mar 11, 2021 by Stas Bekman, committed by GitHub on Mar 11, 2021

small tweaks (#839)
parent e0f36ed5

Showing 2 changed files with 5 additions and 5 deletions:

  docs/_tutorials/zero.md          +1 −1
  docs/code-docs/source/zero3.rst  +4 −4
docs/_tutorials/zero.md
@@ -227,7 +227,7 @@ class ParallelTransformerLayer(MegatronModule):
 #### Allocating Massive Megatron-LM Models
 We make two further changes to model initalization in order to support models
-that exceed *local* system memory, but not not *total* system memory.
+that exceed *local* system memory, but not *total* system memory.
 1. Allocate the model in a memory-scalable fashion. The model parameters will
    be allocated and immediately partitioned across the data parallel group. If
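For context on the hunk above (not part of this commit): the memory-scalable allocation it refers to is done with the deepspeed.zero.Init context manager that the tutorial introduces. A minimal sketch follows, with a toy model standing in for the Megatron-LM layers; the exact arguments (remote_device, data_parallel_group, and so on) should be checked against the tutorial and API docs.

import torch
import deepspeed

class TinyModel(torch.nn.Module):
    # Stand-in for a model too large for any single rank's local memory.
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(1024, 1024)

    def forward(self, x):
        return self.proj(x)

# Parameters created inside deepspeed.zero.Init are partitioned across the
# data-parallel group as soon as they are allocated, so no rank ever holds
# the full model. remote_device="cpu" keeps each partition in CPU memory.
with deepspeed.zero.Init(remote_device="cpu", enabled=True):
    model = TinyModel()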
docs/code-docs/source/zero3.rst
@@ -21,13 +21,13 @@ Getting Started
 If you are new to DeepSpeed, check out our `Getting Started <https://www.deepspeed.ai/getting-started/>`_ page.
-Once you are training with DeepSpeed, enabling ZeRO-3 offload is as simple as enabling it
+Once you are training with DeepSpeed, enabling ZeRO-3 Offload is as simple as enabling it
 in your DeepSpeed configuration! Below are a few examples of ZeRO-3 configurations. Please see
 our `config guide <https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training>`_
 for a complete list of options for configuration and performance tuning.

 .. note::
-    ZeRO-Offload works best with our heavily optimized
+    ZeRO-3 Offload works best with our heavily optimized
     :class:`deepspeed.ops.adam.DeepSpeedCPUAdam` optimizer. We recommend using
     our `optimizer config <https://www.deepspeed.ai/docs/config-json/#optimizer-parameters>`_
     to instruct :meth:`deepspeed.initialize` to build the optimizer for you.
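As a rough illustration of the configuration the hunk above points to (not taken from this commit; key names and defaults vary across DeepSpeed versions, so treat them as assumptions to verify against the linked config guide), a stage-3 configuration with CPU offload and a DeepSpeed-built optimizer could look roughly like this, expressed as a Python dict:

# Sketch of a ZeRO-3 Offload configuration; it would normally live in the JSON
# file passed via --deepspeed_config, or be handed to deepspeed.initialize
# directly in newer releases.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    # Letting DeepSpeed build the optimizer allows it to substitute the
    # optimized DeepSpeedCPUAdam when optimizer state is offloaded to CPU.
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4},
    },
    "zero_optimization": {
        "stage": 3,
        # Offload keys assumed from later config guides; older releases used
        # different names, so check the guide for your version.
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}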
@@ -149,8 +149,8 @@ DeepSpeed provides mechanisms for collecting (or *gathering*) a partitioned para
 Some models partitioned with :class:`deepspeed.zero.Init` may need to access a
 module's weights outside of the class constructor or its ``forward()``
-method. We refer to these weights as **external parameters**, since they
-parameters are accessed outside of the module that created it. To do so, use
+method. We refer to these weights as **external parameters**, since these
+parameters are accessed outside of the module that created them. To do so, use
 :class:`deepspeed.zero.GatheredParameters` or :meth:`deepspeed.zero.register_external_parameter`.

 .. autoclass:: deepspeed.zero.GatheredParameters
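To make the external-parameter mechanisms named in the hunk above concrete, here is a hedged sketch (the toy module is invented for illustration; the two APIs are the ones the diff references, and their exact keyword arguments should be confirmed against the ZeRO-3 API docs):

import torch
import deepspeed

class Embedder(torch.nn.Module):
    # Toy module whose weight is read outside of its own forward() pass.
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(10, 4)

    def forward(self, ids):
        return self.embed(ids)

with deepspeed.zero.Init(enabled=True):
    module = Embedder()

# Temporarily gather the partitioned weight so this rank sees the full tensor;
# modifier_rank=0 means only rank 0's in-place modifications are persisted.
with deepspeed.zero.GatheredParameters(module.embed.weight, modifier_rank=0):
    print(module.embed.weight.shape)

# If another module consumes this weight in its own forward pass, register it
# so ZeRO-3 gathers it automatically for that module:
#   deepspeed.zero.register_external_parameter(consumer_module, module.embed.weight)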