OpenDAS / deepspeed, commit 6bb5c69f (unverified)

Authored Sep 10, 2020 by Shaden Smith; committed by GitHub on Sep 10, 2020

Website edits (#398)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Parent: 7baf3c3a
Showing 4 changed files with 20 additions and 13 deletions (+20, -13):
* README.md (+2, -2)
* docs/_pages/features.md (+15, -8)
* docs/_posts/2020-09-09-ZeRO-Offload.md (+1, -1)
* docs/index.md (+2, -2)
README.md @ 6bb5c69f
docs/_pages/features.md @ 6bb5c69f
@@ -30,17 +30,22 @@ deepspeed --hostfile=<hostfile> \
```
The script `<client_entry.py>` will execute on the resources specified in `<hostfile>`.
+ ## Pipeline Parallelism
+ DeepSpeed provides [pipeline parallelism](/tutorials/pipeline/) for memory-
+ and communication-efficient training. DeepSpeed supports a hybrid
+ combination of data, model, and pipeline parallelism and has scaled to over
+ [one trillion parameters using 3D parallelism]({{ site.press_release_v3 }}).
+ Pipeline parallelism can also improve communication efficiency and has
+ accelerated training by up to 7x on low-bandwidth clusters.
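For context on the pipeline section added above (this sketch is not part of the diff): the pipeline tutorial it links to builds the model as a flat list of layers wrapped in a `PipelineModule`. A minimal sketch, assuming the `deepspeed.pipe.PipelineModule` interface from that tutorial and a placeholder config file `ds_config.json`, could look like this:

```python
# Illustrative sketch only: a toy two-stage pipeline-parallel model.
# Assumes the deepspeed.pipe.PipelineModule API and a ds_config.json on disk;
# run under the `deepspeed` launcher so torch.distributed is initialized.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Express the network as a flat list of layers so DeepSpeed can partition
# it into pipeline stages.
layers = [
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
]
model = PipelineModule(layers=layers,
                       loss_fn=nn.CrossEntropyLoss(),
                       num_stages=2)

# The returned engine drives the pipeline schedule: each train_batch() call
# runs forward, backward, and the optimizer step across all stages.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config="ds_config.json",  # assumed keyword; older releases read --deepspeed_config from args
)
# loss = engine.train_batch(data_iter)  # data_iter yields (inputs, labels) tuples
```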
## Model Parallelism
### Support for Custom Model Parallelism
- DeepSpeed supports all forms of model parallelism including tensor slicing based
- approaches such as the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), or
- pipelined parallelism approaches such as [PipeDream](https://github.com/msr-fiddle/pipedream) and
- [GPipe](https://github.com/kakaobrain/torchgpipe). It does so by only
- requiring the model parallelism framework to provide a *model parallelism
- unit* (`mpu`) that implements a few bookkeeping functionalities:
+ DeepSpeed supports all forms of model parallelism including tensor slicing
+ based approaches such as the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM). It does so by only requiring the model
+ parallelism framework to provide a *model parallelism unit* (`mpu`) that implements a few
+ bookkeeping functionalities:
```python
mpu.get_model_parallel_rank()
...
```
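The remaining `mpu` calls are elided in this hunk. As an illustrative aside (an assumption, not content from this commit), a framework with no real model parallelism could satisfy this bookkeeping interface with a trivial object along these lines, where every rank is its own model-parallel group and data parallelism spans all ranks:

```python
# Hypothetical sketch: a degenerate "mpu" for a framework without model
# parallelism. Only get_model_parallel_rank() appears in the visible part of
# the hunk; the other method names follow the Megatron-style convention and
# are assumptions. Requires torch.distributed to be initialized.
import torch.distributed as dist

class IdentityMPU:
    """Each rank forms its own model-parallel group; data parallelism covers all ranks."""

    def __init__(self):
        # new_group() is a collective call, so every rank constructs every
        # single-rank group and keeps the one it belongs to.
        for rank in range(dist.get_world_size()):
            group = dist.new_group([rank])
            if rank == dist.get_rank():
                self._model_parallel_group = group

    def get_model_parallel_rank(self):
        return 0

    def get_model_parallel_world_size(self):
        return 1

    def get_model_parallel_group(self):
        return self._model_parallel_group

    def get_data_parallel_rank(self):
        return dist.get_rank()

    def get_data_parallel_world_size(self):
        return dist.get_world_size()

    def get_data_parallel_group(self):
        return dist.group.WORLD
```

An object like this is what gets handed to `deepspeed.initialize` through its `mpu` argument, so that DeepSpeed can arrange its data-parallel process groups around the model-parallel ones.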
@@ -57,6 +62,8 @@ DeepSpeed is fully compatible with [Megatron](https://github.com/NVIDIA/Megatron
Please see the [Megatron-LM tutorial](/tutorials/megatron/) for details.

## The Zero Redundancy Optimizer
The Zero Redundancy Optimizer ([ZeRO](https://arxiv.org/abs/1910.02054)) is at
the heart of DeepSpeed and enables large model training at a scale that is
...
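As an aside on the ZeRO section above (not part of the diff): ZeRO is enabled through the DeepSpeed configuration rather than code changes. A minimal sketch of such a config, shown here as a Python dict with placeholder values (it is normally written as a JSON file passed to the launcher), might look like:

```python
# Illustrative sketch only: turning on ZeRO via the DeepSpeed config.
# Field values are placeholders, not recommendations.
ds_config = {
    "train_batch_size": 256,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,  # stage 1 partitions optimizer states across data-parallel ranks
    },
}

# engine, _, _, _ = deepspeed.initialize(model=model,
#                                        model_parameters=model.parameters(),
#                                        config=ds_config)  # assumed keyword; older releases read a JSON path
```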
docs/_posts/2020-09-09-ZeRO-Offload.md @ 6bb5c69f
docs/index.md @ 6bb5c69f

@@ -30,7 +30,7 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)
# What's New?
* [2020/09/10] [DeepSpeed: Extreme-scale model training for everyone]({{ site.press_release_v3 }})
* [Powering 10x longer sequences and 6x faster execution through DeepSpeed Sparse Attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention-news.html)
- * [Training a trillion parameters with pipeline parallelism](https://www.deepspeed.ai/news/2020/09/09/pipeline-parallelism.html)
+ * [Training a trillion parameters with pipeline parallelism](https://www.deepspeed.ai/news/2020/09/08/pipeline-parallelism.html)
* [Up to 5x less communication and 3.4x faster training through 1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-news.html)
* [10x bigger model training on a single GPU with ZeRO-Offload](https://www.deepspeed.ai/news/2020/09/08/ZeRO-Offload.html)