OpenDAS / deepspeed, commit 6bb5c69f (unverified)

Authored Sep 10, 2020 by Shaden Smith; committed by GitHub on Sep 10, 2020

Website edits (#398)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Parent: 7baf3c3a
Showing 4 changed files with 20 additions and 13 deletions (+20, -13):
* README.md (+2, -2)
* docs/_pages/features.md (+15, -8)
* docs/_posts/2020-09-09-ZeRO-Offload.md (+1, -1)
* docs/index.md (+2, -2)
README.md @ 6bb5c69f
docs/_pages/features.md @ 6bb5c69f
@@ -30,17 +30,22 @@ deepspeed --hostfile=<hostfile> \
```
The script `<client_entry.py>` will execute on the resources specified in `<hostfile>`.
+ ## Pipeline Parallelism
+ DeepSpeed provides [pipeline parallelism](/tutorials/pipeline/) for memory-
+ and communication-efficient training. DeepSpeed supports a hybrid
+ combination of data, model, and pipeline parallelism and has scaled to over
+ [one trillion parameters using 3D parallelism]({{ site.press_release_v3 }}).
+ Pipeline parallelism can also improve communication efficiency and has
+ accelerated training by up to 7x on low-bandwidth clusters.
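For context on the pipeline section added above (this sketch is not part of the diff): the pipeline tutorial it links to builds the model as a flat list of layers wrapped in a `PipelineModule`. A minimal sketch, assuming the `deepspeed.pipe.PipelineModule` interface from that tutorial and a placeholder config file `ds_config.json`, could look like this:

```python
# Illustrative sketch only: a toy two-stage pipeline-parallel model.
# Assumes the deepspeed.pipe.PipelineModule API and a ds_config.json on disk;
# run under the `deepspeed` launcher so torch.distributed is initialized.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Express the network as a flat list of layers so DeepSpeed can partition
# it into pipeline stages.
layers = [
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
]
model = PipelineModule(layers=layers,
                       loss_fn=nn.CrossEntropyLoss(),
                       num_stages=2)

# The returned engine drives the pipeline schedule: each train_batch() call
# runs forward, backward, and the optimizer step across all stages.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config="ds_config.json",  # assumed keyword; older releases read --deepspeed_config from args
)
# loss = engine.train_batch(data_iter)  # data_iter yields (inputs, labels) tuples
```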
## Model Parallelism
### Support for Custom Model Parallelism
- DeepSpeed supports all forms of model parallelism including tensor slicing based
- approaches such as the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), or
- pipelined parallelism approaches such as [PipeDream](https://github.com/msr-fiddle/pipedream) and
- [GPipe](https://github.com/kakaobrain/torchgpipe). It does so by only
- requiring the model parallelism framework to provide a *model parallelism
- unit* (`mpu`) that implements a few bookkeeping functionalities:
+ DeepSpeed supports all forms of model parallelism including tensor slicing
+ based approaches such as the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM). It does so by only requiring the model
+ parallelism framework to provide a *model parallelism unit* (`mpu`) that implements a few
+ bookkeeping functionalities:
```python
mpu.get_model_parallel_rank()
...
```
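The remaining `mpu` calls are elided in this hunk. As an illustrative aside (an assumption, not content from this commit), a framework with no real model parallelism could satisfy this bookkeeping interface with a trivial object along these lines, where every rank is its own model-parallel group and data parallelism spans all ranks:

```python
# Hypothetical sketch: a degenerate "mpu" for a framework without model
# parallelism. Only get_model_parallel_rank() appears in the visible part of
# the hunk; the other method names follow the Megatron-style convention and
# are assumptions. Requires torch.distributed to be initialized.
import torch.distributed as dist

class IdentityMPU:
    """Each rank forms its own model-parallel group; data parallelism covers all ranks."""

    def __init__(self):
        # new_group() is a collective call, so every rank constructs every
        # single-rank group and keeps the one it belongs to.
        for rank in range(dist.get_world_size()):
            group = dist.new_group([rank])
            if rank == dist.get_rank():
                self._model_parallel_group = group

    def get_model_parallel_rank(self):
        return 0

    def get_model_parallel_world_size(self):
        return 1

    def get_model_parallel_group(self):
        return self._model_parallel_group

    def get_data_parallel_rank(self):
        return dist.get_rank()

    def get_data_parallel_world_size(self):
        return dist.get_world_size()

    def get_data_parallel_group(self):
        return dist.group.WORLD
```

An object like this is what gets handed to `deepspeed.initialize` through its `mpu` argument, so that DeepSpeed can arrange its data-parallel process groups around the model-parallel ones.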
@@ -57,6 +62,8 @@ DeepSpeed is fully compatible with [Megatron](https://github.com/NVIDIA/Megatron
Please see the [Megatron-LM tutorial](/tutorials/megatron/) for details.

## The Zero Redundancy Optimizer
The Zero Redundancy Optimizer ([ZeRO](https://arxiv.org/abs/1910.02054)) is at
the heart of DeepSpeed and enables large model training at a scale that is
...
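As an aside on the ZeRO section above (not part of the diff): ZeRO is enabled through the DeepSpeed configuration rather than code changes. A minimal sketch of such a config, shown here as a Python dict with placeholder values (it is normally written as a JSON file passed to the launcher), might look like:

```python
# Illustrative sketch only: turning on ZeRO via the DeepSpeed config.
# Field values are placeholders, not recommendations.
ds_config = {
    "train_batch_size": 256,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,  # stage 1 partitions optimizer states across data-parallel ranks
    },
}

# engine, _, _, _ = deepspeed.initialize(model=model,
#                                        model_parameters=model.parameters(),
#                                        config=ds_config)  # assumed keyword; older releases read a JSON path
```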
docs/_posts/2020-09-09-ZeRO-Offload.md @ 6bb5c69f
docs/index.md @ 6bb5c69f

@@ -30,7 +30,7 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)
# What's New?
* [2020/09/10] [DeepSpeed: Extreme-scale model training for everyone]({{ site.press_release_v3 }})
* [Powering 10x longer sequences and 6x faster execution through DeepSpeed Sparse Attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention-news.html)
- * [Training a trillion parameters with pipeline parallelism](https://www.deepspeed.ai/news/2020/09/09/pipeline-parallelism.html)
+ * [Training a trillion parameters with pipeline parallelism](https://www.deepspeed.ai/news/2020/09/08/pipeline-parallelism.html)
* [Up to 5x less communication and 3.4x faster training through 1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-news.html)
* [10x bigger model training on a single GPU with ZeRO-Offload](https://www.deepspeed.ai/news/2020/09/08/ZeRO-Offload.html)