Unverified Commit 6bb5c69f authored by Shaden Smith, committed by GitHub

Website edits (#398)


Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
parent 7baf3c3a
@@ -37,7 +37,7 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)
  * [Up to 5x less communication and 3.4x faster training through 1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-news.html)
  * [10x bigger model training on a single GPU with ZeRO-Offload](https://www.deepspeed.ai/news/2020/09/08/ZeRO-Offload.html)
* [2020/08/07] [DeepSpeed Microsoft Research Webinar](https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-On-Demand.html) is now available on-demand
* [2020/07/24] [DeepSpeed Microsoft Research Webinar](https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-On-Demand.html) on August 6th, 2020
[![DeepSpeed webinar](docs/assets/images/webinar-aug2020.png)](https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-Live.html)
* [2020/05/19] [ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/)
* [2020/05/19] [An Order-of-Magnitude Larger and Faster Training with ZeRO-2](https://www.deepspeed.ai/news/2020/05/18/zero-stage2.html)
@@ -88,7 +88,7 @@ overview](https://www.deepspeed.ai/features/) for descriptions and usage.
* Support 10B model training on a single GPU
* [Ultra-fast dense transformer kernels](https://www.deepspeed.ai/news/2020/05/18/bert-record.html)
* [Sparse attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention.html)
  * Memory- and compute-efficient sparse kernels
  * Support 10x longer sequences than dense
  * Flexible support for different sparse structures
* [1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-blog-post.html)
...
@@ -30,17 +30,22 @@ deepspeed --hostfile=<hostfile> \
```
The script `<client_entry.py>` will execute on the resources specified in `<hostfile>`, which lists the available machines and their GPU counts (one host per line, e.g. `worker-1 slots=4`).
## Pipeline Parallelism
DeepSpeed provides [pipeline parallelism](/tutorials/pipeline/) for memory-
and communication-efficient training. DeepSpeed supports a hybrid
combination of data, model, and pipeline parallelism and has scaled to over
[one trillion parameters using 3D parallelism]({{ site.press_release_v3 }}).
Pipeline parallelism can also improve communication efficiency and has
accelerated training by up to 7x on low-bandwidth clusters.
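In the spirit of the [pipeline tutorial](/tutorials/pipeline/), the following is a minimal sketch of expressing a model as pipeline stages and driving training with the pipeline engine. The DeepSpeed config, the `args` namespace produced by the launcher, and the `train_iter` data iterator are assumed to exist, and exact argument names may vary between releases.

```python
# Minimal sketch of pipeline-parallel training. Assumes a DeepSpeed config
# (batch size, optimizer, etc.), an `args` namespace from the deepspeed
# launcher, and a `train_iter` iterator yielding (inputs, labels) batches.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Express the model as a flat list of layers so DeepSpeed can split it.
layers = [
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 10),
]

# Partition the layer list into two pipeline stages.
model = PipelineModule(layers=layers,
                       num_stages=2,
                       loss_fn=nn.CrossEntropyLoss())

engine, _, _, _ = deepspeed.initialize(args=args,
                                       model=model,
                                       model_parameters=model.parameters())

# Each call runs forward, backward, and the optimizer step for one effective
# batch, scheduling micro-batches across the pipeline stages.
loss = engine.train_batch(data_iter=train_iter)
```

Because the engine owns the pipeline schedule, a single `train_batch()` call takes the place of the usual forward/backward/step loop.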
## Model Parallelism
### Support for Custom Model Parallelism
DeepSpeed supports all forms of model parallelism, including tensor-slicing
based approaches such as [Megatron-LM](https://github.com/NVIDIA/Megatron-LM).
It does so by only requiring the model parallelism framework to provide a
*model parallelism unit* (`mpu`) that implements a few bookkeeping
functionalities:
```python
mpu.get_model_parallel_rank()
@@ -57,6 +62,8 @@ DeepSpeed is fully compatible with [Megatron](https://github.com/NVIDIA/Megatron
Please see the [Megatron-LM tutorial](/tutorials/megatron/) for details.
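To make the `mpu` contract above concrete, here is a hypothetical, minimal implementation built directly on `torch.distributed`. Only `get_model_parallel_rank()` appears in the snippet above, so the remaining method names and the group layout are illustrative assumptions rather than a prescribed interface; frameworks such as Megatron-LM build these groups during their own initialization.

```python
# Hypothetical sketch of a model parallelism unit (mpu). Assumes
# torch.distributed is already initialized and the world size is a multiple
# of model_parallel_size. Layout: consecutive ranks share one model replica;
# ranks holding the same model slice form a data-parallel group.
import torch.distributed as dist

class SimpleMPU:
    def __init__(self, model_parallel_size):
        self.mp_size = model_parallel_size
        world_size = dist.get_world_size()
        rank = dist.get_rank()

        # Model-parallel groups: ranks that cooperatively hold one replica.
        for start in range(0, world_size, model_parallel_size):
            ranks = list(range(start, start + model_parallel_size))
            group = dist.new_group(ranks)  # every rank must create every group
            if rank in ranks:
                self._mp_group = group

        # Data-parallel groups: ranks that hold the same model slice.
        for offset in range(model_parallel_size):
            ranks = list(range(offset, world_size, model_parallel_size))
            group = dist.new_group(ranks)
            if rank in ranks:
                self._dp_group = group

    def get_model_parallel_rank(self):
        return dist.get_rank() % self.mp_size

    def get_model_parallel_world_size(self):
        return self.mp_size

    def get_model_parallel_group(self):
        return self._mp_group

    def get_data_parallel_rank(self):
        return dist.get_rank() // self.mp_size

    def get_data_parallel_world_size(self):
        return dist.get_world_size() // self.mp_size

    def get_data_parallel_group(self):
        return self._dp_group
```

An object like this would then be handed to `deepspeed.initialize` via its `mpu` argument so that DeepSpeed can restrict its data-parallel communication to the appropriate process groups.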
## The Zero Redundancy Optimizer
The Zero Redundancy Optimizer ([ZeRO](https://arxiv.org/abs/1910.02054)) is at
the heart of DeepSpeed and enables large model training at a scale that is
...
@@ -7,7 +7,7 @@ new_post: true
date: 2020-09-09 00:00:00
---
We introduce a new technology called ZeRO-Offload to enable **10X bigger model training on a single GPU**. ZeRO-Offload extends ZeRO-2 to leverage both CPU and GPU memory for training large models. Using a machine with **a single GPU**, our users can now run **models of up to 13 billion parameters** without running out of memory, 10x bigger than the existing approaches, while obtaining competitive throughput. This feature democratizes multi-billion-parameter model training and opens the window for many deep learning practitioners to explore bigger and better models.
* For more information on ZeRO-Offload, see our [press release]( {{ site.press_release_v3 }} ).
* For more information on how to use ZeRO-Offload, see our [ZeRO-Offload tutorial](https://www.deepspeed.ai/tutorials/zero-offload/).
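As a rough sketch of what enabling this looks like in practice, the snippet below writes a DeepSpeed config that turns on ZeRO-Offload. The `cpu_offload` flag under `zero_optimization` follows the ZeRO-Offload tutorial linked above at the time of writing; treat the exact field names and values as assumptions and defer to the tutorial for the authoritative schema.

```python
# Illustrative sketch: generate a ds_config.json that enables ZeRO-Offload.
# Field names are assumptions based on the ZeRO-Offload tutorial; consult the
# tutorial for the authoritative schema and tuning advice.
import json

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,           # ZeRO-Offload builds on ZeRO stage 2
        "cpu_offload": True,  # keep optimizer state and updates in CPU memory
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# Then launch as usual, e.g.:
#   deepspeed train.py --deepspeed --deepspeed_config ds_config.json
```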
...
@@ -30,7 +30,7 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)
# What's New?
* [2020/09/10] [DeepSpeed: Extreme-scale model training for everyone]({{ site.press_release_v3 }})
  * [Powering 10x longer sequences and 6x faster execution through DeepSpeed Sparse Attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention-news.html)
  * [Training a trillion parameters with pipeline parallelism](https://www.deepspeed.ai/news/2020/09/08/pipeline-parallelism.html)
  * [Up to 5x less communication and 3.4x faster training through 1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-news.html)
  * [10x bigger model training on a single GPU with ZeRO-Offload](https://www.deepspeed.ai/news/2020/09/08/ZeRO-Offload.html)
@@ -168,7 +168,7 @@ Below we provide a brief feature list, see our detailed [feature overview](https
* Support 10B model training on a single GPU
* [Ultra-fast dense transformer kernels](https://www.deepspeed.ai/news/2020/05/18/bert-record.html)
* [Sparse attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention.html)
  * Memory- and compute-efficient sparse kernels
  * Support 10x longer sequences than dense
  * Flexible support for different sparse structures
* [1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-blog-post.html)
...