Unverified Commit c28a71f9 authored by Olatunji Ruwase, committed by GitHub

Minor doc tweaks (#761)

* Fix docstring

* Make screenshots clickable for easier viewing

* Navigation menu in alphabetical order; More clickable screenshots

* Rename 1Cycle doc

* Tweak naming
parent 7cab55c7
@@ -38,7 +38,7 @@ collections:
   - bert-finetuning.md
   - transformer_kernel.md
   - megatron.md
-  - 1Cycle.md
+  - one-cycle.md
   - lrrt.md
   - zero.md
   - flops-profiler.md
...
@@ -58,35 +58,35 @@ lnav:
     url: /getting-started/
   - title: "Getting started on Azure"
     url: /tutorials/azure/
+  - title: "BingBertSQuAD Fine-tuning"
+    url: /tutorials/bert-finetuning/
+  - title: "BERT Pre-training"
+    url: /tutorials/bert-pretraining/
   - title: "CIFAR-10"
     url: /tutorials/cifar-10/
+  - title: "Flops Profiler"
+    url: /tutorials/flops-profiler/
   - title: "GAN"
     url: /tutorials/gan/
-  - title: "BERT Pre-training"
-    url: /tutorials/bert-pretraining/
-  - title: "BingBertSQuAD Fine-tuning"
-    url: /tutorials/bert-finetuning/
-  - title: "DeepSpeed Transformer Kernel"
-    url: /tutorials/transformer_kernel/
-  - title: "Megatron-LM GPT2"
-    url: /tutorials/megatron/
-  - title: "1-Cycle Schedule"
-    url: /tutorials/1Cycle/
   - title: "Learning Rate Range Test"
     url: /tutorials/lrrt/
-  - title: "DeepSpeed Sparse Attention"
-    url: /tutorials/sparse-attention/
-  - title: "ZeRO-Offload"
-    url: /tutorials/zero-offload/
-  - title: "ZeRO Redundancy Optimizer (ZeRO)"
-    url: /tutorials/zero/
-  - title: "DeepSpeed with 1-bit Adam"
+  - title: "Megatron-LM GPT2"
+    url: /tutorials/megatron/
+  - title: "One-Cycle Schedule"
+    url: /tutorials/one-cycle/
+  - title: "One-Bit Adam"
     url: /tutorials/onebit-adam/
   - title: "Pipeline Parallelism"
     url: /tutorials/pipeline/
   - title: "Progressive Layer Dropping"
     url: /tutorials/progressive_layer_dropping/
-  - title: "Flops Profiler"
-    url: /tutorials/flops-profiler/
+  - title: "Sparse Attention"
+    url: /tutorials/sparse-attention/
+  - title: "Transformer Kernel"
+    url: /tutorials/transformer_kernel/
+  - title: "ZeRO-Offload"
+    url: /tutorials/zero-offload/
+  - title: "ZeRO Redundancy Optimizer (ZeRO)"
+    url: /tutorials/zero/
   - title: "Contributing"
     url: /contributing/
@@ -49,19 +49,26 @@ ZeRO-Offload leverages much for ZeRO stage 2 mechanisms, and so the configuratio
 }
 ```
-As seen above, in addition to setting the _stage_ field to **2** (to enable ZeRO stage 2), we also need to set _cpu_offload_ flag to **true** enable ZeRO-Offload optimizations. In addition, we can set other ZeRO stage 2 optimization flags, such as _overlap_comm_ to tune ZeRO-Offload performance. With these changes we can now run the model. We share some screenshots of the training below.
+As seen above, in addition to setting the _stage_ field to **2** (to enable ZeRO stage 2), we also need to set _cpu_offload_ flag to **true** to enable ZeRO-Offload optimizations. In addition, we can set other ZeRO stage 2 optimization flags, such as _overlap_comm_ to tune ZeRO-Offload performance. With these changes we can now run the model. We share some screenshots of the training below.
 Here is a screenshot of the training log:
-![ZERO_OFFLOAD_DP1_10B_LOG](/assets/images/zero_offload_dp1_10B_log.png)
+<a href="/assets/images/zero_offload_dp1_10B_log.png">
+<img src="/assets/images/zero_offload_dp1_10B_log.png">
+</a>
 Here is a screenshot of nvidia-smi showing that only GPU 0 is active during training:
-![ZERO_OFFLOAD_DP1_10B_SMI](/assets/images/zero_offload_dp1_10B_smi.png)
+<a href="/assets/images/zero_offload_dp1_10B_smi.png">
+<img src="/assets/images/zero_offload_dp1_10B_smi.png">
+</a>
 Finally, here is a screenshot of htop showing host CPU and memory activity during optimizer computation:
-![ZERO_OFFLOAD_DP1_10B_SMI](/assets/images/zero_offload_dp1_10B_cpu.png)
+<a href="/assets/images/zero_offload_dp1_10B_cpu.png">
+<img src="/assets/images/zero_offload_dp1_10B_cpu.png">
+</a>
 Congratulations! You have completed the ZeRO-Offload tutorial.
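For context when reading this hunk: the tutorial paragraph above describes a DeepSpeed JSON configuration, but the hunk shows only its closing brace. Below is a minimal sketch of the kind of config being discussed; the _zero_optimization_ fields match the flags named in the paragraph, while _train_batch_size_ and the _fp16_ block are illustrative assumptions, not values from the tutorial.

```json
{
  "train_batch_size": 8,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "cpu_offload": true,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Here `"stage": 2` enables ZeRO stage 2 partitioning, `"cpu_offload": true` moves optimizer state and computation to host memory (the CPU activity visible in the htop screenshot), and `"overlap_comm"` is one of the stage 2 tuning knobs the paragraph mentions.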