"examples/vscode:/vscode.git/clone" did not exist on "20e1bb455b0304d6d39125b9d4f528fe17947c27"
Commit b068e701 authored by Shaden Smith, committed by GitHub

Changing README headers and testing discussion (#49)



* Increasing section headers

* Move testing under contributing
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
parent 46686c9a
@@ -12,19 +12,18 @@
DeepSpeed can train DL models with over a hundred billion parameters on current
generation of GPU clusters, while achieving over 5x in system performance
compared to the state of the art.
# Table of Contents
| Section | Description |
| --------------------------------------- | ------------------------------------------- |
| [Why DeepSpeed?](#why-deepspeed) | DeepSpeed overview |
| [Getting Started](#getting-started) | DeepSpeed first steps |
| [Further Reading](#further-reading) | DeepSpeed features, tutorials, etc. |
| [Contributing](#contributing) | Instructions for contributing to DeepSpeed |
# Why DeepSpeed?
Training advanced deep learning models is challenging. Beyond model design,
model scientists also need to set up the state-of-the-art training techniques
such as distributed training, mixed precision, gradient accumulation, and
@@ -34,7 +33,7 @@
a large model easily runs out of memory with pure data parallelism and it is
difficult to use model parallelism. DeepSpeed addresses these challenges to
accelerate model development *and* training.
## Distributed, Effective, and Efficient Training with Ease
The DeepSpeed API is a lightweight wrapper on [PyTorch](https://pytorch.org/). This
means that you can use everything you love in PyTorch without learning a new
platform. In addition, DeepSpeed manages all of the boilerplate state-of-the-art
@@ -44,7 +43,7 @@
importantly, you can leverage the distinctive efficiency and effectiveness benefits of
DeepSpeed to boost speed and scale with just a few lines of code changes to your PyTorch
models.
## Speed
DeepSpeed achieves high performance and fast convergence through a combination of
efficiency optimizations on compute/communication/memory/IO and effectiveness
optimizations on advanced hyperparameter tuning and optimizers. For example:
@@ -72,7 +71,7 @@
## Memory efficiency
DeepSpeed provides memory-efficient data parallelism and enables training models without
model parallelism. For example, DeepSpeed can train models with up to 6 billion parameters on
NVIDIA V100 GPUs with 32GB of device memory. In comparison, existing frameworks (e.g.,
@@ -84,7 +83,7 @@
replicated across data-parallel processes, ZeRO partitions model states to save
significant memory. The current implementation (stage 1 of ZeRO) reduces memory by up to
4x relative to the state of the art. You can read more about ZeRO in our [paper](https://arxiv.org/abs/1910.02054).
## Scalability
DeepSpeed supports efficient data parallelism, model parallelism, and their
combination. ZeRO boosts the scaling capability and efficiency further.
* DeepSpeed provides system support to run models up to 100 billion parameters,
@@ -105,7 +104,7 @@
</p>
## Fast convergence for effectiveness
DeepSpeed supports advanced hyperparameter tuning and large batch size
optimizers such as [LAMB](https://arxiv.org/abs/1904.00962). These improve the
effectiveness of model training and reduce the number of samples required for
@@ -119,11 +118,11 @@
convergence to desired accuracy.
[QANet tutorial](../../Tutorials/QANet/QANetTutorial.md)
-->
## Good Usability
Only a few lines of code changes are needed to enable a PyTorch model to use DeepSpeed and ZeRO. Compared to current model parallelism libraries, DeepSpeed does not require a code redesign or model refactoring. It also does not put limitations on model dimensions (such as number of attention heads, hidden sizes, and others), batch size, or any other training parameters. For models of up to six billion parameters, you can conveniently use ZeRO-powered data parallelism without requiring model parallelism; in contrast, standard data parallelism runs out of memory for models with more than 1.3 billion parameters. In addition, DeepSpeed conveniently supports a flexible combination of ZeRO-powered data parallelism with custom model parallelism, such as the tensor slicing of Nvidia Megatron-LM.
## Features
Below we provide a brief feature list; see our detailed [feature
overview](./docs/features.md) for descriptions and usage.
@@ -155,16 +154,16 @@
* [Performance Analysis and Debugging](./docs/features.md#performance-analysis-and-debugging)
# Getting Started
## Installation
* Please see our [Azure tutorial](docs/azure.md) to get started with DeepSpeed on Azure!
* If you're not on Azure, we recommend using our docker image via `docker pull deepspeed/deepspeed:latest`, which contains a pre-installed version of DeepSpeed and all the necessary dependencies.
* If you want to install DeepSpeed manually, we provide an install script, [install.sh](install.sh), to help install on a local machine or across an entire cluster.
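Concretely, the two non-Azure paths look like the following sketch (the docker command is quoted from the bullet above; invoking `install.sh` with bash is an assumption about its usage):

```bash
# Pull the pre-built image with DeepSpeed and all dependencies installed
docker pull deepspeed/deepspeed:latest

# Or install manually on a local machine or cluster (assumed invocation)
bash install.sh
```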
## Writing DeepSpeed Models
DeepSpeed model training is accomplished using the DeepSpeed engine. The engine
can wrap an arbitrary model of type `torch.nn.Module` and has a minimal set of APIs
for training and checkpointing the model. Please see the tutorials for detailed
@@ -185,7 +184,7 @@
scheduler based on the parameters passed to `deepspeed.initialize` and the
DeepSpeed [configuration file](#deepspeed-configuration).
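The initialization snippet itself is collapsed in this diff. A minimal sketch of the call, assuming `args` carries `args.deepspeed_config` and `model` is an ordinary `torch.nn.Module` (variable names are illustrative):

```python
import deepspeed

# Wraps the model in a DeepSpeed engine; the optimizer and LR scheduler are
# constructed from the parameters in the DeepSpeed configuration file.
model_engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    args=args,                           # must provide args.deepspeed_config
    model=model,                         # the torch.nn.Module to train
    model_parameters=model.parameters()  # parameters to optimize
)
```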
### Training
Once the DeepSpeed engine has been initialized, it can be used to train the
model using three simple APIs for forward propagation (`()`), backward
@@ -222,7 +221,7 @@
pre-defined learning rate schedule:
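The loop itself is collapsed in this diff; a minimal sketch consistent with the three APIs named above (`data_loader` is the caller's own iterator):

```python
for step, batch in enumerate(data_loader):
    loss = model_engine(batch)   # forward propagation via the engine call
    model_engine.backward(loss)  # backward propagation
    model_engine.step()          # weight update (and LR schedule, if configured)
```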
### Model Checkpointing
Saving and loading the training state is handled via the `save_checkpoint` and
`load_checkpoint` APIs in DeepSpeed, which take two arguments to uniquely
identify a checkpoint:
@@ -265,7 +264,7 @@
retrieved from `load_checkpoint` as a return argument. In the example above,
the `step` value is stored as part of the `client_sd`.
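The example referenced above is collapsed in this diff; a sketch of the pattern, assuming caller-chosen `ckpt_dir` and `ckpt_id` values and a `client_sd` dictionary of client state (`client_state` is DeepSpeed's keyword for it):

```python
# Load: returns the checkpoint path and the client state dict stored with it
_, client_sd = model_engine.load_checkpoint(ckpt_dir, ckpt_id)
step = client_sd['step']

# ... training ...

# Save: stash any client state to be restored alongside the model
client_sd['step'] = step
model_engine.save_checkpoint(ckpt_dir, ckpt_id, client_state=client_sd)
```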
## DeepSpeed Configuration
DeepSpeed features can be enabled, disabled, or configured using a config JSON
file that should be specified as `args.deepspeed_config`. A sample config file
is shown below. For a full set of features see [core API
@@ -296,7 +295,7 @@
doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html).
}
```
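The body of the sample config is collapsed in this diff (only its closing brace survives above). A minimal config along these lines, with purely illustrative values:

```json
{
  "train_batch_size": 8,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.00015
    }
  },
  "fp16": {
    "enabled": true
  },
  "zero_optimization": true
}
```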
# Launching DeepSpeed Training
DeepSpeed installs the entry point `deepspeed` to launch distributed training.
We illustrate an example usage of DeepSpeed with the following assumptions:
@@ -306,7 +305,7 @@
4. `ds_config.json` is the configuration file for DeepSpeed
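Under those assumptions, a typical launch would look like the following sketch (`client_entry.py` and `myhostfile` are illustrative names standing in for the entry script and hostfile from the collapsed items above):

```bash
deepspeed --hostfile=myhostfile client_entry.py <client args> \
  --deepspeed --deepspeed_config ds_config.json
```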
## Resource Configuration (multi-node)
DeepSpeed configures multi-node compute resources with hostfiles that are compatible with
[OpenMPI](https://www.open-mpi.org/) and [Horovod](https://github.com/horovod/horovod).
A hostfile is a list of *hostnames* (or SSH aliases), which are machines accessible via passwordless
@@ -356,7 +355,7 @@
```bash
deepspeed --include="worker-2:0,1" \
--deepspeed --deepspeed_config ds_config.json
```
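The hostfile contents are collapsed in this diff; a sketch of one consistent with the `--include` example above, listing each machine with its GPU slot count (hostnames and counts illustrative):

```
worker-1 slots=4
worker-2 slots=4
```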
## Resource Configuration (single-node)
In the case that we are only running on a single node (with one or more GPUs),
DeepSpeed *does not* require a hostfile as described above. If a hostfile is
not detected or passed in, then DeepSpeed will query the number of GPUs on the
@@ -365,7 +364,7 @@
local machine to discover the number of slots available. The `--include` and
`--exclude` arguments work as normal, but the user should specify `localhost`
as the hostname.
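For example, a sketch of restricting a single-node run to two local GPUs, reusing the flag syntax from the multi-node example above (`client_entry.py` again illustrative):

```bash
deepspeed --include="localhost:0,1" client_entry.py <client args> \
  --deepspeed --deepspeed_config ds_config.json
```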
# Further Reading
| Article | Description |
| ---------------------------------------------------------------------------------------------- | -------------------------------------------- |
@@ -377,8 +376,30 @@
# Contributing
DeepSpeed welcomes your contributions!
## Prerequisites
DeepSpeed uses [pre-commit](https://pre-commit.com/) to ensure that formatting is
consistent across DeepSpeed. First, ensure that `pre-commit` is installed, either
by installing DeepSpeed or via `pip install pre-commit`. Next, the pre-commit hooks
must be installed once before commits can be made:
```bash
pre-commit install
```
Afterwards, our suite of formatting tests runs automatically before each `git commit`. You
can also run these manually:
```bash
pre-commit run --all-files
```
If a formatting test fails, it will fix the modified code in place and abort
the `git commit`. After looking over the changes, you can `git add <modified files>`
and then repeat the previous `git commit` command.
## Testing
DeepSpeed tracks two types of tests: unit tests and more costly model convergence tests.
The model convergence tests train
[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) and measure
@@ -407,29 +428,7 @@
pytest run_sanity_check.py
Note that the `--forked` flag is not necessary for the model tests.
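The pytest invocations are otherwise collapsed in this diff apart from the sanity check above. Assuming the unit tests live under `tests/unit/` (an assumption about the repository layout), a unit-test run would look like:

```bash
pytest --forked tests/unit/
```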
## Contributor License Agreement
This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to, and
actually do, grant us the rights to use your contribution. For details, visit
@@ -440,7 +439,7 @@
to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply
follow the instructions provided by the bot. You will only need to do this once across
all repos using our CLA.
## Code of Conduct
This project has adopted the [Microsoft Open Source Code of
Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the
[Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact