"examples/vscode:/vscode.git/clone" did not exist on "20e1bb455b0304d6d39125b9d4f528fe17947c27"
Commit b068e701 authored by Shaden Smith, committed by GitHub

Changing README headers and testing discussion (#49)



* Increasing section headers

* Move testing under contributing
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
parent 46686c9a
@@ -12,19 +12,18 @@
DeepSpeed can train DL models with over a hundred billion parameters on current
generation of GPU clusters, while achieving over 5x in system performance
compared to the state of the art.
# Table of Contents
| Section | Description |
| --------------------------------------- | ------------------------------------------- |
| [Why DeepSpeed?](#why-deepspeed) | DeepSpeed overview |
| [Getting Started](#getting-started) | DeepSpeed first steps |
| [Further Reading](#further-reading) | DeepSpeed features, tutorials, etc. |
| [Contributing](#contributing) | Instructions for contributing to DeepSpeed |
# Why DeepSpeed?
Training advanced deep learning models is challenging. Beyond model design,
model scientists also need to set up the state-of-the-art training techniques
such as distributed training, mixed precision, gradient accumulation, and
@@ -34,7 +33,7 @@
a large model easily runs out of memory with pure data parallelism and it is
difficult to use model parallelism. DeepSpeed addresses these challenges to
accelerate model development *and* training.
## Distributed, Effective, and Efficient Training with Ease
The DeepSpeed API is a lightweight wrapper on [PyTorch](https://pytorch.org/). This
means that you can use everything you love in PyTorch without learning a new
platform. In addition, DeepSpeed manages all of the boilerplate state-of-the-art
@@ -44,7 +43,7 @@
importantly, you can leverage the distinctive efficiency and effectiveness benefits of
DeepSpeed to boost speed and scale with just a few lines of code changes to your PyTorch
models.
## Speed
DeepSpeed achieves high performance and fast convergence through a combination of
efficiency optimizations on compute/communication/memory/IO and effectiveness
optimizations on advanced hyperparameter tuning and optimizers. For example:
@@ -72,7 +71,7 @@
## Memory efficiency
DeepSpeed provides memory-efficient data parallelism and enables training models without
model parallelism. For example, DeepSpeed can train models with up to 6 billion parameters on
NVIDIA V100 GPUs with 32GB of device memory. In comparison, existing frameworks (e.g.,
@@ -84,7 +83,7 @@
replicated across data-parallel processes, ZeRO partitions model states to save
significant memory. The current implementation (stage 1 of ZeRO) reduces memory by up to
4x relative to the state of the art. You can read more about ZeRO in our [paper](https://arxiv.org/abs/1910.02054).
## Scalability
DeepSpeed supports efficient data parallelism, model parallelism, and their
combination. ZeRO boosts the scaling capability and efficiency further.
* DeepSpeed provides system support to run models up to 100 billion parameters,
@@ -105,7 +104,7 @@
</p>
## Fast convergence for effectiveness
DeepSpeed supports advanced hyperparameter tuning and large batch size
optimizers such as [LAMB](https://arxiv.org/abs/1904.00962). These improve the
effectiveness of model training and reduce the number of samples required for
@@ -119,11 +118,11 @@
convergence to desired accuracy.
[QANet tutorial](../../Tutorials/QANet/QANetTutorial.md)
-->
## Good Usability
Only a few lines of code changes are needed to enable a PyTorch model to use DeepSpeed and ZeRO. Compared to current model parallelism libraries, DeepSpeed does not require a code redesign or model refactoring. It also does not put limitations on model dimensions (such as number of attention heads, hidden sizes, and others), batch size, or any other training parameters. For models of up to six billion parameters, you can conveniently use ZeRO-powered data parallelism without requiring model parallelism; in contrast, standard data parallelism runs out of memory for models with more than 1.3 billion parameters. In addition, DeepSpeed conveniently supports a flexible combination of ZeRO-powered data parallelism with custom model parallelism, such as the tensor slicing of Nvidia Megatron-LM.
## Features
Below we provide a brief feature list; see our detailed [feature
overview](./docs/features.md) for descriptions and usage.
@@ -155,16 +154,16 @@
* [Performance Analysis and Debugging](./docs/features.md#performance-analysis-and-debugging)
# Getting Started
## Installation
* Please see our [Azure tutorial](docs/azure.md) to get started with DeepSpeed on Azure!
* If you're not on Azure, we recommend using our docker image via `docker pull deepspeed/deepspeed:latest`, which contains a pre-installed version of DeepSpeed and all the necessary dependencies.
* If you want to install DeepSpeed manually, we provide an install script, [install.sh](install.sh), to help install on a local machine or across an entire cluster.
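Concretely, the two non-Azure paths look like the following sketch (the docker command is quoted from the bullet above; invoking `install.sh` with bash is an assumption about its usage):

```bash
# Pull the pre-built image with DeepSpeed and all dependencies installed
docker pull deepspeed/deepspeed:latest

# Or install manually on a local machine or cluster (assumed invocation)
bash install.sh
```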
## Writing DeepSpeed Models
DeepSpeed model training is accomplished using the DeepSpeed engine. The engine
can wrap an arbitrary model of type `torch.nn.Module` and has a minimal set of APIs
for training and checkpointing the model. Please see the tutorials for detailed
@@ -185,7 +184,7 @@
scheduler based on the parameters passed to `deepspeed.initialize` and the
DeepSpeed [configuration file](#deepspeed-configuration).
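The initialization snippet itself is collapsed in this diff. A minimal sketch of the call, assuming `args` carries `args.deepspeed_config` and `model` is an ordinary `torch.nn.Module` (variable names are illustrative):

```python
import deepspeed

# Wraps the model in a DeepSpeed engine; the optimizer and LR scheduler are
# constructed from the parameters in the DeepSpeed configuration file.
model_engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    args=args,                           # must provide args.deepspeed_config
    model=model,                         # the torch.nn.Module to train
    model_parameters=model.parameters()  # parameters to optimize
)
```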
### Training
Once the DeepSpeed engine has been initialized, it can be used to train the
model using three simple APIs for forward propagation (`()`), backward
@@ -222,7 +221,7 @@
pre-defined learning rate schedule:
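The loop itself is collapsed in this diff; a minimal sketch consistent with the three APIs named above (`data_loader` is the caller's own iterator):

```python
for step, batch in enumerate(data_loader):
    loss = model_engine(batch)   # forward propagation via the engine call
    model_engine.backward(loss)  # backward propagation
    model_engine.step()          # weight update (and LR schedule, if configured)
```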
### Model Checkpointing
Saving and loading the training state is handled via the `save_checkpoint` and
`load_checkpoint` APIs in DeepSpeed, which take two arguments to uniquely
identify a checkpoint:
@@ -265,7 +264,7 @@
retrieved from `load_checkpoint` as a return argument. In the example above,
the `step` value is stored as part of the `client_sd`.
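The example referenced above is collapsed in this diff; a sketch of the pattern, assuming caller-chosen `ckpt_dir` and `ckpt_id` values and a `client_sd` dictionary of client state (`client_state` is DeepSpeed's keyword for it):

```python
# Load: returns the checkpoint path and the client state dict stored with it
_, client_sd = model_engine.load_checkpoint(ckpt_dir, ckpt_id)
step = client_sd['step']

# ... training ...

# Save: stash any client state to be restored alongside the model
client_sd['step'] = step
model_engine.save_checkpoint(ckpt_dir, ckpt_id, client_state=client_sd)
```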
## DeepSpeed Configuration
DeepSpeed features can be enabled, disabled, or configured using a config JSON
file that should be specified as `args.deepspeed_config`. A sample config file
is shown below. For a full set of features see [core API
@@ -296,7 +295,7 @@
doc](https://microsoft.github.io/DeepSpeed/docs/htmlfiles/api/full/index.html).
}
```
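The body of the sample config is collapsed in this diff (only its closing brace survives above). A minimal config along these lines, with purely illustrative values:

```json
{
  "train_batch_size": 8,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.00015
    }
  },
  "fp16": {
    "enabled": true
  },
  "zero_optimization": true
}
```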
# Launching DeepSpeed Training
DeepSpeed installs the entry point `deepspeed` to launch distributed training.
We illustrate an example usage of DeepSpeed with the following assumptions:
@@ -306,7 +305,7 @@
4. `ds_config.json` is the configuration file for DeepSpeed
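Under those assumptions, a typical launch would look like the following sketch (`client_entry.py` and `myhostfile` are illustrative names standing in for the entry script and hostfile from the collapsed items above):

```bash
deepspeed --hostfile=myhostfile client_entry.py <client args> \
  --deepspeed --deepspeed_config ds_config.json
```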
## Resource Configuration (multi-node)
DeepSpeed configures multi-node compute resources with hostfiles that are compatible with
[OpenMPI](https://www.open-mpi.org/) and [Horovod](https://github.com/horovod/horovod).
A hostfile is a list of *hostnames* (or SSH aliases), which are machines accessible via passwordless
@@ -356,7 +355,7 @@
```bash
deepspeed --include="worker-2:0,1" \
--deepspeed --deepspeed_config ds_config.json
```
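The hostfile contents are collapsed in this diff; a sketch of one consistent with the `--include` example above, listing each machine with its GPU slot count (hostnames and counts illustrative):

```
worker-1 slots=4
worker-2 slots=4
```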
## Resource Configuration (single-node)
In the case that we are only running on a single node (with one or more GPUs),
DeepSpeed *does not* require a hostfile as described above. If a hostfile is
not detected or passed in, then DeepSpeed will query the number of GPUs on the
@@ -365,7 +364,7 @@
local machine to discover the number of slots available. The `--include` and
`--exclude` arguments work as normal, but the user should specify `localhost`
as the hostname.
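For example, a sketch of restricting a single-node run to two local GPUs, reusing the flag syntax from the multi-node example above (`client_entry.py` again illustrative):

```bash
deepspeed --include="localhost:0,1" client_entry.py <client args> \
  --deepspeed --deepspeed_config ds_config.json
```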
# Further Reading
| Article | Description |
| ---------------------------------------------------------------------------------------------- | -------------------------------------------- |
@@ -377,8 +376,30 @@
# Contributing
DeepSpeed welcomes your contributions!
## Prerequisites
DeepSpeed uses [pre-commit](https://pre-commit.com/) to ensure that formatting is
consistent across DeepSpeed. First, ensure that `pre-commit` is installed, either
by installing DeepSpeed or via `pip install pre-commit`. Next, the pre-commit hooks
must be installed once before commits can be made:
```bash
pre-commit install
```
Afterwards, our suite of formatting tests runs automatically before each `git commit`. You
can also run these manually:
```bash
pre-commit run --all-files
```
If a formatting test fails, it will fix the modified code in place and abort
the `git commit`. After looking over the changes, you can `git add <modified files>`
and then repeat the previous `git commit` command.
## Testing
DeepSpeed tracks two types of tests: unit tests and more costly model convergence tests.
The model convergence tests train
[DeepSpeedExamples](https://github.com/microsoft/DeepSpeedExamples/) and measure
@@ -407,29 +428,7 @@
pytest run_sanity_check.py
Note that the `--forked` flag is not necessary for the model tests.
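The pytest invocations are otherwise collapsed in this diff apart from the sanity check above. Assuming the unit tests live under `tests/unit/` (an assumption about the repository layout), a unit-test run would look like:

```bash
pytest --forked tests/unit/
```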
## Contributor License Agreement
This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to, and
actually do, grant us the rights to use your contribution. For details, visit
@@ -440,7 +439,7 @@
to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply
follow the instructions provided by the bot. You will only need to do this once across
all repos using our CLA.
## Code of Conduct
This project has adopted the [Microsoft Open Source Code of
Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the
[Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact