FairScale is a PyTorch extension library for high performance and large scale training on one or multiple machines/nodes. This library extends basic PyTorch capabilities while adding new SOTA scaling techniques.

FairScale makes available the latest distributed training techniques in the form of composable modules and easy to use APIs. These APIs are a fundamental part of a researcher's toolbox as they attempt to scale models with limited resources.

FairScale supports:
* Parallelism:
  * Pipeline parallelism (`fairscale.nn.pipe`)
* Sharded grad scaler - automatic mixed precision (`fairscale.optim.grad_scaler`)
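As a quick illustration of the mixed-precision piece, `ShardedGradScaler` is meant to be used in place of `torch.cuda.amp.GradScaler` in an otherwise standard AMP training step. The snippet below is a minimal sketch rather than an official example: it assumes `torch.distributed` has already been initialized, a CUDA device is available, and the model, optimizer, and data names are illustrative placeholders.

```python
import torch
from fairscale.optim.grad_scaler import ShardedGradScaler

# Assumes torch.distributed.init_process_group(...) has already been called
# and this rank owns a CUDA device; all names below are illustrative.
model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = ShardedGradScaler()  # used in place of torch.cuda.amp.GradScaler

data = torch.randn(20, 10, device="cuda")
target = torch.randn(20, 10, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
    loss = torch.nn.functional.mse_loss(model(data), target)

scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
scaler.step(optimizer)         # unscale gradients, then update parameters
scaler.update()                # adjust the scale factor for the next step
```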
FairScale was designed with the following values in mind:

* **Usability** - Users should be able to understand and use FairScale APIs with minimum cognitive overload.
* **Modularity** - Users should be able to combine multiple FairScale APIs as part of their training loop seamlessly.
* **Performance** - FairScale APIs provide the best performance in terms of scaling and efficiency.

## Requirements

* PyTorch >= 1.5.1

## Installation

Normal installation:

```bash
pip install fairscale
```

Development mode:

```bash
cd fairscale
pip install -r requirements.txt
pip install -e .
```

If either of the above fails, add `--no-build-isolation` to the `pip install` command (this could be a problem with recent versions of pip).
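For example, for the development-mode install this would be:

```bash
pip install --no-build-isolation -r requirements.txt
pip install --no-build-isolation -e .
```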
Detailed [installation instructions](https://github.com/facebookresearch/fairscale/blob/master/docs/source/installation_instructions.rst) are also available; you should be able to install a pip package or build directly from source.
## Getting Started

The full [documentation](https://fairscale.readthedocs.io/) contains instructions for getting started, deep dives and tutorials about the various FairScale APIs.
## Examples

Here are a few sample snippets from a subset of FairScale offerings:

### Pipe

Run a 4-layer model on 2 GPUs. The first two layers run on cuda:0 and the next two layers run on cuda:1.
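A minimal sketch of that setup is shown below; it is illustrative rather than an official example, and assumes at least two visible CUDA devices (the layer sizes and input data are placeholders).

```python
import torch
import torch.nn as nn

import fairscale

# A 4-layer model; the layer sizes here are illustrative.
model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
)

# balance=[2, 2] puts the first two layers on cuda:0 and the last two on cuda:1;
# chunks controls how many micro-batches each input batch is split into.
model = fairscale.nn.Pipe(model, balance=[2, 2], chunks=4)

# Inputs are expected on the device of the first partition (cuda:0); the output
# comes back on the device of the last partition (cuda:1).
data = torch.randn(20, 10).cuda(0)
output = model(data)
loss = output.sum()
loss.backward()
```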
## Testing

We use circleci to test on PyTorch versions 1.6.0, 1.7.1, and 1.8.1. Please create an [issue](https://github.com/facebookresearch/fairscale/issues) if you are having trouble with installation.
## Contributors

We welcome outside contributions! Please see the [CONTRIBUTING](CONTRIBUTING.md) instructions for how you can contribute to FairScale.
## License

FairScale is licensed under the [BSD-3-Clause License](LICENSE).

fairscale.nn.pipe is forked from [torchgpipe](https://github.com/kakaobrain/torchgpipe), Copyright 2019, Kakao Brain, licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0).
fairscale.nn.misc.flatten_params_wrapper is forked from [PyTorch-Reparam-Module](https://github.com/SsnL/PyTorch-Reparam-Module), Copyright 2018, Tongzhou Wang, licensed under [MIT License](https://github.com/SsnL/PyTorch-Reparam-Module/blob/master/LICENSE).
## References

Here is a list of all authors on relevant research papers this work is based on:

* torchgpipe: Chiheon Kim, Heungsub Lee, Myungryong Jeong, Woonhyuk Baek, Boogeon Yoon, Ildoo Kim, Sungbin Lim, Sungwoong Kim. [[Paper](https://arxiv.org/pdf/2004.09910.pdf)] [[Code](https://github.com/kakaobrain/torchgpipe)]
* Megatron-LM: Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. [[Paper](https://arxiv.org/pdf/1909.08053.pdf)] [[Code](https://github.com/NVIDIA/Megatron-LM)]
* AdaScale SGD: Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin. [[Paper](https://proceedings.icml.cc/static/paper_files/icml/2020/4682-Paper.pdf)]
* AMPNet: Alexander L. Gaunt, Matthew A. Johnson, Maik Riechert, Daniel Tarlow, Ryota Tomioka, Dimitrios Vytiniotis, Sam Webster. [[Paper](https://arxiv.org/abs/1705.09786)]
* L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm, 2020. [[Paper](https://arxiv.org/abs/2002.05645)]

## Citing FairScale

If you use FairScale in your publication, please cite it by using the following BibTeX entry.

```BibTeX
@Misc{fairscale2021,
  author =       {Mandeep Baines and Shruti Bhosale and Vittorio Caggiano and Naman Goyal and Siddharth Goyal and Myle Ott and Benjamin Lefaudeux and Vitaliy Liptchinsky and Mike Rabbat and Sam Shleifer and Anjali Sridhar and Min Xu},
  title =        {FairScale: A general purpose modular PyTorch library for high performance and large scale training},
  howpublished = {\url{https://github.com/facebookresearch/fairscale}},
  year =         {2021}
}
```