Unverified Commit 43a27cd4 authored by Min Xu, committed by GitHub

[docs] clarify per-GPU batch size for AdaScale (#301)

- clarify that per-GPU batch size is not increased with AdaScale.
parent 2d954203
@@ -120,6 +120,8 @@ AdaScale can be used to wrap an SGD optimizer and be used in DDP (Distributed Data Parallel)
training or non-DDP with gradient accumulation. The benefit is to re-use the same LR
schedule from a baseline batch size when the effective batch size is bigger.
Note that AdaScale does _not_ help increase per-GPU batch size.
```python
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR # or your scheduler
@@ -147,11 +149,12 @@ while not done:
```
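To make this pattern concrete, here is a minimal sketch of wrapping SGD with AdaScale in a DDP training loop. It assumes the `fairscale.optim.AdaScale` wrapper and its `gain()` method; `train`, `dataloader`, and `max_epochs` are placeholder names, not part of the library.
```python
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

from fairscale.optim import AdaScale


def train(model, dataloader, max_epochs, base_lr=0.1):
    # Wrap the base SGD optimizer with AdaScale; the LR scheduler attaches to
    # the wrapper exactly as it would to a plain optimizer.
    optimizer = AdaScale(SGD(model.parameters(), lr=base_lr))
    scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 1.0 / (epoch + 1))
    model = DDP(model)  # assumes torch.distributed is already initialized

    step, last_epoch = 0.0, 0
    while last_epoch < max_epochs:
        for inputs, targets in dataloader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(inputs), targets)
            loss.backward()
            # Advance the schedule by AdaScale's gain rather than by 1, so the
            # baseline LR schedule is reused at the larger effective batch size.
            step += optimizer.gain()
            optimizer.step()
            epoch = int(step) // len(dataloader)
            if epoch > last_epoch:
                scheduler.step()
                last_epoch = epoch
```
Note that the per-GPU batch size produced by `dataloader` stays the same; the larger effective batch comes from data parallelism across ranks.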
The primary goal is to allow scaling to bigger batch sizes without losing model accuracy.
(However, training time might be longer compared to training without AdaScale.)
At a high level, we want ML researchers to:
* go parallel more easily (i.e. no need to find new learning rate schedules)
* not worry about losing accuracy
* potentially get higher GPU efficiency (fewer steps, less networking overhead, etc.)
# Testing
......
@@ -47,11 +47,14 @@ class AdaScale(Optimizer):
distributed and large batch size training. Can be used in combination with
``torch.nn.parallel.DistributedDataParallel`` and ``torch.optim.SGD``.
.. _AdaScale: https://proceedings.icml.cc/static/paper_files/icml/2020/4682-Supplemental.pdf
This class subclasses `Optimizer` so that `torch.optim.lr_scheduler` can
work with it. In other words, AdaScale is intended to be a complete wrapper of
a torch Optimizer.
Note that AdaScale does _not_ help increase per-GPU batch size.
There are several ways to integrate AdaScale with your training loop.
We show two examples below.
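As a rough illustration of the non-DDP, gradient-accumulation path mentioned above (an illustrative sketch, not one of the class's own examples): the `num_gradients_to_accumulate` constructor argument is assumed here, and the exact gradient-scaling convention during accumulation is an assumption as well.
```python
import torch.nn.functional as F
from torch.optim import SGD

from fairscale.optim import AdaScale


def train_with_accumulation(model, dataloader, accumulate_steps=4, lr=0.1):
    # Single-process training where the effective batch size is enlarged by
    # accumulating gradients over several micro-batches before each step.
    # `num_gradients_to_accumulate` is assumed to tell AdaScale how many
    # backward passes make up one optimizer step.
    optimizer = AdaScale(
        SGD(model.parameters(), lr=lr),
        num_gradients_to_accumulate=accumulate_steps,
    )

    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(dataloader):
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        if (i + 1) % accumulate_steps == 0:
            # One optimizer step per accumulation window; the per-GPU batch
            # size itself is unchanged, matching the note above.
            optimizer.step()
            optimizer.zero_grad()
```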
......