OpenDAS / apex

Commit 17e8a552 (parent ea7c2098)
Authored Aug 27, 2019 by Michael Carilli

Docstring updates
Showing 4 changed files with 30 additions and 22 deletions.
apex/optimizers/fused_adam.py (+9 -7)
apex/optimizers/fused_lamb.py (+7 -5)
apex/optimizers/fused_novograd.py (+6 -4)
apex/optimizers/fused_sgd.py (+8 -6)
apex/optimizers/fused_adam.py
@@ -6,14 +6,15 @@ class FusedAdam(torch.optim.Optimizer):
 """Implements Adam algorithm.
 Currently GPU-only. Requires Apex to be installed via
 ``pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./``.
-This version of fused Adam implements 2 fusions:
-- Fusion of the Adam update's elementwise operations
-- A multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
-:class:`apex.optimizers.FusedAdam` may be used as a drop-in replacement for torch.optim.Adam::
+This version of fused Adam implements 2 fusions.
+* Fusion of the Adam update's elementwise operations
+* A multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
+:class:`apex.optimizers.FusedAdam` may be used as a drop-in replacement for ``torch.optim.Adam``::
 opt = apex.optimizers.FusedAdam(model.parameters(), lr = ....)
 ...
@@ -21,16 +22,17 @@ class FusedAdam(torch.optim.Optimizer):
 :class:`apex.optimizers.FusedAdam` may be used with or without Amp. If you wish to use :class:`FusedAdam` with Amp,
 you may choose any `opt_level`::
 opt = apex.optimizers.FusedAdam(model.parameters(), lr = ....)
 model, opt = amp.initialize(model, opt, opt_level="O0" or "O1 or "O2")
 ...
 opt.step()
-In general, `opt_level="O1"` is recommended.
+In general, ``opt_level="O1"`` is recommended.
 .. warning::
-A previous version of :class:`FusedAdam` allowed a number of additional arguments to `step`. These additional arguments
+A previous version of :class:`FusedAdam` allowed a number of additional arguments to ``step``. These additional arguments
 are now deprecated and unnecessary.
 Adam was been proposed in `Adam: A Method for Stochastic Optimization`_.
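For readers following the docstring, here is a minimal, self-contained sketch of the usage pattern it describes: ``FusedAdam`` as a drop-in replacement for ``torch.optim.Adam``, optionally wrapped by Amp at ``opt_level="O1"``. The toy linear model, batch size, and learning rate below are placeholders for illustration and are not part of this commit; the sketch assumes Apex was built with the ``--cpp_ext``/``--cuda_ext`` extensions on a CUDA-capable machine.

import torch
from apex import amp
from apex.optimizers import FusedAdam

# Placeholder model; any torch.nn.Module works the same way.
model = torch.nn.Linear(128, 10).cuda()

# Drop-in replacement for torch.optim.Adam.
opt = FusedAdam(model.parameters(), lr=1e-3)

# Optional: mixed precision with Amp; "O1" is the generally recommended opt_level.
model, opt = amp.initialize(model, opt, opt_level="O1")

for _ in range(10):
    inp = torch.randn(32, 128, device="cuda")
    loss = model(inp).sum()
    # amp.scale_loss applies loss scaling when Amp is enabled.
    with amp.scale_loss(loss, opt) as scaled_loss:
        scaled_loss.backward()
    opt.step()
    opt.zero_grad()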
apex/optimizers/fused_lamb.py
@@ -8,9 +8,10 @@ class FusedLAMB(torch.optim.Optimizer):
 Currently GPU-only. Requires Apex to be installed via
 ``pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./``.
-This version of fused LAMB implements 2 fusions:
-- Fusion of the LAMB update's elementwise operations
-- A multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
+This version of fused LAMB implements 2 fusions.
+* Fusion of the LAMB update's elementwise operations
+* A multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
 :class:`apex.optimizers.FusedLAMB`'s usage is identical to any ordinary Pytorch optimizer::
@@ -20,12 +21,13 @@ class FusedLAMB(torch.optim.Optimizer):
 :class:`apex.optimizers.FusedLAMB` may be used with or without Amp. If you wish to use :class:`FusedLAMB` with Amp,
 you may choose any `opt_level`::
 opt = apex.optimizers.FusedLAMB(model.parameters(), lr = ....)
 model, opt = amp.initialize(model, opt, opt_level="O0" or "O1 or "O2")
 ...
 opt.step()
-In general, `opt_level="O1"` is recommended.
+In general, ``opt_level="O1"`` is recommended.
 LAMB was proposed in `Large Batch Optimization for Deep Learning: Training BERT in 76 minutes`_.
@@ -50,7 +52,7 @@ class FusedLAMB(torch.optim.Optimizer):
 max_grad_norm (float, optional): value used to clip global grad norm
 (default: 1.0)
 .. _Large Batch Optimization for Deep Learning\: Training BERT in 76 minutes:
 https://arxiv.org/abs/1904.00962
 .. _On the Convergence of Adam and Beyond:
 https://openreview.net/forum?id=ryQu7f-RZ
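The hunk above documents FusedLAMB's ``max_grad_norm`` argument. As a brief, hedged illustration of how that argument is passed at construction (the model and hyperparameter values below are placeholders, not taken from this commit):

import torch
from apex import amp
from apex.optimizers import FusedLAMB

model = torch.nn.Linear(1024, 1024).cuda()   # placeholder model

# max_grad_norm caps the global gradient norm used by the fused LAMB update (default 1.0).
opt = FusedLAMB(model.parameters(), lr=4e-3, max_grad_norm=1.0)

# As with the other fused optimizers, any Amp opt_level may be used.
model, opt = amp.initialize(model, opt, opt_level="O1")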
apex/optimizers/fused_novograd.py
@@ -8,9 +8,10 @@ class FusedNovoGrad(torch.optim.Optimizer):
 Currently GPU-only. Requires Apex to be installed via
 ``pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./``.
-This version of fused NovoGrad implements 2 fusions:
-- Fusion of the NovoGrad update's elementwise operations
-- A multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
+This version of fused NovoGrad implements 2 fusions.
+* Fusion of the NovoGrad update's elementwise operations
+* A multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
 :class:`apex.optimizers.FusedNovoGrad`'s usage is identical to any Pytorch optimizer::
@@ -20,12 +21,13 @@ class FusedNovoGrad(torch.optim.Optimizer):
 :class:`apex.optimizers.FusedNovoGrad` may be used with or without Amp. If you wish to use :class:`FusedNovoGrad` with Amp,
 you may choose any `opt_level`::
 opt = apex.optimizers.FusedNovoGrad(model.parameters(), lr = ....)
 model, opt = amp.initialize(model, opt, opt_level="O0" or "O1 or "O2")
 ...
 opt.step()
-In general, `opt_level="O1"` is recommended.
+In general, ``opt_level="O1"`` is recommended.
 It has been proposed in `Jasper: An End-to-End Convolutional Neural Acoustic Model`_.
 More info: https://nvidia.github.io/OpenSeq2Seq/html/optimizers.html#novograd
apex/optimizers/fused_sgd.py
@@ -6,14 +6,15 @@ from apex.multi_tensor_apply import multi_tensor_applier
 class FusedSGD(Optimizer):
 r"""Implements stochastic gradient descent (optionally with momentum).
 Currently GPU-only. Requires Apex to be installed via
 ``pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./``.
-This version of fused SGD implements 2 fusions:
-- Fusion of the SGD update's elementwise operations
-- A multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
-:class:`apex.optimizers.FusedSGD` may be used as a drop-in replacement for torch.optim.SGD::
+This version of fused SGD implements 2 fusions.
+* Fusion of the SGD update's elementwise operations
+* A multi-tensor apply launch that batches the elementwise updates applied to all the model's parameters into one or a few kernel launches.
+:class:`apex.optimizers.FusedSGD` may be used as a drop-in replacement for ``torch.optim.SGD``::
 opt = apex.optimizers.FusedSGD(model.parameters(), lr = ....)
 ...
@@ -21,12 +22,13 @@ class FusedSGD(Optimizer):
 :class:`apex.optimizers.FusedSGD` may be used with or without Amp. If you wish to use :class:`FusedSGD` with Amp,
 you may choose any `opt_level`::
 opt = apex.optimizers.FusedSGD(model.parameters(), lr = ....)
 model, opt = amp.initialize(model, opt, opt_level="O0" or "O1 or "O2")
 ...
 opt.step()
-In general, `opt_level="O1"` is recommended.
+In general, ``opt_level="O1"`` is recommended.
 Nesterov momentum is based on the formula from
 `On the importance of initialization and momentum in deep learning`__.
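All four docstrings describe the same second fusion: a multi-tensor apply launch that batches elementwise updates across the whole parameter list instead of launching one kernel per parameter. The ``multi_tensor_applier`` helper imported at the top of ``fused_sgd.py`` (see the hunk header above) is the mechanism behind this. The sketch below illustrates the batching idea using the ``amp_C.multi_tensor_scale`` kernel; the kernel choice, tensor sizes, and scale factor are assumptions for illustration only and are not part of this commit.

import torch
import amp_C                                            # fused CUDA kernels built with Apex's --cuda_ext (assumed installed)
from apex.multi_tensor_apply import multi_tensor_applier

# Gradients of several differently shaped parameters, all resident on the GPU.
grads = [torch.randn(n, device="cuda") for n in (1024, 4096, 256)]
scaled = [torch.empty_like(g) for g in grads]

# Flag buffer the kernel sets to nonzero if it encounters inf/nan.
overflow_buf = torch.cuda.IntTensor([0])

# One batched launch scales every tensor in the list, instead of
# a separate elementwise kernel launch per parameter.
multi_tensor_applier(amp_C.multi_tensor_scale, overflow_buf, [grads, scaled], 1.0 / 128.0)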