- 16 Apr, 2019 2 commits
  - Michael Carilli authored
  - Michael Carilli authored
- 15 Apr, 2019 3 commits
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
- 12 Apr, 2019 1 commit
  - Michael Carilli authored
- 11 Apr, 2019 7 commits
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - henrymai authored
    The main use of these functions (e.g., `torch.{conv*, prelu}`) is via their `torch.nn` wrapper layers. The `torch.nn` layers own the weights and call into these lower-level functions, passing the weights as arguments in their `forward()` methods. The `torch.conv*` functions are already in the `FP16_CASTS` list because amp's philosophy is to cast the arguments rather than the model/layer weights. Conceptually `torch.prelu` is the same as the `torch.conv*` case: its weight parameter is passed in from its wrapper layer `torch.nn.PReLU`.
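
As an illustration of the pattern described above (not apex's actual implementation): the `nn` wrapper owns the weight, while the low-level `torch` function receives it as an ordinary argument, which is why casting the function's arguments also covers the weight. The `cast_args_to_half` helper below is hypothetical.

```python
import torch
import torch.nn as nn

class TinyPReLU(nn.Module):
    """Toy stand-in for torch.nn.PReLU: the module owns the weight."""
    def __init__(self, num_parameters=1, init=0.25):
        super().__init__()
        self.weight = nn.Parameter(torch.full((num_parameters,), init))

    def forward(self, x):
        # The weight enters the low-level op as a plain argument, exactly like
        # nn.PReLU.forward calling torch.prelu(input, self.weight).
        return torch.prelu(x, self.weight)

def cast_args_to_half(fn):
    """Hypothetical sketch of amp's approach: cast the call's tensor arguments
    (input *and* weight) to fp16 instead of converting the module's parameters."""
    def wrapper(*args):
        casted = tuple(a.half() if torch.is_tensor(a) and a.is_floating_point() else a
                       for a in args)
        return fn(*casted)
    return wrapper

if __name__ == "__main__":
    layer, x = TinyPReLU(), torch.randn(4, 8)
    print(layer(x).dtype)  # torch.float32; with torch.prelu patched via
                           # cast_args_to_half, both x and the weight would be cast
```
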
  - Michael Carilli authored
  - Michael Carilli authored
- 10 Apr, 2019 5 commits
  - ngimel authored
    quick fix: make FusedLayerNorm compatible with cpu
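
A rough sketch of what CPU compatibility presumably looks like here: fall back to the stock functional layer norm when the input is not on a CUDA device. The class below is illustrative only and does not reflect apex's actual FusedLayerNorm internals.

```python
import torch
import torch.nn.functional as F

class LayerNormWithCPUFallback(torch.nn.Module):
    def __init__(self, normalized_shape, eps=1e-5):
        super().__init__()
        if isinstance(normalized_shape, int):
            normalized_shape = (normalized_shape,)
        self.normalized_shape = tuple(normalized_shape)
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(self.normalized_shape))
        self.bias = torch.nn.Parameter(torch.zeros(self.normalized_shape))

    def forward(self, x):
        if not x.is_cuda:
            # CPU path: plain PyTorch layer norm, numerically equivalent.
            return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        # GPU path: the real FusedLayerNorm would call its fused CUDA kernel here;
        # the functional op stands in for it in this sketch.
        return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)

print(LayerNormWithCPUFallback(16)(torch.randn(2, 16)).shape)  # torch.Size([2, 16])
```
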
  - Lam Dang authored
  - Lam Dang authored
  - Michael Carilli authored
  - Michael Carilli authored
- 09 Apr, 2019 1 commit
  - Michael Carilli authored
- 08 Apr, 2019 1 commit
  - Michael Carilli authored
- 05 Apr, 2019 3 commits
  - Michael Carilli authored
  - Michael Carilli authored
- 04 Apr, 2019 3 commits
  - ngimel authored
    Run interpolation in fp32 because it's faster
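
A hypothetical wrapper showing the up-cast pattern this commit presumably applies: run `F.interpolate` in fp32 and cast the result back to the caller's dtype.

```python
import torch
import torch.nn.functional as F

def interpolate_fp32(x, **kwargs):
    # Run the resize in fp32, then return the original dtype. Under mixed
    # precision the input may arrive as fp16; on an fp32 input both casts are no-ops.
    orig_dtype = x.dtype
    return F.interpolate(x.float(), **kwargs).to(orig_dtype)

x = torch.randn(1, 3, 8, 8)
print(interpolate_fp32(x, scale_factor=2, mode="nearest").shape)  # torch.Size([1, 3, 16, 16])
```
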
  - Marek Kolodziej authored
  - mcarilli authored
    * Refactor to allow more flexible treatment of multiple optimizers/models/losses
    * Adding _process_optimizers.py
    * Created L0 tests (now passing).
    * fix: minor print typo (#234)
    * make L1 results easier to read
    * L0 multiple model/optimizer/loss test fleshed out
    * Adding test that master params remain synced across distributed processes
    * Docstring updates
    * Docstring updates
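
A usage sketch of the list-based API this refactor enables (needs apex installed and a CUDA device; `num_losses` and `loss_id` are used here as they appear in amp's documentation, so treat the exact call as an assumption):

```python
import torch
from apex import amp

model_a = torch.nn.Linear(16, 16).cuda()
model_b = torch.nn.Linear(16, 16).cuda()
opt_a = torch.optim.SGD(model_a.parameters(), lr=1e-3)
opt_b = torch.optim.SGD(model_b.parameters(), lr=1e-3)

# Lists of models and optimizers go in, lists come back out.
[model_a, model_b], [opt_a, opt_b] = amp.initialize(
    [model_a, model_b], [opt_a, opt_b], opt_level="O1", num_losses=2)

x = torch.randn(4, 16).cuda()
with amp.scale_loss(model_a(x).sum(), opt_a, loss_id=0) as scaled_loss:
    scaled_loss.backward()
with amp.scale_loss(model_b(x).sum(), opt_b, loss_id=1) as scaled_loss:
    scaled_loss.backward()
opt_a.step()
opt_b.step()
```
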
- 03 Apr, 2019 1 commit
  - mcarilli authored
- 01 Apr, 2019 1 commit
  - jjsjann123 authored
    Fix a typo in optimized_sync_batchnorm_kernel.py
- 31 Mar, 2019 1 commit
  - Bingchen Zhao authored
    In line 54, running_var should be running_variance.
- 27 Mar, 2019 2 commits
- 26 Mar, 2019 2 commits
  - Michael Carilli authored
  - Michael Carilli authored
- 23 Mar, 2019 1 commit
  - Cubbee authored
- 22 Mar, 2019 4 commits
  - jjsjann123 authored
    Support 2-dimensional input, resolving issue #194. Implementation: for 2d input, switch the channel_last flag to true for a better memory access pattern in the kernel.
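
A usage sketch of the (N, C) case from issue #194; it assumes apex is installed and a torch.distributed process group has already been initialized (e.g. via the usual launcher), since SyncBatchNorm reduces statistics across ranks.

```python
import torch
from apex.parallel import SyncBatchNorm

# Per the commit message, a 2-dimensional (N, C) input is handled internally by
# flipping the channel_last flag so the kernel sees a friendlier memory layout.
bn = SyncBatchNorm(64).cuda()
x = torch.randn(32, 64).cuda()   # (batch, features), no spatial dimensions
print(bn(x).shape)               # torch.Size([32, 64])
```
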
  - henrymai authored
    * Add prelu to list of torch overrides. This is to fix the following error:
        File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
          result = self.forward(*input, **kwargs)
        File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
          input = module(input)
        File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
          result = self.forward(*input, **kwargs)
        File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 722, in forward
          return F.prelu(input, self.weight)
        File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1040, in prelu
          return torch.prelu(input, weight)
      RuntimeError: expected scalar type Half but found Float
    * Update torch_overrides.py
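
The fix itself is presumably a one-line addition to amp's half-precision override list; a sketch of what `torch_overrides.py` roughly contains (the list name and surrounding entries are assumptions based on the description above, not the verbatim file):

```python
# apex/amp/lists/torch_overrides.py (sketch, not the verbatim file)
# torch-namespace functions listed here get their tensor arguments cast to fp16
# before the call, which resolves "expected scalar type Half but found Float"
# when nn.PReLU passes its fp32 weight into torch.prelu.
FP16_FUNCS = [
    'conv1d',
    'conv2d',
    'conv3d',
    'conv_transpose1d',
    'conv_transpose2d',
    'conv_transpose3d',
    'prelu',   # added by this change
]
```
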
  - enricoschroeder authored
    Fix "local variable 'optimizers_was_list' referenced before assignment" when amp.initialize() is called with optimizers=None (#218)
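
The bug has the classic shape of a flag that is only assigned on some branches; a standalone sketch (illustrative, not apex's actual code) of the failure and the straightforward fix:

```python
def initialize_buggy(optimizers=None):
    if isinstance(optimizers, list):
        optimizers_was_list = True
    elif optimizers is not None:
        optimizers_was_list = False
        optimizers = [optimizers]
    # With optimizers=None neither branch runs, so the flag is never assigned ...
    if optimizers_was_list:        # UnboundLocalError raised here
        return optimizers
    return optimizers[0] if optimizers else None

def initialize_fixed(optimizers=None):
    optimizers_was_list = isinstance(optimizers, list)  # assigned on every path
    if optimizers is None:
        optimizers = []
    elif not optimizers_was_list:
        optimizers = [optimizers]
    if optimizers_was_list:
        return optimizers
    return optimizers[0] if optimizers else None

print(initialize_fixed())   # None -- no crash when optimizers is omitted
# initialize_buggy()        # raises UnboundLocalError for 'optimizers_was_list'
```
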
  - mcarilli authored
    * Adding Torch + bare-metal nvcc version check and container build tests
    * Putting a canary in the coalmine
    * canary proved elusive
    * Trying direct setup.py install
    * this should work
    * Removing canary
    * hopefully this works
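
A rough, illustrative sketch of the kind of check being added: compare the CUDA version PyTorch was built against with the bare-metal nvcc found on PATH and refuse to build on a mismatch (function name, parsing, and message are assumptions, not apex's exact setup.py code):

```python
import subprocess
import torch

def check_cuda_torch_vs_bare_metal():
    # nvcc prints a line like: "Cuda compilation tools, release 10.1, V10.1.105"
    out = subprocess.check_output(["nvcc", "--version"]).decode()
    bare_metal = out.split("release ")[1].split(",")[0]
    torch_built = torch.version.cuda  # e.g. "10.1", or None for CPU-only builds
    if torch_built is None or bare_metal != torch_built:
        raise RuntimeError(
            "CUDA version mismatch: nvcc reports {}, but PyTorch was compiled "
            "against {}. Rebuild with matching toolkits.".format(bare_metal, torch_built))

if __name__ == "__main__":
    check_cuda_torch_vs_bare_metal()
```
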
- 21 Mar, 2019 2 commits
  - mcarilli authored
    Rename IntList to IntArrayRef
  - Syed Tousif Ahmed authored