Commits · c142714ba4a3485036aab2e0ef9d87aa67827d46 · OpenDAS / apex

23 May, 2019 3 commits
- Merging in latest master changes · c142714b
  Michael Carilli authored May 23, 2019
  
  c142714b
- Merge branch 'integrate_label_smoothing' into multi_tensor_sgd · b620f96b
  Michael Carilli authored May 23, 2019
  
  b620f96b
- Changing error message · e6eec3ba
  Michael Carilli authored May 23, 2019
  
  e6eec3ba
22 May, 2019 3 commits
- Hard error on Pytorch Cuda + Cuda toolkit version mismatch (#323) · 50689f6a
  mcarilli authored May 22, 2019
  
  50689f6a
- Fixing second line for 321. · ccffa71c
  Michael Carilli authored May 22, 2019
  
  ccffa71c
- use value in assert statement (#321) · 9bd61cc1
  ptrblck authored May 22, 2019
  
  9bd61cc1
21 May, 2019 1 commit

Enable LARC for use with amp (#306) · c490bd36

blisc authored May 20, 2019



* update larc
Signed-off-by: Jason <jasoli@nvidia.com>

* scale_loss fix
Signed-off-by: Jason <jasoli@nvidia.com>

* typo
Signed-off-by: Jason <jasoli@nvidia.com>

* revert LARC

c490bd36

17 May, 2019 2 commits

[syncbn update] (#287) · a5289067

jjsjann123 authored May 17, 2019

update input size check to fix github issue #262

update SyncBatchNorm count check so that size 1 input with cross GPU
synchronization runs fine.

a5289067

[SyncBatchNorm update] (#285) · ffbb52ba

jjsjann123 authored May 17, 2019

resolves issue #254

Added input casting for pure python implementation, this supports mismatched
input and layer dtype.

ffbb52ba

16 May, 2019 1 commit
- Support add_param_group (#310) · 4d325d2f
  mcarilli authored May 15, 2019
```
* Support add_param_group

* syntax

* Test added and passing
```
  4d325d2f
15 May, 2019 3 commits
- use verbose parameter to control print of grad overflow (#300) · cfb628ba
  Michael Glass authored May 15, 2019
  
  cfb628ba
- raise exception if cudnn is disabled (#305) · a3169768
  ptrblck authored May 15, 2019
  
  a3169768
- fix URLs in docs of apex.parallel (#309) · df099a4b
  ptrblck authored May 15, 2019
```
* fix URLs

* Update distributed.py
```
  df099a4b
13 May, 2019 3 commits
- Adding docker build test for 1.1 container · f2b3a62c
  Michael Carilli authored May 13, 2019
  
  f2b3a62c
- Progress towards materialize_master_grads=False · 8ae63102
  Michael Carilli authored May 13, 2019
  
  8ae63102
- Fix for #302 · 54b8a852
  mcarilli authored May 13, 2019
  
  54b8a852
10 May, 2019 1 commit
- materialize_master_weights for FusedSGD · c763f0fe
  Michael Carilli authored May 09, 2019
  
  c763f0fe
09 May, 2019 2 commits

Fix link to distributed samples (#298) · 4ff153cd
Tim Zaman authored May 09, 2019

4ff153cd

Add softmax cross entropy loss with label smoothing support. (#295) · 0c74571f

Wil Kong authored May 10, 2019

* Add softmax cross entropy loss with label smoothing support.

* Fix deprecation of AT_DISPATCH_XXX and several minor issues.

* Fix issues commented by reviewers.

* Add FB license.

* Remove code generation constraints.

* Add a simple unittest for label smoothing.

0c74571f

08 May, 2019 1 commit
- Checkpoint fix · 3e2883dd
  Michael Carilli authored May 07, 2019
  
  3e2883dd
03 May, 2019 1 commit
- Converting dispatch macros in fused_adam_cuda_kernel.cu · f3528d99
  Michael Carilli authored May 03, 2019
  
  f3528d99
02 May, 2019 2 commits
- Adding test_fused_sgd.py · d0505433
  Michael Carilli authored May 02, 2019
  
  d0505433
- test_fused_sgd.py passing · 00dbe4b4
  Michael Carilli authored May 02, 2019
  
  00dbe4b4
01 May, 2019 3 commits
- allreduce_different_streams is now hidden · 72bce160
  Michael Carilli authored May 01, 2019
  
  72bce160
- Fixing syntax · 3b4a0a23
  Michael Carilli authored May 01, 2019
  
  3b4a0a23
- Option to preinitialize allreduce communicators · 71957836
  Michael Carilli authored May 01, 2019
  
  71957836
30 Apr, 2019 5 commits
- Clarifying docker launch · 39e153a3
  Michael Carilli authored Apr 30, 2019
  
  39e153a3
- resolving delete conflicts · 86bd6c79
  Michael Carilli authored Apr 30, 2019
  
  86bd6c79
- Remove deprecated examples and update Docker guidance · d2ac4872
  Michael Carilli authored Apr 30, 2019
  
  d2ac4872
- Remove unused tensor in fast_collate (#281) · ca2baffb
  ptrblck authored Apr 30, 2019
```
* remove unused tens tensor in example/imagenet/main_amp.py

* remove unused tens tensor in deprecated examples and tests/L1
```
  ca2baffb
- Casting logic should reflatten RNN parameters · 03a25ba8
  mcarilli authored Apr 29, 2019
  
  03a25ba8
29 Apr, 2019 2 commits
- Adding warning for amp.scale_loss · 1b8303d8
  Michael Carilli authored Apr 29, 2019
  
  1b8303d8
- Warning for inception_v3 · 7b245dba
  Michael Carilli authored Apr 29, 2019
  
  7b245dba
27 Apr, 2019 2 commits

Bnp integration pr (#275) · fedfe0d7

jjsjann123 authored Apr 26, 2019

* Persistent group batchnorm added

Added persistent grouped batch norm for performance run on strong scaling case:
currently only supporting:

  1. nhwc layout
  2. fp16
  3. synchronization only within a node!

Environment variable is used to tune LAUNCH_MARGIN that limits the CTAs usage
by the persistent kernel.

Documentation and examples will follow.

* updating type().scalarType() to scalar_type()

* moving launch margin to be defined at layer creation, adding a knob cap max ctas per sm

* fixing the cta computation

* review comment:

set device_id through cudaGetDevice()
move cudaMemset to cudaMemsetAsync
updated __threadfence() to __threadfence_system() inter device write

fedfe0d7

syntax · e7beba17
Michael Carilli authored Apr 27, 2019

e7beba17

26 Apr, 2019 5 commits

Removing instances of ScalarType, still need to change macros · d175acb0
Michael Carilli authored Apr 26, 2019

d175acb0
Merging in master · d900e93c
Michael Carilli authored Apr 26, 2019

d900e93c
whitespace · c978bda5
Michael Carilli authored Apr 26, 2019

c978bda5
Explicit control over number of allreduce streams for DDP · 73d4212d
Michael Carilli authored Apr 26, 2019

73d4212d

Replace type().ScalarType() with scalar_type() (#272) · 855808f3

ptrblck authored Apr 26, 2019

* change .type().ScalarType() to .scalar_type() + at::ScalarType::X to at::kX

* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF

* revert scalar_type() to type() in AT_DISPATCH_FLOATING_TYPES

* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF in welford.cu

* revert scalar_type() to type() in layer_norm_cuda_kernel.cu

* revert at::kType  to at::ScalarType::Type

* use DISPATCH_FLOAT_AND_HALF to get rid of warnings

* add dispatch mechanisms for double+float and double+float+half

855808f3