Commits · de6378f5dae8fcf2879a4be8ecea8bbcb9e59d53 · OpenDAS / apex

27 Feb, 2020 1 commit
- NHWC support for multi tensor apply (#732) · de6378f5
  mcarilli authored Feb 26, 2020
```
* NHWC support for multi tensor apply

* compilation fix for version<=1.4
```
  de6378f5
04 Oct, 2019 1 commit

move previous fused_adam and fp16_optimizer to contrib (#517) · 1904e48d

Deyu Fu authored Oct 04, 2019

* move previous fused_adam and fp16_optimizer to contrib

* make build contrib.fused_adam optional

* change build option name

* remove unnecessary try import

1904e48d

06 Sep, 2019 1 commit

Fix for #456 (#477) · 325f5a0b

mcarilli authored Sep 05, 2019

* Pushing for build tests

* Contrib files

* Removing deprecated checks

325f5a0b

20 Aug, 2019 1 commit
- add back lamb stage1/2 to amp_C python · b9f0995b
  Deyu Fu authored Aug 20, 2019
  
  b9f0995b
17 Aug, 2019 1 commit
- add back legacy lamb code for backward compatibility now · 2bc766ce
  Deyu Fu authored Aug 16, 2019
  
  2bc766ce
16 Aug, 2019 2 commits

clean up variance options support by all fused optimizers: · 18062b69

Deyu Fu authored Aug 16, 2019

Do not apply bias correction to epsilon (same as recent upstream change)
Do not apply bias correction to weight decay (consistent with upstream AdamW)
Add adam_w_mode to FusedAdam/LAMB to select L2 regularization or decoupled weight decay (Adam vs AdamW)
Document reg_inside_moment in FusedNovoGrad correctly, as distinct from adam_w_mode
Remove legacy eps_mode from FusedAdam
Make the internal math type float across fused optimizers

18062b69
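
The difference between the two modes named in the commit above (L2 regularization vs. decoupled weight decay) can be sketched for a single scalar parameter. This is an illustrative sketch only, not apex's CUDA kernel: the function name `adam_step`, its signature, and its default values are assumptions made for the example.

```python
import math


def adam_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, adam_w_mode=True):
    """One scalar Adam/AdamW update; returns (new_p, new_m, new_v).

    Hypothetical helper for illustration; names and defaults are assumed.
    """
    if not adam_w_mode:
        # L2 mode (classic Adam): the decay term is folded into the
        # gradient, so it flows through both moment estimates.
        g = g + weight_decay * p

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g

    # Bias correction is applied to the moments only, not to eps.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    update = m_hat / (math.sqrt(v_hat) + eps)

    if adam_w_mode:
        # Decoupled weight decay (AdamW): added outside the moments,
        # with no bias correction applied to it.
        update = update + weight_decay * p

    return p - lr * update, m, v
```

With `weight_decay=0.0` the two modes coincide; with a nonzero decay, the L2 mode rescales the decay term through the adaptive denominator while the AdamW mode applies it directly to the parameter.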

add fused lamb, put lamb kernels into one file · c8f9cceb

Deyu Fu authored Aug 16, 2019

c8f9cceb

08 Aug, 2019 1 commit
- initial commit to make fused optimizers compatible with AMP · 690b1f71
  Deyu Fu authored Aug 08, 2019
  
  690b1f71
06 Aug, 2019 1 commit

Clean up layer norm tests (#418) · 3ef01fae

ngimel authored Aug 06, 2019

* Bug fix for non-affine layer-norm + add backward unit test

* clean up tests and add tests for a large batch

3ef01fae

01 Aug, 2019 1 commit
- fix fused layer norm for >65535 batch · 4a8e1a87
  Natalia Gimelshein authored Aug 01, 2019
  
  4a8e1a87
26 Jul, 2019 1 commit
- Add missing semicolon. (#390) · 3f7f5fba
  Edward Z. Yang authored Jul 12, 2019
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
```
  3f7f5fba
12 Jul, 2019 1 commit
- Add missing semicolon. (#390) · 80e0143e
  Edward Z. Yang authored Jul 12, 2019
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
```
  80e0143e
03 Jul, 2019 4 commits
- Pulling in deprecation warning changes · 665b2dd7
  Michael Carilli authored Jul 03, 2019
  
  665b2dd7
- Remove deprecated Type.h · 816813f9
  Michael Carilli authored Jul 03, 2019
  
  816813f9
- Remove deprecated Type.h · 7096b1b7
  Michael Carilli authored Jul 03, 2019
  
  7096b1b7
- Changing AT_CHECK to TORCH_CHECK · adee29f6
  Michael Carilli authored Jul 03, 2019
  
  adee29f6
28 Jun, 2019 1 commit
- Add support for fp16 update term (new UPD_T typename in template) · 3aeea0d8
  Thor Johnsen authored Jun 28, 2019
  
  3aeea0d8
14 Jun, 2019 1 commit
- Separate LDG/STG from compute loop (#359) · 121a2500
  Thor Johnsen authored Jun 13, 2019
  
  121a2500
11 Jun, 2019 1 commit
- Allow multi_tensor_lamb to update fp16 params · 47e3367f
  Michael Carilli authored Jun 11, 2019
  
  47e3367f
31 May, 2019 2 commits

Multi tensor lamb optimizer (#334) · 8be5b6be

Thor Johnsen authored May 31, 2019

* First draft, for discussion

* Fix mistakes in LAMB equations

* Add loop over chunk

* Bug fix

* Bug fix

* Bug fix

* Undo bug fix

* Bug fix

* Add multi tensor LAMB optimizer to setup.py

* Rename step_size to learning_rate

* Fix compilation errors

8be5b6be

Give multi-tensor L2 norm the ability to compute norms per-tensor as well as globally (#333) · 93338e62

mcarilli authored May 31, 2019

* Existing tests passing, still need to add per-tensor tests

* Test is passing, still need to measure performance

* ILP for l2norm functor

93338e62

27 May, 2019 1 commit
- FusedSGD tests passing for all opt_levels · 848c777d
  Michael Carilli authored May 27, 2019
  
  848c777d
10 May, 2019 1 commit
- materialize_master_weights for FusedSGD · c763f0fe
  Michael Carilli authored May 09, 2019
  
  c763f0fe
03 May, 2019 1 commit
- Converting dispatch macros in fused_adam_cuda_kernel.cu · f3528d99
  Michael Carilli authored May 03, 2019
  
  f3528d99
27 Apr, 2019 1 commit

Bnp integration pr (#275) · fedfe0d7

jjsjann123 authored Apr 26, 2019

* Persistent group batchnorm added

Added persistent grouped batch norm for performance runs in the strong-scaling case.
Currently supports only:

  1. nhwc layout
  2. fp16
  3. synchronization only within a node!

An environment variable tunes LAUNCH_MARGIN, which limits CTA usage
by the persistent kernel.

Documentation and examples will follow.

* updating type().scalarType() to scalar_type()

* moving launch margin to be defined at layer creation, adding a knob to cap max CTAs per SM

* fixing the cta computation

* review comment:

set device_id through cudaGetDevice()
change cudaMemset to cudaMemsetAsync
change __threadfence() to __threadfence_system() for inter-device writes

fedfe0d7

26 Apr, 2019 5 commits

Removing instances of ScalarType, still need to change macros · d175acb0

Michael Carilli authored Apr 26, 2019

d175acb0

whitespace · c978bda5

Michael Carilli authored Apr 26, 2019

c978bda5

Replace type().ScalarType() with scalar_type() (#272) · 855808f3

ptrblck authored Apr 26, 2019

* change .type().ScalarType() to .scalar_type() + at::ScalarType::X to at::kX

* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF

* revert scalar_type() to type() in AT_DISPATCH_FLOATING_TYPES

* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF in welford.cu

* revert scalar_type() to type() in layer_norm_cuda_kernel.cu

* revert at::kType to at::ScalarType::Type

* use DISPATCH_FLOAT_AND_HALF to get rid of warnings

* add dispatch mechanisms for double+float and double+float+half

855808f3

Tested on 1x8x1 · 070c7e96

Michael Carilli authored Apr 26, 2019

070c7e96

Fixed bounds checking · 3b32c401

Michael Carilli authored Apr 26, 2019

3b32c401

25 Apr, 2019 1 commit
- let's see · 75139ca3
  Michael Carilli authored Apr 25, 2019
  
  75139ca3
22 Apr, 2019 1 commit
- Updating TensorList->TensorListMetadata · 16a3bdf3
  Michael Carilli authored Apr 22, 2019
  
  16a3bdf3
18 Apr, 2019 1 commit
- cleanup · 651150cb
  Michael Carilli authored Apr 18, 2019
  
  651150cb
12 Apr, 2019 1 commit
- Update Wil's code + typo · 53fd093d
  Michael Carilli authored Apr 12, 2019
  
  53fd093d
10 Apr, 2019 2 commits
- Quick kernel to clean up l2norm · 683b6e0e
  Michael Carilli authored Apr 10, 2019
  
  683b6e0e
- Kernel + sizes stress test · 1a48b26b
  Michael Carilli authored Apr 09, 2019
  
  1a48b26b
09 Apr, 2019 1 commit
- Simple cut of the kernel in place · e57f5d0e
  Michael Carilli authored Apr 09, 2019
  
  e57f5d0e
08 Apr, 2019 1 commit
- Fix for #246 · 03100f46
  Michael Carilli authored Apr 08, 2019
  
  03100f46
04 Apr, 2019 1 commit

WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f

mcarilli authored Apr 03, 2019

* Refactor to allow more flexible treatment of multiple optimizers/models/losses

* Adding _process_optimizers.py

* Created L0 tests (now passing).

* fix: minor print typo (#234)

* make L1 results easier to read

* L0 multiple model/optimizer/loss test fleshed out

* Adding test that master params remain synced across distributed processes

* Docstring updates

* Docstring updates

3f87614f

21 Mar, 2019 1 commit
- Use build macro for backward compat · 0f5e3fe0
  Syed Tousif Ahmed authored Mar 07, 2019
  
  0f5e3fe0