Commits · c8f9cceba7a2daafb30c1c87db2764dabfb2ee6e · OpenDAS / apex

16 Aug, 2019 1 commit
- add fused lamb, put lamb kernels into one file · c8f9cceb
  Deyu Fu authored Aug 16, 2019
08 Aug, 2019 1 commit
- initial commit to make fused optimizers compatible with AMP · 690b1f71
  Deyu Fu authored Aug 08, 2019
06 Aug, 2019 1 commit

- Clean up layer norm tests (#418) · 3ef01fae
  ngimel authored Aug 06, 2019

  * Bug fix for non-affine layer-norm + add backward unit test
  * clean up tests and add tests for a large batch
01 Aug, 2019 1 commit
- fix fused layer norm for >65535 batch · 4a8e1a87
  Natalia Gimelshein authored Aug 01, 2019
26 Jul, 2019 1 commit
- Add missing semicolon. (#390) · 3f7f5fba
  Edward Z. Yang authored Jul 12, 2019
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
```
12 Jul, 2019 1 commit
- Add missing semicolon. (#390) · 80e0143e
  Edward Z. Yang authored Jul 12, 2019
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
```
03 Jul, 2019 4 commits
- Pulling in deprecation warning changes · 665b2dd7
  Michael Carilli authored Jul 03, 2019
- Remove deprecated Type.h · 816813f9
  Michael Carilli authored Jul 03, 2019
- Remove deprecated Type.h · 7096b1b7
  Michael Carilli authored Jul 03, 2019
- Changing AT_CHECK to TORCH_CHECK · adee29f6
  Michael Carilli authored Jul 03, 2019
28 Jun, 2019 1 commit
- Add support for fp16 update term (new UPD_T typename in template) · 3aeea0d8
  Thor Johnsen authored Jun 28, 2019
14 Jun, 2019 1 commit
- Separate LDG/STG from compute loop (#359) · 121a2500
  Thor Johnsen authored Jun 13, 2019
11 Jun, 2019 1 commit
- Allow multi_tensor_lamb to update fp16 params · 47e3367f
  Michael Carilli authored Jun 11, 2019
31 May, 2019 2 commits

- Multi tensor lamb optimizer (#334) · 8be5b6be
  Thor Johnsen authored May 31, 2019

  * First draft, for discussion
  * Fix mistakes in LAMB equations
  * Add loop over chunk
  * Bug fix
  * Bug fix
  * Bug fix
  * Undo bug fix
  * Bug fix
  * Add multi tensor LAMB optimizer to setup.py
  * Rename step_size to learning_rate
  * Fix compilation errors
- Give multi-tensor L2 norm the ability to compute norms per-tensor as well as globally (#333) · 93338e62
  mcarilli authored May 31, 2019

  * Existing tests passing, still need to add per-tensor tests
  * Test is passing, still need to measure performance
  * ILP for l2norm functor
27 May, 2019 1 commit
- FusedSGD tests passing for all opt_levels · 848c777d
  Michael Carilli authored May 27, 2019
10 May, 2019 1 commit
- materialize_master_weights for FusedSGD · c763f0fe
  Michael Carilli authored May 09, 2019
03 May, 2019 1 commit
- Converting dispatch macros in fused_adam_cuda_kernel.cu · f3528d99
  Michael Carilli authored May 03, 2019
27 Apr, 2019 1 commit

- Bnp integration pr (#275) · fedfe0d7
  jjsjann123 authored Apr 26, 2019

  * Persistent group batchnorm added

    Added persistent grouped batch norm for performance run on strong scaling case:
    currently only supporting:

      1. nhwc layout
      2. fp16
      3. synchronization only within a node!

    Environment variable is used to tune LAUNCH_MARGIN that limits the CTAs usage
    by the persistent kernel.

    Documentation and examples will follow.
  * updating type().scalarType() to scalar_type()
  * moving launch margin to be defined at layer creation, adding a knob cap max ctas per sm
  * fixing the cta computation
  * review comment:

    set device_id through cudaGetDevice()
    move cudaMemset to cudaMemsetAsync
    updated __threadfence() to __threadfence_system() inter device write
26 Apr, 2019 5 commits

- Removing instances of ScalarType, still need to change macros · d175acb0
  Michael Carilli authored Apr 26, 2019
- whitespace · c978bda5
  Michael Carilli authored Apr 26, 2019
- Replace type().ScalarType() with scalar_type() (#272) · 855808f3
  ptrblck authored Apr 26, 2019

  * change .type().ScalarType() to .scalar_type() + at::ScalarType::X to at::kX
  * revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF
  * revert scalar_type() to type() in AT_DISPATCH_FLOATING_TYPES
  * revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF in welford.cu
  * revert scalar_type() to type() in layer_norm_cuda_kernel.cu
  * revert at::kType to at::ScalarType::Type
  * use DISPATCH_FLOAT_AND_HALF to get rid of warnings
  * add dispatch mechanisms for double+float and double+float+half
- Tested on 1x8x1 · 070c7e96
  Michael Carilli authored Apr 26, 2019
- Fixed bounds checking · 3b32c401
  Michael Carilli authored Apr 26, 2019
25 Apr, 2019 1 commit
- let's see · 75139ca3
  Michael Carilli authored Apr 25, 2019
22 Apr, 2019 1 commit
- Updating TensorList->TensorListMetadata · 16a3bdf3
  Michael Carilli authored Apr 22, 2019
18 Apr, 2019 1 commit
- cleanup · 651150cb
  Michael Carilli authored Apr 18, 2019
12 Apr, 2019 1 commit
- Update Wil's code + typo · 53fd093d
  Michael Carilli authored Apr 12, 2019
10 Apr, 2019 2 commits
- Quick kernel to clean up l2norm · 683b6e0e
  Michael Carilli authored Apr 10, 2019
- Kernel + sizes stress test · 1a48b26b
  Michael Carilli authored Apr 09, 2019
09 Apr, 2019 1 commit
- Simple cut of the kernel in place · e57f5d0e
  Michael Carilli authored Apr 09, 2019
08 Apr, 2019 1 commit
- Fix for #246 · 03100f46
  Michael Carilli authored Apr 08, 2019
04 Apr, 2019 1 commit

- WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f
  mcarilli authored Apr 03, 2019

  * Refactor to allow more flexible treatment of multiple optimizers/models/losses
  * Adding _process_optimizers.py
  * Created L0 tests (now passing).
  * fix: minor print typo (#234)
  * make L1 results easier to read
  * L0 multiple model/optimizer/loss test fleshed out
  * Adding test that master params remain synced across distributed processes
  * Docstring updates
  * Docstring updates
21 Mar, 2019 2 commits
- Use build macro for backward compat · 0f5e3fe0
  Syed Tousif Ahmed authored Mar 07, 2019
- Rename IntList to IntArrayRef · 2a467090
  Syed Tousif Ahmed authored Feb 22, 2019
19 Mar, 2019 2 commits
- Fixing interaction of DDP with dynamic loss scaling · 8437d295
  Michael Carilli authored Mar 19, 2019
- Multi-tensor axpby kernel for more flexible unscaling (groundwork for #163 and #179 fix) · 5e552004
  Michael Carilli authored Mar 18, 2019
15 Mar, 2019 1 commit
- Anticipating upstream #17996 · 2c8e1c86
  Michael Carilli authored Mar 15, 2019
12 Mar, 2019 1 commit
- Forward/backward compatibility around pytorch 3aeb78, to fix #191 · 42180bd9
  Michael Carilli authored Mar 11, 2019
11 Mar, 2019 1 commit
- Fix momentum initialization with weight decay · 724672d7
  Simon Layton authored Mar 11, 2019