- 05 Aug, 2022 1 commit
Hubert Lu authored
FusedRMSNorm/"T5LayerNorm" based on FusedLayerNorm (#1274)
* FusedRMSNorm based on FusedLayerNorm
* refactor duplicated kernels
* delete comments
* cleanup
* cleanup, fixed clobbering forward_affine_mixed_dtypes
* fix pybind naming and add MixedFused test
* undo skipping
* check elementwise_affine
* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py
* fix and generate docs for FusedRMSNorm (#1285)
* [FusedRMSNorm doc] document where epsilon is added (#1295)
* [FusedRMSNorm doc] add epsilon to formula
* correct
* better wording
* Fix some bugs
* Optimize HostRMSNormGradient and HostApplyRMSNorm for AMD GPUs
* Fix NaN issues in FusedRMSNorm
* Update test_fused_layer_norm.py
* Skip test_fused_layer_norm.TestAutocastFusedRMSNorm on ROCm
* Use at::cuda::warp_size() instead of at::cuda::getCurrentDeviceProperties()->warpSize

Co-authored-by: eqy <eddiey@nvidia.com>
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
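Several of the commits above concern where epsilon sits in the RMSNorm formula (#1274, #1295). For reference, a minimal sketch of the computation in plain Python, with epsilon added inside the square root as the doc fix describes (function name and default eps are illustrative, not apex's API):

```python
import math

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm ("T5LayerNorm"): scale x by the reciprocal root-mean-square
    # of its elements, then apply the learned per-element weight.
    # eps is added inside the square root, per #1295.
    ms = sum(v * v for v in x) / len(x)        # mean of squares
    inv_rms = 1.0 / math.sqrt(ms + eps)        # reciprocal RMS
    return [w * v * inv_rms for w, v in zip(weight, x)]
```

Unlike LayerNorm, there is no mean subtraction and no bias, which is what makes the fused kernel a near-copy of FusedLayerNorm with the centering terms removed.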

- 07 Jul, 2022 1 commit
Masaki Kozuki authored
* remove pyprof
* remove reparameterization
* remove pyprof test
* clean up

- 15 Apr, 2022 1 commit
eqy authored

- 07 Feb, 2022 1 commit
eqy authored

- 10 Oct, 2019 1 commit
Michael Carilli authored

- 27 Aug, 2019 2 commits
ptrblck authored
* add state_dict, load_state_dict
* add test_restoring, test_loss_scale_decrease
* disable amp outputs for checkpoint tests
* add test for amp.state_dict, cleanup
* add state_dict patch, add test
* fixed testing, cleanup
* add readme for checkpointing
* add docs to source/amp
* add review changes to doc
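This commit adds amp.state_dict / amp.load_state_dict so the loss-scaler state survives checkpointing. A toy sketch of the save/restore pattern in plain Python — the LossScaler class here is a hypothetical stand-in for illustration, not apex's implementation:

```python
class LossScaler:
    """Illustrative stand-in for a dynamic loss scaler (not apex's class)."""
    def __init__(self, scale=2.0 ** 16):
        self.scale = scale

    def state_dict(self):
        # Everything needed to resume scaling exactly where training left off.
        return {"scale": self.scale}

    def load_state_dict(self, state):
        self.scale = state["scale"]

# The checkpointing pattern the commit enables: persist the amp state
# alongside the model/optimizer state dicts, and restore all of them together.
scaler = LossScaler()
scaler.scale = 1024.0                       # pretend training adjusted it
checkpoint = {"amp": scaler.state_dict()}   # plus model/optimizer states

restored = LossScaler()
restored.load_state_dict(checkpoint["amp"])
```

Without restoring the scaler state, a resumed run would re-warm the loss scale from its default, which can skip or clip early iterations after the restart.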
Michael Carilli authored

- 27 Jun, 2019 1 commit
Michael Carilli authored

- 24 Jun, 2019 2 commits
Michael Carilli authored
Michael Carilli authored

- 04 Apr, 2019 1 commit
mcarilli authored
* Refactor to allow more flexible treatment of multiple optimizers/models/losses
* Adding _process_optimizers.py
* Created L0 tests (now passing).
* fix: minor print typo (#234)
* make L1 results easier to read
* L0 multiple model/optimizer/loss test fleshed out
* Adding test that master params remain synced across distributed processes
* Docstring updates

- 20 Mar, 2019 1 commit
Michael Carilli authored

- 13 Mar, 2019 1 commit
Michael Carilli authored

- 12 Mar, 2019 2 commits
Michael Carilli authored
Michael Carilli authored

- 11 Mar, 2019 1 commit
Michael Carilli authored

- 07 Mar, 2019 4 commits
Michael Carilli authored
Michael Carilli authored
Michael Carilli authored
Michael Carilli authored

- 06 Mar, 2019 1 commit
Michael Carilli authored

- 05 Mar, 2019 1 commit
Michael Carilli authored

- 28 Feb, 2019 1 commit
Michael Carilli authored

- 06 Feb, 2019 1 commit
Michael Carilli authored

- 01 Feb, 2019 1 commit
Michael Carilli authored

- 12 Dec, 2018 1 commit
Michael Carilli authored

- 28 Nov, 2018 2 commits
Michael Carilli authored
Michael Carilli authored

- 30 Oct, 2018 1 commit
Michael Carilli authored

- 23 Oct, 2018 1 commit
jjsjann123 authored
* [syncBN] added syncBN in native pure python apex; added fused cuda kernels used for sync BN, using Welford for mean/var; optional installation using 'python setup.py install --cuda_ext'; added unit test with side-by-side comparison between apex sync BN and PyTorch BN. Note that the PyTorch BN output will be slightly off because of numerical issues in its mean/var computation.
* [syncBN PR] added fp16 support; addressed review comments on: 1. updating last pow 2; 2. checking for import error when importing the syncBN kernel
* [syncBN PR] added convert function to insert SyncBatchNorm; refactored some kernel code
* fixed type issues (fp16/fp32/fp64); added Kahan summation; edited unit test to use pytorch primitive ops with double, passing reasonable tests now
* updated tensor creation calls
* fixed the all_reduce contiguous tensor
* transposed all_reduce results
* [syncBN] support fp16 input & fp32 layer for apex fp16; partially fixed launch configs; enabled imagenet example to run with --sync_bn
* [syncBN PR] documentation added
* adjusted README
* added some doc to imagenet example
* [syncBN] warp-level reduction bug fix: updated warp reduction logic; check for dummy element to avoid NaN; improved launch config for better reduction kernels. A further improvement would be to increase grid size.
* [syncBN] fixed undefined behavior in __shfl_down_sync from divergent threads in warp reduction; changed at::native::empty to at::empty (upstream comments)
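The first message says the sync-BN kernels use Welford's algorithm for the mean/variance reduction. A small reference implementation of the sequential update the kernels parallelize, in plain Python (function name is illustrative):

```python
def welford(xs):
    # Welford's online algorithm: a numerically stable one-pass
    # running mean and variance, unlike the naive sum/sum-of-squares
    # approach that loses precision in fp16/fp32.
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # uses both old and new mean
    var = m2 / n if n else 0.0     # population (biased) variance, as BN uses
    return mean, var
```

The per-GPU partial statistics produced this way can then be combined across processes, which is why the commit pairs Welford with the all_reduce fixes mentioned below it.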

- 28 Aug, 2018 1 commit
Michael Carilli authored

- 20 Jun, 2018 1 commit
Michael Carilli authored

- 16 Jun, 2018 2 commits
Michael Carilli authored
Michael Carilli authored

- 15 Jun, 2018 2 commits
Michael Carilli authored
Michael Carilli authored

- 14 Jun, 2018 1 commit
Michael Carilli authored

- 08 May, 2018 1 commit
Christian Sarofeen authored

- 25 Apr, 2018 2 commits
Michael Carilli authored
Christian Sarofeen authored