- 12 Jun, 2023 1 commit
flyingdown authored
2. Add the environment variable APEX_ROCBLAS_GEMM_ALLOW_HALF to control whether fp16r is used.
3. Add DCU version information; rename the whl package; update the installation steps in the README.
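A minimal usage sketch for the environment-variable switch named above, assuming it is read from the process environment (when exactly apex consults it, and its default value, are not stated in the commit message):

```python
import os

# Hypothetical usage: opt in to the half-precision (fp16r) rocBLAS GEMM path
# before apex is imported; set "0" to keep the full-precision path.
os.environ["APEX_ROCBLAS_GEMM_ALLOW_HALF"] = "1"

import apex  # apex itself decides how and when to honour the variable
```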
- 07 Jul, 2022 1 commit
Masaki Kozuki authored
* remove pyprof
* remove reparameterization
* remove pyprof test
* clean up
- 18 Mar, 2022 1 commit
eqy authored
* update ngc link and dockerhub container tag
* update
* update
* update
* Update README.md

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
- 08 Mar, 2022 2 commits
Masaki Kozuki authored
This reverts commit 74e04667.
Masaki Kozuki authored
- 09 Dec, 2021 1 commit
Hubert Lu authored
- 29 Oct, 2021 1 commit
Peng authored
- 10 Feb, 2021 1 commit
Shoufa Chen authored
* copy-paste friendly
* fix import container_abcs issue: nightly PyTorch has removed `container_abcs` from `torch._six`.
  https://github.com/pytorch/pytorch/commit/58eb23378f2a376565a66ac32c93a316c45b6131#diff-b3c160475f0fbe8ad50310f92d3534172ba98203387a962b7dc8f4a23b15cf4dL35
* keep existing for pytorch1.7 and earlier
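The compatibility fix described above is commonly written as an import fallback; a minimal sketch under that assumption (apex may instead gate on the torch version string, and `is_iterable` is just an illustrative helper):

```python
try:
    # PyTorch 1.7 and earlier still expose container_abcs here
    from torch._six import container_abcs
except ImportError:
    # newer PyTorch: fall back to the standard-library module it aliased
    import collections.abc as container_abcs

def is_iterable(x):
    # Illustrative helper: the usual reason for this import is to test
    # whether an argument is a single value or a collection.
    return isinstance(x, container_abcs.Iterable)
```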
- 16 Dec, 2020 1 commit
lcskrishna authored
- 15 Dec, 2020 1 commit
lcskrishna authored
- 04 Dec, 2020 1 commit
Stas Bekman authored
- 21 Aug, 2020 1 commit
Chaitanya Sri Krishna Lolla authored
- 01 Jun, 2020 1 commit
mcarilli authored
Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
- 29 May, 2020 2 commits
Chaitanya Sri Krishna Lolla authored
lcskrishna authored
- 10 Oct, 2019 1 commit
mcarilli authored
- 08 Oct, 2019 1 commit
Jan Schlüter authored
- 27 Aug, 2019 1 commit
ptrblck authored
* add state_dict, load_state_dict
* add test_restoring, test_loss_scale_decrease
* disable amp outputs for checkpoint tests
* add test for amp.state_dict, cleanup
* add state_dict patch, add test
* fixed testing, cleanup
* add readme for checkpointing
* add docs to source/amp
* add review changes to doc
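A short checkpointing sketch built around the `amp.state_dict()` / `amp.load_state_dict()` pair this change adds; the model, optimizer, opt_level, and file name are illustrative assumptions:

```python
import torch
from apex import amp

model = torch.nn.Linear(4, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# ... train for a while ...

# Save amp's state (e.g. the current loss scale) alongside the usual pieces.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "amp": amp.state_dict(),
}
torch.save(checkpoint, "checkpoint.pt")

# To restore, re-run amp.initialize with the same opt_level, then load.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
amp.load_state_dict(checkpoint["amp"])
```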
- 13 Aug, 2019 1 commit
Marek Kolodziej authored
Co-authored-by: Aditya Agrawal <aditya.iitb@gmail.com>
Co-authored-by: Marek Kolodziej <mkolod@gmail.com>
- 24 Jun, 2019 1 commit
mcarilli authored
- 09 May, 2019 1 commit
Tim Zaman authored
- 30 Apr, 2019 1 commit
Michael Carilli authored
- 18 Apr, 2019 1 commit
Glenn Jocher authored
- 11 Apr, 2019 1 commit
Michael Carilli authored
- 12 Mar, 2019 1 commit
mcarilli authored
- 07 Mar, 2019 2 commits
Michael Carilli authored
Michael Carilli authored
- 04 Mar, 2019 1 commit
Michael Carilli authored
- 01 Mar, 2019 2 commits
- 28 Feb, 2019 1 commit
vfdev authored
- 20 Feb, 2019 5 commits
- 28 Jan, 2019 1 commit
mcarilli authored
- 31 Oct, 2018 1 commit
Thor Johnsen authored
* Pre-release of fused layer norm apex extension
* Remove half and __half2 specializations
* Code changes from review
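A brief usage sketch for the fused layer norm extension; it assumes apex was built with its CUDA extensions and that `FusedLayerNorm` follows the `torch.nn.LayerNorm`-style interface (normalized shape plus `eps`):

```python
import torch
from apex.normalization import FusedLayerNorm

# Normalize over the last dimension (hidden size 512), as with nn.LayerNorm.
layer_norm = FusedLayerNorm(512).cuda()

x = torch.randn(8, 128, 512, device="cuda")
y = layer_norm(x)          # same shape as x, normalized over the last dim
print(y.shape)             # torch.Size([8, 128, 512])
```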
- 30 Oct, 2018 1 commit
Michael Carilli authored
- 23 Oct, 2018 1 commit
jjsjann123 authored
* [syncBN] added syncBN in native pure python apex; added fused cuda kernels used for sync BN, using Welford for mean/var. Optional installation using 'python setup.py install --cuda_ext'. Added a unit test with a side-by-side comparison between apex sync BN and PyTorch BN. Notice that for the pytorch BN implementation, because of numerical issues for mean/var, the output will be slightly off.
* [syncBN PR] added fp16 support; addressed review comments on: 1. updating last pow 2, 2. checking for import error when importing the syncBN kernel
* [syncBN PR] added convert function to insert SyncBatchNorm; refactored some kernel code
* fixing type issue (fp16/fp32/fp64); added Kahan summation; edited the unit test to use pytorch primitive ops with double, passing reasonable tests now
* updating tensor creation calls
* fixing the all_reduce contiguous tensor
* transposed all reduce results
* [syncBN] support fp16 input & fp32 layer for apex fp16; partially fixed launch configs; enabled the imagenet example to run with --sync_bn
* [syncBN PR] Documentation added
* adjusting README
* adjusting again
* added some doc to the imagenet example
* [syncBN] warp-level reduction bug fix: warp reduction logic updated; check for dummy element to avoid nan; improved launch config for better reduction kernels. A further improvement would be to increase grid size.
* [syncBN] fixing undefined behavior in __shfl_down_sync from divergent threads in warp reduction; changing at::native::empty to at::empty (upstream comments)
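The convert function mentioned above is typically applied once after building the model; a minimal sketch assuming apex was installed with `--cuda_ext` and that a `torch.distributed` process group is already initialized (the model here is purely illustrative):

```python
import torch
from apex.parallel import convert_syncbn_model

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.BatchNorm2d(16),   # will be swapped for apex SyncBatchNorm
    torch.nn.ReLU(),
).cuda()

# Replace every BatchNorm*d layer so that mean/var statistics are
# all-reduced across ranks during distributed training.
model = convert_syncbn_model(model)
```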