- 12 Jun, 2023 1 commit
flyingdown authored
2. Add the environment variable APEX_ROCBLAS_GEMM_ALLOW_HALF to control whether fp16r is used.
3. Add DCU version information; rename the whl package; update the installation steps in the README.
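A minimal usage sketch for the environment-variable switch named above, assuming it is read from the process environment (when exactly apex consults it, and its default value, are not stated in the commit message):

```python
import os

# Hypothetical usage: opt in to the half-precision (fp16r) rocBLAS GEMM path
# before apex is imported; set "0" to keep the full-precision path.
os.environ["APEX_ROCBLAS_GEMM_ALLOW_HALF"] = "1"

import apex  # apex itself decides how and when to honour the variable
```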
- 07 Jul, 2022 1 commit
Masaki Kozuki authored
* remove pyprof
* remove reparameterization
* remove pyprof test
* clean up
- 18 Mar, 2022 1 commit
eqy authored
* update ngc link and dockerhub container tag
* update
* update
* update
* Update README.md

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
- 08 Mar, 2022 2 commits
Masaki Kozuki authored
This reverts commit 74e04667.
Masaki Kozuki authored
- 09 Dec, 2021 1 commit
Hubert Lu authored
- 29 Oct, 2021 1 commit
Peng authored
- 10 Feb, 2021 1 commit
Shoufa Chen authored
* copy-paste friendly
* fix import container_abcs issue: nightly PyTorch has removed `container_abcs` from `torch._six`.
  https://github.com/pytorch/pytorch/commit/58eb23378f2a376565a66ac32c93a316c45b6131#diff-b3c160475f0fbe8ad50310f92d3534172ba98203387a962b7dc8f4a23b15cf4dL35
* keep existing for pytorch1.7 and earlier
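The compatibility fix described above is commonly written as an import fallback; a minimal sketch under that assumption (apex may instead gate on the torch version string, and `is_iterable` is just an illustrative helper):

```python
try:
    # PyTorch 1.7 and earlier still expose container_abcs here
    from torch._six import container_abcs
except ImportError:
    # newer PyTorch: fall back to the standard-library module it aliased
    import collections.abc as container_abcs

def is_iterable(x):
    # Illustrative helper: the usual reason for this import is to test
    # whether an argument is a single value or a collection.
    return isinstance(x, container_abcs.Iterable)
```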
- 16 Dec, 2020 1 commit
lcskrishna authored
- 15 Dec, 2020 1 commit
lcskrishna authored
- 04 Dec, 2020 1 commit
Stas Bekman authored
- 21 Aug, 2020 1 commit
Chaitanya Sri Krishna Lolla authored
- 01 Jun, 2020 1 commit
mcarilli authored
Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
- 29 May, 2020 2 commits
Chaitanya Sri Krishna Lolla authored
lcskrishna authored
- 10 Oct, 2019 1 commit
mcarilli authored
- 08 Oct, 2019 1 commit
Jan Schlüter authored
- 27 Aug, 2019 1 commit
ptrblck authored
* add state_dict, load_state_dict
* add test_restoring, test_loss_scale_decrease
* disable amp outputs for checkpoint tests
* add test for amp.state_dict, cleanup
* add state_dict patch, add test
* fixed testing, cleanup
* add readme for checkpointing
* add docs to source/amp
* add review changes to doc
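A short checkpointing sketch built around the `amp.state_dict()` / `amp.load_state_dict()` pair this change adds; the model, optimizer, opt_level, and file name are illustrative assumptions:

```python
import torch
from apex import amp

model = torch.nn.Linear(4, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# ... train for a while ...

# Save amp's state (e.g. the current loss scale) alongside the usual pieces.
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "amp": amp.state_dict(),
}
torch.save(checkpoint, "checkpoint.pt")

# To restore, re-run amp.initialize with the same opt_level, then load.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
amp.load_state_dict(checkpoint["amp"])
```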
- 13 Aug, 2019 1 commit
Marek Kolodziej authored
Co-authored-by: Aditya Agrawal <aditya.iitb@gmail.com>
Co-authored-by: Marek Kolodziej <mkolod@gmail.com>
- 24 Jun, 2019 1 commit
mcarilli authored
- 09 May, 2019 1 commit
Tim Zaman authored
- 30 Apr, 2019 1 commit
Michael Carilli authored
- 18 Apr, 2019 1 commit
Glenn Jocher authored
- 11 Apr, 2019 1 commit
Michael Carilli authored
- 12 Mar, 2019 1 commit
mcarilli authored
- 07 Mar, 2019 2 commits
Michael Carilli authored
Michael Carilli authored
- 04 Mar, 2019 1 commit
Michael Carilli authored
- 01 Mar, 2019 2 commits
- 28 Feb, 2019 1 commit
vfdev authored
- 20 Feb, 2019 5 commits
- 28 Jan, 2019 1 commit
mcarilli authored
- 31 Oct, 2018 1 commit
Thor Johnsen authored
* Pre-release of fused layer norm apex extension
* Remove half and __half2 specializations
* Code changes from review
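A brief usage sketch for the fused layer norm extension; it assumes apex was built with its CUDA extensions and that `FusedLayerNorm` follows the `torch.nn.LayerNorm`-style interface (normalized shape plus `eps`):

```python
import torch
from apex.normalization import FusedLayerNorm

# Normalize over the last dimension (hidden size 512), as with nn.LayerNorm.
layer_norm = FusedLayerNorm(512).cuda()

x = torch.randn(8, 128, 512, device="cuda")
y = layer_norm(x)          # same shape as x, normalized over the last dim
print(y.shape)             # torch.Size([8, 128, 512])
```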
- 30 Oct, 2018 1 commit
Michael Carilli authored
- 23 Oct, 2018 1 commit
jjsjann123 authored
* [syncBN] added syncBN in native pure python apex; added fused cuda kernels used for sync BN, using Welford for mean/var. Optional installation using 'python setup.py install --cuda_ext'. Added a unit test with a side-by-side comparison between apex sync BN and PyTorch BN. Notice that for the pytorch BN implementation, because of numerical issues for mean/var, the output will be slightly off.
* [syncBN PR] added fp16 support; addressed review comments on: 1. updating last pow 2, 2. checking for import error when importing the syncBN kernel
* [syncBN PR] added convert function to insert SyncBatchNorm; refactored some kernel code
* fixing type issue (fp16/fp32/fp64); added Kahan summation; edited the unit test to use pytorch primitive ops with double, passing reasonable tests now
* updating tensor creation calls
* fixing the all_reduce contiguous tensor
* transposed all reduce results
* [syncBN] support fp16 input & fp32 layer for apex fp16; partially fixed launch configs; enabled the imagenet example to run with --sync_bn
* [syncBN PR] Documentation added
* adjusting README
* adjusting again
* added some doc to the imagenet example
* [syncBN] warp-level reduction bug fix: warp reduction logic updated; check for dummy element to avoid nan; improved launch config for better reduction kernels. A further improvement would be to increase grid size.
* [syncBN] fixing undefined behavior in __shfl_down_sync from divergent threads in warp reduction; changing at::native::empty to at::empty (upstream comments)
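The convert function mentioned above is typically applied once after building the model; a minimal sketch assuming apex was installed with `--cuda_ext` and that a `torch.distributed` process group is already initialized (the model here is purely illustrative):

```python
import torch
from apex.parallel import convert_syncbn_model

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.BatchNorm2d(16),   # will be swapped for apex SyncBatchNorm
    torch.nn.ReLU(),
).cuda()

# Replace every BatchNorm*d layer so that mean/var statistics are
# all-reduced across ranks during distributed training.
model = convert_syncbn_model(model)
```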