- 01 Apr, 2022 2 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
- 31 Mar, 2022 3 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 30 Mar, 2022 1 commit
  - Thor Johnsen authored
- 29 Mar, 2022 2 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
- 28 Mar, 2022 1 commit
  - Thor Johnsen authored
- 25 Mar, 2022 4 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 24 Mar, 2022 3 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 23 Mar, 2022 2 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
- 18 Mar, 2022 1 commit
  - eqy authored
    * update ngc link and dockerhub container tag
    * update
    * update
    * update
    * Update README.md
    Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
- 16 Mar, 2022 1 commit
  - Masaki Kozuki authored
    [transformer] Warn only when `gradient_accumulation_fusion` is `True` and `fused_weight_gradient_mlp_cuda` is missing (#1317)
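The commit above describes an import-guard pattern: only warn about a missing compiled extension when the caller actually requests the fused path. A minimal sketch of that pattern, with illustrative names (the function signature and fallback behavior here are assumptions, not apex's actual API):

```python
import warnings

# The optional compiled extension may not be built; treat it as absent.
try:
    import fused_weight_gradient_mlp_cuda  # compiled CUDA extension
except ImportError:
    fused_weight_gradient_mlp_cuda = None

def linear_with_grad_accumulation(x, weight, gradient_accumulation_fusion=False):
    """Warn and fall back only when fusion was requested but is unavailable."""
    if gradient_accumulation_fusion and fused_weight_gradient_mlp_cuda is None:
        warnings.warn(
            "gradient_accumulation_fusion requested but "
            "fused_weight_gradient_mlp_cuda is not built; using unfused path."
        )
        gradient_accumulation_fusion = False
    # ... unfused fallback computation would run here ...
    return x, gradient_accumulation_fusion
```

With fusion left at its default `False`, no warning is emitted even when the extension is missing, which is the point of the change.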
- 15 Mar, 2022 4 commits
  - Masaki Kozuki authored
    * initial issue_template -- bug
    * Apply suggestions from code review
    Co-authored-by: eqy <eqy@cs.washington.edu>
  - Yuanzhe Dong authored
    * Move forward cudnn-frontend
    * update throw_if to adapt cudnn frontend
  - Thor Johnsen authored
    Leave bottleneck masks as bool
  - Thor Johnsen authored
- 11 Mar, 2022 1 commit
  - chochowski authored
    * extend api to allow forced memory zeroing (empty() does not do it)
    * typo fix
    * ctx change
    * move zeroing flag to ctx
    * update test
    Co-authored-by: mchochowski <mchochowski@nvidia.com>
    Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
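The commit above hinges on the fact that an `empty()`-style allocator returns uninitialized storage, so callers who need deterministic contents must force zeroing explicitly, and the commit moves that flag onto the ctx object. A plain-Python sketch of the idea (all names here are illustrative, not the actual apex API):

```python
class Ctx:
    """Stands in for an autograd ctx carrying the forced-zeroing flag."""
    def __init__(self, zero_init=False):
        self.zero_init = zero_init

def allocate(n, ctx):
    """empty()-analogue: contents are unspecified unless zeroing is forced."""
    buf = [None] * n        # placeholder for "uninitialized" memory
    if ctx.zero_init:       # forced zeroing, since empty() does not do it
        buf = [0.0] * n
    return buf
```

Keeping the flag on the ctx (rather than a loose keyword argument) lets the backward pass see the same zeroing decision the forward pass made.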
- 08 Mar, 2022 4 commits
  - Masaki Kozuki authored
    This reverts commit adbe075a.
  - Masaki Kozuki authored
    This reverts commit 74e04667.
  - Masaki Kozuki authored
  - Masaki Kozuki authored
- 01 Mar, 2022 1 commit
  - Masaki Kozuki authored
    * update build_model to support enc&dec model
    * fix typo: cur_sargs -> cur_args
    * enc&dec path: correctly update pre/post process
- 27 Feb, 2022 1 commit
  - Masaki Kozuki authored
- 26 Feb, 2022 1 commit
  - Masaki Kozuki authored
    * fuse grad accumulation w/ weight grad
    * fp32 training path
    * not using *args, **kwargs
    * backward: moved the tensor dimension conversion
    * move files to csrc/megatron
    * fix fp32 path
    * fix typo
    * add to in order to select the correct custom extension
    * fix typo
    * comment on import guard
    * update test: enable gradient_accumulation_fusion
    * 86
    * remove redundant call of `test_column_parallel_linear`
    Co-authored-by: Sangkug Lym <slym@nvidia.com>
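The headline change in the commit above, "fuse grad accumulation w/ weight grad", means accumulating the weight gradient directly into the main gradient buffer instead of materializing a temporary gradient and adding it in a second pass. A toy scalar sketch of the difference (this is an illustration of the concept, not the CUDA kernel the commit adds):

```python
def weight_grad_unfused(grad_output, inputs, main_grad):
    """Two passes: build a temporary dW, then accumulate it into main_grad."""
    dW = [g * x for g, x in zip(grad_output, inputs)]  # temporary buffer
    for i, d in enumerate(dW):
        main_grad[i] += d

def weight_grad_fused(grad_output, inputs, main_grad):
    """One pass: accumulate into main_grad directly, no dW temporary."""
    for i, (g, x) in enumerate(zip(grad_output, inputs)):
        main_grad[i] += g * x
```

Both produce the same accumulated gradient; the fused form saves the extra memory traffic of writing and re-reading the temporary, which is the motivation for doing it inside one kernel.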
- 25 Feb, 2022 3 commits
  - Masaki Kozuki authored
  - Masaki Kozuki authored
  - Masaki Kozuki authored
- 23 Feb, 2022 4 commits
  - Masaki Kozuki authored
  - Masaki Kozuki authored
  - Thor Johnsen authored
    Change data type for virtual tensors to float
  - Thor Johnsen authored
- 15 Feb, 2022 1 commit
  - Masaki Kozuki authored