- 01 Apr, 2022 2 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
- 31 Mar, 2022 3 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 30 Mar, 2022 1 commit
  - Thor Johnsen authored
- 29 Mar, 2022 2 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
- 28 Mar, 2022 1 commit
  - Thor Johnsen authored
- 25 Mar, 2022 4 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 24 Mar, 2022 3 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
  - Thor Johnsen authored
- 23 Mar, 2022 2 commits
  - Thor Johnsen authored
  - Thor Johnsen authored
- 18 Mar, 2022 1 commit
  - eqy authored
    * update ngc link and dockerhub container tag
    * update
    * update
    * update
    * Update README.md
    Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
- 16 Mar, 2022 1 commit
  - Masaki Kozuki authored
    [transformer] Warn only when `gradient_accumulation_fusion` is `True` and `fused_weight_gradient_mlp_cuda` is missing (#1317)
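The commit above describes an import-guard pattern: only warn about a missing compiled extension when the caller actually requests the fused path. A minimal sketch of that pattern, with illustrative names (the function signature and fallback behavior here are assumptions, not apex's actual API):

```python
import warnings

# The optional compiled extension may not be built; treat it as absent.
try:
    import fused_weight_gradient_mlp_cuda  # compiled CUDA extension
except ImportError:
    fused_weight_gradient_mlp_cuda = None

def linear_with_grad_accumulation(x, weight, gradient_accumulation_fusion=False):
    """Warn and fall back only when fusion was requested but is unavailable."""
    if gradient_accumulation_fusion and fused_weight_gradient_mlp_cuda is None:
        warnings.warn(
            "gradient_accumulation_fusion requested but "
            "fused_weight_gradient_mlp_cuda is not built; using unfused path."
        )
        gradient_accumulation_fusion = False
    # ... unfused fallback computation would run here ...
    return x, gradient_accumulation_fusion
```

With fusion left at its default `False`, no warning is emitted even when the extension is missing, which is the point of the change.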
- 15 Mar, 2022 4 commits
  - Masaki Kozuki authored
    * initial issue_template -- bug
    * Apply suggestions from code review
    Co-authored-by: eqy <eqy@cs.washington.edu>
  - Yuanzhe Dong authored
    * Move forward cudnn-frontend
    * update throw_if to adapt cudnn frontend
  - Thor Johnsen authored
    Leave bottleneck masks as bool
  - Thor Johnsen authored
- 11 Mar, 2022 1 commit
  - chochowski authored
    * extend api to allow forced memory zeroing (empty() does not do it)
    * typo fix
    * ctx change
    * move zeroing flag to ctx
    * update test
    Co-authored-by: mchochowski <mchochowski@nvidia.com>
    Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
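The commit above hinges on the fact that an `empty()`-style allocator returns uninitialized storage, so callers who need deterministic contents must force zeroing explicitly, and the commit moves that flag onto the ctx object. A plain-Python sketch of the idea (all names here are illustrative, not the actual apex API):

```python
class Ctx:
    """Stands in for an autograd ctx carrying the forced-zeroing flag."""
    def __init__(self, zero_init=False):
        self.zero_init = zero_init

def allocate(n, ctx):
    """empty()-analogue: contents are unspecified unless zeroing is forced."""
    buf = [None] * n        # placeholder for "uninitialized" memory
    if ctx.zero_init:       # forced zeroing, since empty() does not do it
        buf = [0.0] * n
    return buf
```

Keeping the flag on the ctx (rather than a loose keyword argument) lets the backward pass see the same zeroing decision the forward pass made.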
- 08 Mar, 2022 4 commits
  - Masaki Kozuki authored
    This reverts commit adbe075a.
  - Masaki Kozuki authored
    This reverts commit 74e04667.
  - Masaki Kozuki authored
  - Masaki Kozuki authored
- 01 Mar, 2022 1 commit
  - Masaki Kozuki authored
    * update build_model to support enc&dec model
    * fix typo: cur_sargs -> cur_args
    * enc&dec path: correctly update pre/post process
- 27 Feb, 2022 1 commit
  - Masaki Kozuki authored
- 26 Feb, 2022 1 commit
  - Masaki Kozuki authored
    * fuse grad accumulation w/ weight grad
    * fp32 training path
    * not using *args, **kwargs
    * backward: moved the tensor dimension conversion
    * move files to csrc/megatron
    * fix fp32 path
    * fix typo
    * add to in order to select the correct custom extension
    * fix typo
    * comment on import guard
    * update test: enable gradient_accumulation_fusion
    * 86
    * remove redundant call of `test_column_parallel_linear`
    Co-authored-by: Sangkug Lym <slym@nvidia.com>
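The headline change in the commit above, "fuse grad accumulation w/ weight grad", means accumulating the weight gradient directly into the main gradient buffer instead of materializing a temporary gradient and adding it in a second pass. A toy scalar sketch of the difference (this is an illustration of the concept, not the CUDA kernel the commit adds):

```python
def weight_grad_unfused(grad_output, inputs, main_grad):
    """Two passes: build a temporary dW, then accumulate it into main_grad."""
    dW = [g * x for g, x in zip(grad_output, inputs)]  # temporary buffer
    for i, d in enumerate(dW):
        main_grad[i] += d

def weight_grad_fused(grad_output, inputs, main_grad):
    """One pass: accumulate into main_grad directly, no dW temporary."""
    for i, (g, x) in enumerate(zip(grad_output, inputs)):
        main_grad[i] += g * x
```

Both produce the same accumulated gradient; the fused form saves the extra memory traffic of writing and re-reading the temporary, which is the motivation for doing it inside one kernel.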
- 25 Feb, 2022 3 commits
  - Masaki Kozuki authored
  - Masaki Kozuki authored
  - Masaki Kozuki authored
- 23 Feb, 2022 4 commits
  - Masaki Kozuki authored
  - Masaki Kozuki authored
  - Thor Johnsen authored
    Change data type for virtual tensors to float
  - Thor Johnsen authored
- 15 Feb, 2022 1 commit
  - Masaki Kozuki authored