- 11 Mar, 2022 (1 commit)
  - Pruthvi Madugundu authored
- 16 Feb, 2022 (1 commit)
  - hubertlu-tw authored
- 28 Jan, 2022 (1 commit)
  - Jithun Nair authored
- 26 Jan, 2022 (1 commit)
  - Jithun Nair authored
- 25 Jan, 2022 (2 commits)
- 21 Jan, 2022 (1 commit)
  - athitten authored: Remove an unnecessary debug print statement
- 14 Dec, 2021 (3 commits)
  - Jithun Nair authored: IFU-master-2021-12-08
  - Hubert Lu authored
  - Hubert Lu authored:
    - Skip failing unit tests
    - Modify the test-skipping messages
- 13 Dec, 2021 (1 commit)
  - Hubert Lu authored
- 09 Dec, 2021 (5 commits)
  - Hubert Lu authored
  - Masaki Kozuki authored:
    - Pass `self.mask_additive`
    - clang-format
    - Remove THCState
  - Kevin Stephano authored:
    - Add fused mixed-precision LAMB optimizer
    - Fix device usage in constructor
    - Fix sending param_group tensor state to device
    - Remove unneeded device set
  - Hubert Lu authored
  - hubertlu-tw authored
- 08 Dec, 2021 (1 commit)
  - Jithun Nair authored: IFU-2021-10-15 (+ remove redundant defines + C10_CUDA_CHECK)
- 06 Dec, 2021 (2 commits)
  - Hubert Lu authored
  - Masaki Kozuki authored. Changes include:
    - THC headers removal
    - TH macros replacement
    - Fix some typos in comments

    Conflicts:
    - apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout_cuda.cu
    - apex/contrib/csrc/multihead_attn/encdec_multihead_attn_cuda.cu
    - apex/contrib/csrc/multihead_attn/encdec_multihead_attn_norm_add_cuda.cu
    - apex/contrib/csrc/multihead_attn/masked_softmax_dropout_cuda.cu
    - apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_additive_mask_cuda.cu
    - apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_cuda.cu
    - apex/contrib/csrc/multihead_attn/self_multihead_attn_cuda.cu
    - apex/contrib/csrc/multihead_attn/self_multihead_attn_norm_add_cuda.cu
    - apex/contrib/csrc/multihead_attn/strided_batched_gemm.h
- 03 Dec, 2021 (2 commits)
  - hubertlu-tw authored
  - hubertlu-tw authored
- 02 Dec, 2021 (4 commits)
  - Jithun Nair authored:
    - Use the --cuda_ext flag to build all supported extensions
    - Don't remove --cuda_ext, since it is needed to build other extensions
    - Clear all command-line args so setup.py doesn't complain
  - Hubert Lu authored: Add more unit tests for both distributed and extensions
  - hubertlu-tw authored
  - Hubert Lu authored
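As context for the `--cuda_ext` commit above, this is a sketch of the usual apex source build that passes extension flags through pip; the exact flags and the assumption that it is run from the apex repository root are mine, not from this log:

```shell
# Build apex from source with all supported CUDA extensions enabled.
# --global-option forwards the flags to apex's setup.py.
pip install -v --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```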
- 01 Dec, 2021 (2 commits)
- 29 Nov, 2021 (1 commit)
  - X Wang authored
- 22 Nov, 2021 (1 commit)
  - Hubert Lu authored: Change python3.6 to python
- 19 Nov, 2021 (5 commits)
  - Hubert Lu authored
  - Hubert Lu authored
  - eqy authored:
    - Minimal BERT pipeline-parallel test
    - Fix global and clean up
    - Use get_forward_backward_func
    - Clean up and fix some tests
  - Masaki Kozuki authored (Co-authored-by: Sangkug Lym <slym@nvidia.com>)
  - Masaki Kozuki authored:
    - Init logging use
    - fp32 p2p comm
    - Dynamic global batch size with `MegatronPretrainingSampler`. I couldn't make this script work with `MegatronPretrainingRandomSampler`, because the random sampler seems to have some requirements on global batch size, total number of samples, local minibatch size, etc. that I'm not familiar with for now
    - Revive original pipeline-parallel test
    - Update MULTIGPU_TEST: add dynamic batch-size test
    - Run MegatronPretrainingRandomSampler
    - Apply 2 suggestion(s) to 2 file(s)
    - Change following https://github.com/NVIDIA/apex/pull/1210
    - Assorted fixes, cleanups, comment updates, and cosmetic changes
- 18 Nov, 2021 (1 commit)
  - Abhishree authored
- 17 Nov, 2021 (2 commits)
  - X Wang authored
  - Masaki Kozuki authored
- 10 Nov, 2021 (3 commits)
  - Masaki Kozuki authored
  - eqy authored
  - eqy authored