- 09 Dec, 2021 3 commits
-
Masaki Kozuki authored
* pass `self.mask_additive`
* clang-format
* removing THCState
-
Kevin Stephano authored
* Add fused mixed precision lamb optimizer.
* Fix device usage in constructor.
* Fix sending param_group tensor state to device.
* Remove unneeded device set.
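
A hypothetical usage sketch for the new optimizer; the import path and class name are assumptions inferred from the commit message, and the constructor is assumed to follow the standard `torch.optim` interface.

```python
# Hypothetical sketch -- import path and class name are assumptions
# based on the commit message, not a documented API.
import torch
from apex.contrib.optimizers import FusedMixedPrecisionLamb  # assumed location

model = torch.nn.Linear(1024, 1024).cuda().half()
# fp16 params with an fp32 master copy is the usual mixed-precision setup
opt = FusedMixedPrecisionLamb(model.parameters(), lr=1e-3, weight_decay=0.01)

loss = model(torch.randn(8, 1024, device="cuda", dtype=torch.float16)).float().sum()
loss.backward()
opt.step()        # fused multi-tensor LAMB update
opt.zero_grad()
```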
-
hubertlu-tw authored
-
- 08 Dec, 2021 1 commit
-
Jithun Nair authored
IFU-2021-10-15 (+ remove redundant defines + C10_CUDA_CHECK)
-
- 06 Dec, 2021 2 commits
-
Hubert Lu authored
-
Masaki Kozuki authored
Changes include:
- THC headers removal
- TH macros replacement
- fix some typos in comments

Conflicts:
	apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout_cuda.cu
	apex/contrib/csrc/multihead_attn/encdec_multihead_attn_cuda.cu
	apex/contrib/csrc/multihead_attn/encdec_multihead_attn_norm_add_cuda.cu
	apex/contrib/csrc/multihead_attn/masked_softmax_dropout_cuda.cu
	apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_additive_mask_cuda.cu
	apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_cuda.cu
	apex/contrib/csrc/multihead_attn/self_multihead_attn_cuda.cu
	apex/contrib/csrc/multihead_attn/self_multihead_attn_norm_add_cuda.cu
	apex/contrib/csrc/multihead_attn/strided_batched_gemm.h
-
- 03 Dec, 2021 2 commits
-
hubertlu-tw authored
-
hubertlu-tw authored
-
- 02 Dec, 2021 4 commits
-
Jithun Nair authored
* Use --cuda_ext flag to build all supported extensions
* Don't remove --cuda_ext since it'll be needed to build other extensions
* Need to clear all cmdline args so setup.py doesn't complain (see the sketch below)
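
A minimal sketch of the argv-clearing pattern the last bullet refers to, assuming a setuptools-based setup.py like apex's:

```python
# Minimal sketch: consume custom build flags before setuptools parses
# argv, so setup() doesn't complain about options it doesn't recognize.
import sys
from setuptools import setup

build_cuda_ext = "--cuda_ext" in sys.argv
if build_cuda_ext:
    sys.argv.remove("--cuda_ext")  # clear the flag for setuptools

ext_modules = []
if build_cuda_ext:
    # CUDAExtension entries for every supported extension would be
    # appended here.
    pass

setup(name="example", ext_modules=ext_modules)
```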
-
Hubert Lu authored
Add more unit tests for both distributed and extensions
-
hubertlu-tw authored
-
Hubert Lu authored
-
- 01 Dec, 2021 2 commits
- 29 Nov, 2021 1 commit
-
X Wang authored
-
- 22 Nov, 2021 1 commit
-
Hubert Lu authored
Change python3.6 to python
-
- 19 Nov, 2021 5 commits
-
Hubert Lu authored
-
Hubert Lu authored
-
eqy authored
* minimal bert pipeline parallel test
* fix global and cleanup
* use get_forward_backward_func
* cleanup and fix some tests
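
A hedged sketch of how such a test selects a schedule; it assumes the apex.transformer API of this period (exact signatures may have differed), and `forward_step`, `model`, and `batch` are illustrative placeholders.

```python
# Sketch only: assumes torch.distributed and apex model-parallel state
# are already initialized; signatures are approximate for this period.
import torch
from apex.transformer import parallel_state
from apex.transformer.pipeline_parallel import get_forward_backward_func

def forward_step(batch, model):
    # Megatron-style contract: return the output plus a loss-reducing fn.
    out = model(batch)
    return out, lambda t: {"loss": t.float().mean()}

model = [torch.nn.Linear(16, 16).cuda()]   # one module per (virtual) stage
batch = torch.randn(4, 16, device="cuda")

fwd_bwd = get_forward_backward_func(
    virtual_pipeline_model_parallel_size=None,  # no interleaving
    pipeline_model_parallel_size=parallel_state.get_pipeline_model_parallel_world_size(),
)
losses = fwd_bwd(forward_step, batch, model, forward_only=True, tensor_shape=None)
```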
-
Masaki Kozuki authored
Co-authored-by: Sangkug Lym <slym@nvidia.com>
-
Masaki Kozuki authored
* init logging use
* fix
* clean up
* fp32 p2p comm
* init
* Dynamic global batch size with `MegatronPretrainingSampler` (see the sketch after this list). I couldn't make this script work with `MegatronPretrainingRandomSampler` because the random sampler seems to have some requirements for global batch size, total number of samples, local minibatch size, etc. that I'm not familiar with for now
* revive original pipeline parallel test
* update MULTIGPU_TEST: add dynamic batch-size test
* run MegatronPretrainingRandomSampler
* fix comment
* fix
* update
* cosmetic
* add note
* Apply 2 suggestion(s) to 2 file(s)
* change following https://github.com/NVIDIA/apex/pull/1210
* fix
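
The dynamic-batch-size behavior above boils down to carving each global batch into fixed-size microbatches for the pipeline schedule; a self-contained illustration (not the apex/Megatron sampler itself):

```python
# Illustrative only -- not the apex/Megatron sampler implementation.
import torch

def split_into_microbatches(batch: torch.Tensor, micro_batch_size: int):
    # A global batch must divide evenly into microbatches for pipelining.
    assert batch.size(0) % micro_batch_size == 0
    return list(torch.split(batch, micro_batch_size, dim=0))

global_batch = torch.randn(32, 128)   # 32 samples this step
micro = split_into_microbatches(global_batch, micro_batch_size=8)
assert len(micro) == 4                # 32 / 8 pipeline microbatches
```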
-
- 18 Nov, 2021 1 commit
-
Abhishree authored
-
- 17 Nov, 2021 2 commits
-
X Wang authored
-
Masaki Kozuki authored
-
- 10 Nov, 2021 3 commits
-
Masaki Kozuki authored
-
eqy authored
-
eqy authored
-
- 02 Nov, 2021 3 commits
-
Hubert Lu authored
Enable multihead attention
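
A rough usage sketch of the contrib fused self-attention being enabled here; the constructor and forward arguments are approximations of the apex.contrib API and may differ in this revision.

```python
# Rough sketch -- argument names and shapes are approximate, not
# verified against this revision of apex.contrib.
import torch
from apex.contrib.multihead_attn import SelfMultiheadAttn

attn = SelfMultiheadAttn(1024, 16, dropout=0.1, impl="fast").cuda().half()

# Inputs are (seq_len, batch, embed_dim), fp16, on the GPU.
x = torch.randn(64, 8, 1024, device="cuda", dtype=torch.float16)
out, _ = attn(x, x, x, key_padding_mask=None, need_weights=False,
              attn_mask=None, is_training=True)
```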
-
Hubert Lu authored
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
-
hubertlu-tw authored
-
- 01 Nov, 2021 3 commits
-
hubertlu-tw authored
Fix rocblas_gemmex namespace
Fix namespace
Clean up comments
-
hubertlu-tw authored
Enable HIP float to half conversion
-
hubertlu-tw authored
Fix some spacing
-
- 29 Oct, 2021 2 commits
-
Peng authored
-
hubertlu-tw authored
-
- 28 Oct, 2021 1 commit
-
hubertlu-tw authored
-
- 27 Oct, 2021 3 commits
-
Masaki Kozuki authored
* Persistent LayerNorm: Multi-CTA Rewrite
* autocast support

Co-authored-by: Young-Jun Ko <youngjun.ko@gmail.com>
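
A brief sketch, assuming the persistent kernels are exposed through the contrib FastLayerNorm module (only certain hidden sizes have dedicated kernels):

```python
# Sketch: assumes apex.contrib.layer_norm.FastLayerNorm fronts the
# persistent kernels touched in this commit.
import torch
from apex.contrib.layer_norm import FastLayerNorm

ln = FastLayerNorm(1024).cuda()
x = torch.randn(8, 512, 1024, device="cuda")

with torch.cuda.amp.autocast():  # the commit adds autocast support
    y = ln(x)
```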
-
Masaki Kozuki authored
* Init apex.ppu (pipeline model parallel utility). Reference commit:

```
commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
Merge: 14f2c684 7b293d9b
Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
Date:   Wed Sep 22 22:57:54 2021 -0700

    Merge branch 'add_BOS' into 'main'

    Add Beginning of Sentence token option and adding semaphore while
    multi-threading to prevent crashes and hangs due to connection
    keep-alives

    See merge request ADLR/megatron-lm!328
```

* removing get_args and replace import - phase 1
* removing get_args and replace import - phase 2
* move ppu to apex.transformer.pipeline_parallel
* update two __init__.py
* update READMEs
* mpu -> parallel_state & tensor_parallel
* fix
* remove non-pipeline files
* separate schedules.py - phase 1
* dissect schedules.py
* data_iterators -> batch
* remove optimizer from forward_backward_step funcs
* init test
* Apply 2 suggestion(s) to 2 file(s)
* fix cyclic import
* fix syntax of Callable
* fix - 1
* move directory as testing is used for the pp test as well
* add some functions for num microbatches calculator
* model is a list in pipeline parallel
* skip build num microbatch calculator
* fix test
* assert -> raise
* skip args printing
* specify tensor shape everywhere even if None - phase 1
* private timers
* passing tensor shape & dtype around
* update dtype handling by introducing helper func
* write helper func to reduce cyclomatic complexity
* remove duplicate
* update
* move split_tensor_into_1d_equal_chunks to avoid cyclic import
* tmp
* cosmetic
* move gather_split_1d_tensor to avoid cyclic imports
* remove debug print
* add outer loop
* early return if possible
* cosmetic
* passing around tensor shape
* refactor test
* add script to learn batch sampler behavior
* update
* minibatch splitter
* add minibatch splitter
* split minibatch into microbatches
* minor changes
* uncomment split batch for test's sake
* set as attribute
* study the behavior of no pipelining
* debug 1
* reflect test util namespace change
* update readme
* cosmetic in test
* add model build helper func for interleaving sched
* adding model builder from megatron
* can be cyclic import
* fix
* enable interleaving test, but failing even if forward only
* fix batch preparation
* add explanation
* print data parallel size
* fix typo
* Add Megatron-style GPT model by Rishi (Co-authored-by: Rishi Puri <riship@nvidia.com>)
* update
* type hint for jit
* fix forward_backward_no_pipelining test
* pipeline forward backward seems to hang if not forward only
* fix typo
* debug
* add p2p test
* simplify
* fix
* tentative
* set both tmp and pmp to 1
* init
* fix typo
* fix
* fix path of divide
* set seed for tmp
* update upon Eddie's comment
* fix typo
* adding failing data loader test
* fix
* megatron still failing
* check in
* with the nested loop of new order, interleaving seems fine
* cosmetic change
* make `forward_backward_pipelining_with_interleaving` private
* warn users that the interleaving sched is unstable
* move noop handler to no pipelining
* comment out rank_print
* make `build_model` more flexible
* skip megatron test tentatively
* correctly comment out rank_print
* correctly comment out rank_print
* correctly comment out rank_print
* skip appropriately
* remove wip p2p comm test
* update type hint of model_provider_func
* disable tf32 in each test script
* skip interleaving w/ backward
* rename as mpu is the old name
* remove broken case
* expose build_model func
* delete `dist.ring_exchange` func call and `use_ring_exchange` argument
* nit fixes
* check in
* remove unused file
* update the list
* update tensor shape
* remove mixed dtype case
* use torch.distributed.run
* 2020 -> 2021
* another 2020 -> 2021
* docstring & type hint
* fix teardown
* update
* change to experimental
* check if warned

Co-authored-by: Rishi Puri <riship@nvidia.com>
Co-authored-by: Eddie Yan <eddiey@nvidia.com>
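
A minimal initialization sketch for the renamed namespaces (`parallel_state` replacing the old `mpu`); launcher and argument details are approximate for this revision.

```python
# Sketch: launch with `python -m torch.distributed.run --nproc_per_node=4 ...`
# (this commit switches the tests to torch.distributed.run); argument
# order/names for initialize_model_parallel are approximate.
import torch
from apex.transformer import parallel_state

torch.distributed.init_process_group(backend="nccl")
# 2-way tensor parallel x 2-way pipeline parallel across 4 ranks
parallel_state.initialize_model_parallel(2, 2)

print("tp rank:", parallel_state.get_tensor_model_parallel_rank(),
      "pp rank:", parallel_state.get_pipeline_model_parallel_rank())
```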
-
- 26 Oct, 2021 1 commit
-
hubertlu authored
-