1. 09 Dec, 2021 3 commits
  2. 08 Dec, 2021 1 commit
  3. 06 Dec, 2021 2 commits
    • Replace THCudaCheck with C10_CUDA_CHECK · fec3141c
      Hubert Lu authored
    • remove THC headers/functions (#1192) · 2155dabf
      Masaki Kozuki authored
      Changes include:
      - THC headers removal
      - TH macros replacement
      - fix some typos in comments
       Conflicts:
      	apex/contrib/csrc/multihead_attn/additive_masked_softmax_dropout_cuda.cu
      	apex/contrib/csrc/multihead_attn/encdec_multihead_attn_cuda.cu
      	apex/contrib/csrc/multihead_attn/encdec_multihead_attn_norm_add_cuda.cu
      	apex/contrib/csrc/multihead_attn/masked_softmax_dropout_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_additive_mask_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_bias_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_cuda.cu
      	apex/contrib/csrc/multihead_attn/self_multihead_attn_norm_add_cuda.cu
      	apex/contrib/csrc/multihead_attn/strided_batched_gemm.h
  4. 03 Dec, 2021 2 commits
  5. 02 Dec, 2021 4 commits
  6. 01 Dec, 2021 2 commits
  7. 29 Nov, 2021 1 commit
  8. 22 Nov, 2021 1 commit
  9. 19 Nov, 2021 5 commits
  10. 18 Nov, 2021 1 commit
  11. 17 Nov, 2021 2 commits
  12. 10 Nov, 2021 3 commits
  13. 02 Nov, 2021 3 commits
  14. 01 Nov, 2021 3 commits
  15. 29 Oct, 2021 2 commits
  16. 28 Oct, 2021 1 commit
  17. 27 Oct, 2021 3 commits
    • `FastLayerNorm` compat with `autocast` (#1203) · ae757634
      Masaki Kozuki authored
      
      
      * Persistent LayerNorm: Multi-CTA Rewrite
      
      * autocast support
      Co-authored-by: Young-Jun Ko <youngjun.ko@gmail.com>
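      The change above makes `FastLayerNorm` usable inside an `autocast` region. Below is a minimal usage sketch, not taken from the commit itself: it assumes apex was built with the fast-layer-norm contrib extension and that the module is importable from `apex.contrib.layer_norm`, so treat the import path and constructor as assumptions.
      ```python
      # Minimal sketch (not from the commit): FastLayerNorm under autocast.
      # Import path and constructor are assumptions; adjust to the installed apex build.
      import torch
      from apex.contrib.layer_norm import FastLayerNorm

      ln = FastLayerNorm(1024).cuda()               # layer norm over a hidden size of 1024
      x = torch.randn(8, 128, 1024, device="cuda")  # fp32 activations

      with torch.cuda.amp.autocast():
          y = ln(x)  # after #1203 this is intended to run inside the autocast region
      ```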
    • Pipeline Model Parallel (#1202) · 63d5dd63
      Masaki Kozuki authored
      
      
      * Init apex.ppu (pipeline model parallel utility)
      
      Reference commit:
      
      ```
      commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
      Merge: 14f2c684 7b293d9b
      Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
      Date:   Wed Sep 22 22:57:54 2021 -0700
      
          Merge branch 'add_BOS' into 'main'
      
          Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives
      
          See merge request ADLR/megatron-lm!328
      ```
      
      * removing get_args and replace import - phase 1
      
      * removing get_args and replace import - phase 2
      
      * move ppu to apex.transformer.pipeline_parallel
      
      * update two __init__.py
      
      * update READMEs
      
      * mpu -> parallel_state & tensor_parallel
      
      * fix
      
      * remove non-pipeline files
      
      * separate schedules.py - phase 1
      
      * dissect schedules.py
      
      * data_iterators -> batch
      
      * remove optimizer from forward_backward_step funcs
      
      * init test
      
      * Apply 2 suggestion(s) to 2 file(s)
      
      * fix cyclic import
      
      * fix syntax of Callable
      
      * fix - 1
      
      * move directory as testing used for pp test as well
      
      * add some functions for num microbatches calculator
      
      * model is a list in pipeline parallel
      
      * skip build num microbatch calculator
      
      * fix test
      
      * assert -> raise
      
      * skip args printing
      
      * specify tensor shape everywhere even if None - phase 1
      
      * private timers
      
      * passing tensor shape & dtype around
      
      * update dtype handling by introducing helper func
      
      * write helper func to reduce cyclomatic complexity
      
      * remove duplicate
      
      * update
      
      * move split_tensor_into_1d_equal_chunks to avoid cyclic import
      
      * tmp
      
      * cosmetic
      
      * move gather_split_1d_tensor to avoid cyclic imports
      
      * remove debug print
      
      * add outer loop
      
      * early return if possible
      
      * cosmetic
      
      * passing around tensor shape
      
      * refactor test
      
      * add script to learn batch sampler behavior
      
      * update
      
      * minibatch splitter
      
      * add minibatch splitter
      
      * split minibatch into microbatches
      
      * minor changes
      
      * uncomment split batch for test sake
      
      * set as attribute
      
      * study the behavior of no pipelining
      
      * debug 1
      
      * reflect test util namespace change
      
      * update readme
      
      * cosmetic in test
      
      * add model build helper func for interleaving sched
      
      * adding model builder from megatron
      
      * can be cyclic import
      
      * fix
      
      * enable interleaving test, but failing even if forward only
      
      * fix batch preparation
      
      * add explanation
      
      * print data parallel size
      
      * fix typo
      
      * Add Megatron style GPT model by Rishi
      Co-authored-by: Rishi Puri <riship@nvidia.com>
      
      * update
      
      * type hint for jit
      
      * fix forward_backward_no_pipelining test
      
      * pipeline forward backward seem to hang if not forward only
      
      * fix typo
      
      * debug
      
      * add p2p test
      
      * simplify
      
      * fix
      
      * tentative
      
      * set both tmp and pmp to 1
      
      * init
      
      * fix typo
      
      * fix
      
      * fix path of divide
      
      * set seed for tmp
      
      * update upon Eddie comment
      
      * fix typo
      
      * adding failing data loader test
      
      * fix
      
      * megatron still failing
      
      * check in
      
      * with the new nested-loop order, interleaving seems fine
      
      * cosmetic change
      
      * make `forward_backward_pipelining_with_interleaving` private
      
      * warn users that interleaving sched is unstable
      
      * move noop handler to no pipelining
      
      * comment out rank_print
      
      * make `build_model` more flexible
      
      * skip megatron test tentatively
      
      * correctly comment out rank_print
      
      * correctly comment out rank_print
      
      * correctly comment out rank_print
      
      * skip appropriately
      
      * remove wip p2p comm test
      
      * update type hint of model_provider_func
      
      * disable tf32 in each test script
      
      * skip interleaving w/ backward
      
      * rename as mpu is the old name
      
      * remove broken case
      
      * expose build_model func
      
      * delete `dist.ring_exchange` func call and `use_ring_exchange` argument
      
      * nit fixes
      
      * check in
      
      * remove unused file
      
      * update the list
      
      * update tensor shape
      
      * remove mixed dtype case
      
      * use torch.distributed.run
      
      * 2020 -> 2021
      
      * another 2020 -> 2021
      
      * docstring & type hint
      
      * fix teardown
      
      * update
      
      * change to experimental
      
      * check if warned
      Co-authored-by: Rishi Puri <riship@nvidia.com>
      Co-authored-by: Eddie Yan <eddiey@nvidia.com>
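      The bullets above name the pieces this commit introduces: `parallel_state` for process-group setup, `build_model` (which returns a list of model chunks, since "model is a list in pipeline parallel"), and forward-backward schedule functions that are handed an explicit tensor shape and dtype. The sketch below shows how those pieces are meant to fit together; the import paths, argument order, and signatures are assumptions inferred from the commit message, not a verified reproduction of the final API.
      ```python
      # Hypothetical wiring sketch for apex.transformer pipeline parallelism.
      # Names and signatures are assumptions based on the commit message above;
      # consult apex/transformer for the authoritative API.
      import torch
      from apex.transformer import parallel_state

      # One process per GPU, e.g. launched via `torch.distributed.run`.
      torch.distributed.init_process_group(backend="nccl")

      # Carve the world into tensor-, pipeline-, and data-parallel groups.
      # The tests "set both tmp and pmp to 1"; pipeline size 2 is used here for illustration.
      parallel_state.initialize_model_parallel(1, 2)  # (tensor parallel size, pipeline parallel size)

      # build_model(model_provider_func, ...) returns a *list* of modules, one chunk per
      # (virtual) pipeline stage. A schedule -- no pipelining, 1F1B, or the experimental
      # interleaved schedule -- then consumes the model list, a batch iterator, and an
      # explicit tensor_shape/dtype, since this commit passes "tensor shape & dtype around"
      # instead of inferring them from the data.
      ```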
    • Revert "Enable MLP unit tests on ROCm" · aee9f00d
      hubertlu authored
      This reverts commit 964e61f1.
  18. 26 Oct, 2021 1 commit