1. 01 Feb, 2022 1 commit
  2. 31 Jan, 2022 2 commits
  3. 29 Jan, 2022 1 commit
  4. 28 Jan, 2022 2 commits
    • small changes in test and logger format (#1278) · b1c75f6f
      Masaki Kozuki authored
      * cosmetic refactor in test
      
      * log with PID
      
      * log more info: rank, pid, filename, lineNo
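
      For illustration, a format string along these lines carries the fields named above (PID, filename, line number) plus the rank. This is a hedged sketch using the standard `logging` module, not the exact format the apex tests use; reading the rank from the `RANK` environment variable is also an assumption.

      ```
      import logging
      import os

      # Illustrative only: a log format carrying rank, PID, filename and line number.
      # The rank is assumed to come from the RANK env var set by torchrun / torch.distributed.launch.
      rank = int(os.getenv("RANK", "0"))
      logging.basicConfig(
          format=f"[rank {rank}] %(process)d %(filename)s:%(lineno)d %(levelname)s %(message)s",
          level=logging.INFO,
      )
      logging.getLogger(__name__).info("pipeline test started")
      ```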
    • allow for `None` batch (#1280) · a960fe8c
      Masaki Kozuki authored
      * have get_kth_microbatch deal with None batch
      
      * broadcast based on tensor parallel rank
      
      * dtype
      
      * remove unnecessary .cuda()
      
      Processes with tensor parallel rank != 0 don't need to prepare one or more `torch.utils.data.DataLoader` instances, which means the `batch` argument of the `get_kth_microbatch` function can be `None`, but the current implementation doesn't allow for that.
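
      A minimal sketch of the behavior this change enables: when `batch` is `None` (as on tensor-parallel ranks other than 0), the helper returns `None` instead of indexing into it. The body below is an illustrative assumption, not the actual apex implementation.

      ```
      from typing import List, Optional

      import torch


      def get_kth_microbatch(batch: Optional[List[torch.Tensor]], k: int, micro_batch_size: int):
          """Illustrative sketch: slice out the k-th microbatch, tolerating a None batch."""
          if batch is None:
              # Ranks with tensor parallel rank != 0 may not build a DataLoader at all.
              return None
          start, end = k * micro_batch_size, (k + 1) * micro_batch_size
          return [tensor[start:end] for tensor in batch]
      ```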
  5. 21 Jan, 2022 2 commits
  6. 19 Jan, 2022 1 commit
  7. 13 Jan, 2022 1 commit
  8. 17 Dec, 2021 1 commit
    • Add an argument of `dtype` to forward_backward functions to specify the dtype... · b88c507e
      Masaki Kozuki authored
      Add an argument of `dtype` to forward_backward functions to specify the dtype used in p2p comm (#1249)
      
      * let users specify dtype for p2p comm, taking the possibility of O2-style AMP into account
      
      * add `dtype` argument to forward_backward functions
      
      * fix
      
      * better message
      
      * add docstring of dtype
      
      * add a link to dtype logic of p2p comm
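
      To illustrate why the communication dtype has to be stated explicitly, here is a hedged sketch of a receive-side helper: the receiving rank must allocate its buffer with the same dtype the sender used (e.g. `torch.half` under O2-style AMP). The helper name and body are illustrative assumptions, not apex's actual `p2p_communication` code.

      ```
      import torch
      import torch.distributed as dist


      def recv_forward(tensor_shape, dtype=torch.float32, src=None):
          """Hypothetical sketch: the p2p recv buffer must match the sender's dtype,
          which is why the forward_backward functions now take an explicit `dtype`."""
          buf = torch.empty(tensor_shape, dtype=dtype, device="cuda")
          dist.recv(buf, src=src)
          return buf
      ```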
  9. 16 Dec, 2021 2 commits
  10. 15 Dec, 2021 2 commits
  11. 14 Dec, 2021 2 commits
    • Faster `--fast_multihead_attn` build (#1245) · 7ec8ed67
      Masaki Kozuki authored
      * merge .so files
      
      * odr
      
      * fix build
      
      * update import
      
      * apply psf/black with max line length of 120
      
      * update
      
      * fix
      
      * update
      
      * build fixed again but undefined symbol again
      
      * fix 2, still layer norm grad is undefined
      
      * remove unused cpp files
      
      * without layer_norm.cuh, import works
      
      * import fast_multihead_attn works...
      
      but why? Was the unnecessary `#include "layer_norm.cuh"` the culprit,
      preventing the shared objects from linking `HostApplyLayerNorm` and
      `HostLayerNormGradient`?
      
      * clean up layer norm
    • check size in kth microbatch (#1247) · ed94d0bb
      eqy authored
  12. 10 Dec, 2021 2 commits
    • Cherry-pick Megatron-LM's changes in pipeline model parallel for T5 (#1232) · 0e25fcc4
      Masaki Kozuki authored
      * update parallel_state
      
      * update pipeline common funcs - forward_step and backward_step
      
      * update pipelining w/o interleaving
      
      * type hint
      
      * merge utils into without_interleaving
      
      Motivation: functions in utils are only used by
      forward_backward_pipelining_without_interleaving
      
      * fix handling of `model_type`
      
      * fix import of DDP
      
      * update set_input_tensor method
      
      * fix
      
      * cosmetic
      
      * update model
      
      * refactor pipeline test scripts
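
      For orientation, a minimal sketch of how the model-parallel state is typically set up before the pipelining functions above are called; it assumes `torch.distributed` is already initialized, and the trailing-underscore keyword spellings are an assumption carried over from the Megatron-LM-derived API.

      ```
      from apex.transformer import parallel_state

      # Assumes torch.distributed.init_process_group(...) has already been called.
      # Split the world into tensor-, pipeline- and data-parallel groups before
      # running forward_backward_pipelining_without_interleaving.
      parallel_state.initialize_model_parallel(
          tensor_model_parallel_size_=2,      # keyword spelling assumed
          pipeline_model_parallel_size_=4,    # keyword spelling assumed
      )
      print(parallel_state.get_pipeline_model_parallel_rank())
      ```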
    • Minimal gpt pipeline parallel (builds off of minimal_bert_pipeline_parallel)... · ab7af058
      Rishi Puri authored
      
      Minimal gpt pipeline parallel (builds off of minimal_bert_pipeline_parallel) including cpu-offloading (#1222)
      
      * minimal bert pipeline parallel test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * first draft of gpt minimal test
      
      * framework to scale up the gpt2 test for variety of distributed setups
      
      * adding gpt_minimal_test to list of multigpu tests
      Co-authored-by: Eddie Yan <eddiey@nvidia.com>
      Co-authored-by: riship <riship@nvidia.com>
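
      The "framework to scale up the gpt2 test" amounts to launching the same minimal test under different world sizes and parallel-size combinations. A hypothetical driver could look like the sketch below; the test script name and its flags are assumptions, and only `torch.distributed.run` with `--nproc_per_node` is a standard PyTorch interface.

      ```
      import itertools
      import subprocess
      import sys

      # Hypothetical sweep over a few distributed setups; the script name and its
      # command-line flags are illustrative, not the actual apex test interface.
      for tp, pp in itertools.product([1, 2], [1, 2, 4]):
          world_size = tp * pp
          subprocess.run(
              [sys.executable, "-m", "torch.distributed.run",
               f"--nproc_per_node={world_size}",
               "gpt_minimal_test.py",                       # assumed script name
               f"--tensor-model-parallel-size={tp}",        # assumed flag
               f"--pipeline-model-parallel-size={pp}"],     # assumed flag
              check=True,
          )
      ```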
  13. 09 Dec, 2021 2 commits
  14. 19 Nov, 2021 3 commits
  15. 10 Nov, 2021 3 commits
  16. 27 Oct, 2021 2 commits
    • `FastLayerNorm` compat with `autocast` (#1203) · ae757634
      Masaki Kozuki authored
      
      
      * Persistent LayerNorm: Multi-CTA Rewrite
      
      * autocast support
      Co-authored-by: Young-Jun Ko <youngjun.ko@gmail.com>
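
      A short hedged sketch of what autocast compatibility means in practice, assuming the module is exposed as `apex.contrib.layer_norm.FastLayerNorm` and apex was built with the fast layer norm extension:

      ```
      import torch
      from apex.contrib.layer_norm import FastLayerNorm  # assumes the extension is built

      ln = FastLayerNorm(1024).cuda()
      x = torch.randn(8, 1024, device="cuda")

      # With autocast support, the fused kernel accepts the half-precision inputs
      # that torch.cuda.amp.autocast produces inside the region.
      with torch.cuda.amp.autocast():
          y = ln(x)
      print(y.dtype)
      ```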
    • Pipeline Model Parallel (#1202) · 63d5dd63
      Masaki Kozuki authored
      * Init apex.ppu (pipeline model parallel utility)
      
      Reference commit:
      
      ```
      commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
      Merge: 14f2c684 7b293d9b
      Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
      Date:   Wed Sep 22 22:57:54 2021 -0700
      
          Merge branch 'add_BOS' into 'main'
      
          Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives
      
          See merge request ADLR/megatron-lm!328
      ```
      
      * removing get_args and replace import - phase 1
      
      * removing get_args and replace import - phase 2
      
      * move ppu to apex.transformer.pipeline_parallel
      
      * update two __init__.py
      
      * update READMEs
      
      * mpu -> parallel_state & tensor_parallel
      
      * fix
      
      * remove not pipeline files
      
      * separate schedules.py - phase 1
      
      * dissect schedules.py
      
      * data_iterators -> batch
      
      * remove optimizer from forward_backward_step funcs
      
      * init test
      
      * Apply 2 suggestion(s...
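
      For orientation, the resulting package layout under `apex.transformer`; the one-line comments restate the bullets above, and anything beyond the package paths themselves should be treated as an assumption.

      ```
      # Sketch of the layout this change introduces (apex.ppu moved under apex.transformer).
      from apex.transformer import parallel_state     # process-group state (the former "mpu" half)
      from apex.transformer import tensor_parallel    # tensor-model-parallel layers and utilities
      from apex.transformer import pipeline_parallel  # pipeline schedules split out of schedules.py
      ```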
  17. 23 Oct, 2021 1 commit
  18. 18 Oct, 2021 1 commit
  19. 16 Oct, 2021 1 commit
  20. 14 Oct, 2021 2 commits
  21. 13 Oct, 2021 1 commit
  22. 08 Oct, 2021 2 commits
  23. 07 Oct, 2021 1 commit
  24. 06 Oct, 2021 1 commit
  25. 02 Oct, 2021 1 commit