1. 26 Feb, 2022 1 commit
  2. 15 Feb, 2022 1 commit
  3. 12 Feb, 2022 1 commit
  4. 04 Feb, 2022 1 commit
  5. 09 Dec, 2021 1 commit
  6. 27 Oct, 2021 1 commit
    • Pipeline Model Parallel (#1202) · 63d5dd63
      Masaki Kozuki authored
      * Init apex.ppu (pipeline model parallel utility)
      
      Reference commit:
      
      ```
      commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
      Merge: 14f2c684 7b293d9b
      Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
      Date:   Wed Sep 22 22:57:54 2021 -0700
      
          Merge branch 'add_BOS' into 'main'
      
          Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives
      
          See merge request ADLR/megatron-lm!328
      ```
      
      * removing get_args and replace import - phase 1
      
      * removing get_args and replace import - phase 2
      
      * move ppu to apex.transformer.pipeline_parallel
      
      * update two __init__.py
      
      * update READMEs
      
      * mpu -> parallel_state & tensor_parallel
      
      * fix
      
      * remove non-pipeline files
      
      * separate schedules.py - phase 1
      
      * dissect schedules.py
      
      * data_iterators -> batch
      
      * remove optimizer from forward_backward_step funcs
      
      * init test
      
      * Apply 2 suggestion(s...
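      A minimal usage sketch for the relocated modules this commit describes (apex.transformer.parallel_state and apex.transformer.pipeline_parallel); the exact function signatures are assumptions based on the Megatron-LM layout the commit references and may differ from the shipped API.

      ```
      # Hypothetical sketch, meant to run under a distributed launcher such as torchrun.
      # Module paths follow the commit message; the argument order of
      # initialize_model_parallel is an assumption.
      import torch
      from apex.transformer import parallel_state

      def init_parallel(tensor_model_parallel_size: int = 1,
                        pipeline_model_parallel_size: int = 2) -> None:
          # torch.distributed must be set up first (RANK/WORLD_SIZE come from the launcher).
          torch.distributed.init_process_group(backend="nccl")
          torch.cuda.set_device(torch.distributed.get_rank() % torch.cuda.device_count())
          # Split the world into tensor-parallel and pipeline-parallel groups.
          parallel_state.initialize_model_parallel(tensor_model_parallel_size,
                                                   pipeline_model_parallel_size)
          # Ranks can then query their pipeline position, e.g.
          # parallel_state.is_pipeline_first_stage() / is_pipeline_last_stage().
      ```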
  7. 08 Oct, 2021 1 commit
  8. 07 Oct, 2021 1 commit
  9. 02 Oct, 2021 1 commit
  10. 24 Sep, 2021 1 commit
  11. 04 Sep, 2021 1 commit
    • fix CUBLAS guards (#1162) · 54b93919
      Burc Eryilmaz authored
      * support for fused dense layer with cublasLt, fusion in both fprop and bprop
      
      * fix typo causing syntax error
      
      * add fused GEMM+gelu+GEMM module
      
      * fix typo for workspace size
      
      * update cublas check for 11600
      
      * add tests for fused dense layer
      
      * fix CUDA 10.x path
      
      * safer guard around CUBLAS constants, remove unreferenced variable
      
      * more guard changes
      
      * guard against cublas version instead of cuda
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
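      A short sketch of how the fused dense layers exercised by this commit might be used. The class names and constructor arguments (FusedDense, FusedDenseGeluDense) are assumptions drawn from the commit description (fused dense layer, fused GEMM+GeLU+GEMM with cublasLt) and may not match the shipped API exactly.

      ```
      # Hypothetical usage; assumes a cublasLt-capable CUDA toolkit per the guards above.
      import torch
      from apex.fused_dense import FusedDense, FusedDenseGeluDense

      x = torch.randn(8, 1024, device="cuda", dtype=torch.float16, requires_grad=True)

      dense = FusedDense(1024, 4096).half().cuda()               # fused GEMM + bias
      mlp = FusedDenseGeluDense(1024, 4096, 1024).half().cuda()  # GEMM + GeLU + GEMM

      y = dense(x)
      z = mlp(x)
      z.float().sum().backward()  # bprop also goes through the fused path
      ```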
  12. 01 Sep, 2021 2 commits
  13. 17 May, 2021 1 commit
  14. 19 Apr, 2021 1 commit
  15. 17 Apr, 2021 1 commit
  16. 15 Apr, 2021 1 commit
    • Add unit tests for Fused NovoGrad (#1065) · 59d2f7ac
      Sudhakar Singh authored
      * Add unit tests for fused-novograd
      
      * Fix: tensors should reside on the same device
      
      * Fix: the CUDA stream should be queried on the same device where the tensors reside; found this while debugging the fused NovoGrad multi-device unit test
      
      * fixed issues mentioned in the comments
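      A minimal sketch of the multi-device situation these tests cover: the optimizer's internal state and streams must live on the same device as the parameters. FusedNovoGrad comes from apex.optimizers; the hyperparameters shown are ordinary defaults, not something this commit prescribes.

      ```
      # Hypothetical two-GPU scenario; falls back to cuda:0 on a single-GPU machine.
      import torch
      from apex.optimizers import FusedNovoGrad

      device = "cuda:1" if torch.cuda.device_count() > 1 else "cuda:0"
      model = torch.nn.Linear(64, 64).to(device)
      opt = FusedNovoGrad(model.parameters(), lr=1e-3)

      out = model(torch.randn(32, 64, device=device))
      out.sum().backward()
      opt.step()  # optimizer state should stay on the parameters' device
      ```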
  17. 19 Oct, 2020 1 commit
    • Optimize the sync batchnorm by batching the communication (#980) · 8a1ed9e8
      lly-zero-one authored
      This PR mainly optimizes the performance of SyncBatchNorm and also fixes one potential issue in the welford_parallel kernel implementation.

      For the performance improvement, we batch the mean/var/count all_gather communication together and send it once in the forward path.
      We also batch the all_reduce calls in the backward path.
      We add a contiguous() call on the input of the welford_parallel kernel.
      If there is a standard perf benchmark, I would be happy to run it.
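      A usage sketch for context: the batched all_gather/all_reduce described above happens inside apex's SyncBatchNorm, so user code only converts the model. convert_syncbn_model is the existing apex.parallel helper; the toy model is illustrative.

      ```
      # Hypothetical setup; requires torch.distributed to be initialized before training.
      import torch
      from apex.parallel import convert_syncbn_model

      model = torch.nn.Sequential(
          torch.nn.Conv2d(3, 16, 3),
          torch.nn.BatchNorm2d(16),  # swapped for apex.parallel.SyncBatchNorm below
          torch.nn.ReLU(),
      ).cuda()

      # mean/var/count statistics are then exchanged across the process group
      # with the batched communication this PR introduces.
      model = convert_syncbn_model(model)
      ```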
  18. 05 Aug, 2020 1 commit
  19. 06 Jul, 2020 1 commit
    • [sync BN] (#792) · 1ff54b8f
      jjsjann123 authored
      * [sync BN]
      
      support non-uniform batch sizes across the process group.

      TODO: tests should be added once this is cleaned up.
      
      * updating unit tests
      
      * new unit tests for different inputs
      
      * cleaning
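      A sketch of what the non-uniform batch size support allows: each rank can feed a different number of samples through the same SyncBatchNorm layer. The per-rank batch sizes below are arbitrary illustrative values, and the script assumes a distributed launcher.

      ```
      # Hypothetical example, one process per GPU (e.g. launched with torchrun).
      import torch
      import torch.distributed as dist
      from apex.parallel import SyncBatchNorm

      dist.init_process_group(backend="nccl")
      rank = dist.get_rank()
      torch.cuda.set_device(rank)

      bn = SyncBatchNorm(8).cuda()
      batch_size = 4 + rank  # deliberately different on every process
      x = torch.randn(batch_size, 8, 16, 16, device="cuda")
      y = bn(x)              # statistics are still reduced across all ranks
      ```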
  20. 23 May, 2020 1 commit
  21. 22 May, 2020 5 commits
  22. 21 May, 2020 1 commit
  23. 14 May, 2020 1 commit
  24. 30 Apr, 2020 3 commits
  25. 28 Apr, 2020 1 commit
  26. 22 Apr, 2020 1 commit
  27. 10 Apr, 2020 1 commit
  28. 27 Feb, 2020 1 commit
  29. 04 Oct, 2019 1 commit
  30. 06 Sep, 2019 1 commit
    • Fix for #456 (#477) · 325f5a0b
      mcarilli authored
      * Pushing for build tests
      
      * Contrib files
      
      * Removing deprecated checks
  31. 20 Aug, 2019 1 commit
  32. 17 Aug, 2019 1 commit
  33. 16 Aug, 2019 1 commit
    • clean up variance options supported by all fused optimizers: · 18062b69
      Deyu Fu authored
      * Correctly do not apply bias correction to epsilon (same as a recent upstream change)
      * Correctly do not apply bias correction to weight decay (consistent with upstream AdamW)
      * Make adam_w_mode work for FusedAdam/FusedLAMB, selecting L2 regularization or decoupled weight decay (Adam vs AdamW); see the sketch below
      * Correctly document that reg_inside_moment in FusedNovoGrad differs from adam_w_mode
      * Remove legacy eps_mode from FusedAdam
      * Make the internal math type float across all fused optimizers
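      A small sketch of the adam_w_mode switch mentioned above; FusedLAMB is assumed to expose the same flag, and the hyperparameter values are ordinary defaults rather than anything this commit prescribes.

      ```
      # adam_w_mode=True  -> decoupled weight decay (AdamW-style)
      # adam_w_mode=False -> classic L2 regularization folded into the gradient (Adam-style)
      import torch
      from apex.optimizers import FusedAdam

      model = torch.nn.Linear(128, 128).cuda()

      opt_adamw = FusedAdam(model.parameters(), lr=1e-3, weight_decay=0.01, adam_w_mode=True)
      opt_adam_l2 = FusedAdam(model.parameters(), lr=1e-3, weight_decay=0.01, adam_w_mode=False)
      ```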