1. 27 Oct, 2021 1 commit
    • Masaki Kozuki's avatar
      Pipeline Model Parallel (#1202) · 63d5dd63
      Masaki Kozuki authored
      * Init apex.ppu (pipeline model parallel utility)
      
      Reference commit:
      
      ```
      commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
      Merge: 14f2c684 7b293d9b
      Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
      Date:   Wed Sep 22 22:57:54 2021 -0700
      
          Merge branch 'add_BOS' into 'main'
      
          Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives
      
          See merge request ADLR/megatron-lm!328
      ```
      
      * removing get_args and replace import - phase 1
      
      * removing get_args and replace import - phase 2
      
      * move ppu to apex.transformer.pipeline_parallel
      
      * update two __init__.py
      
      * update READMEs
      
      * mpu -> parallel_state & tensor_parallel
      
      * fix
      
      * remove not pipeline files
      
      * separate schedules.py - phase 1
      
      * dissect schedules.py
      
      * data_iterators -> batch
      
      * remove optimizer from forward_backward_step funcs
      
      * init test
      
      * Apply 2 suggestion(s...
      63d5dd63
  2. 23 Oct, 2021 1 commit
  3. 18 Oct, 2021 1 commit
  4. 16 Oct, 2021 1 commit
  5. 14 Oct, 2021 2 commits
  6. 13 Oct, 2021 1 commit
  7. 08 Oct, 2021 2 commits
  8. 07 Oct, 2021 1 commit
  9. 06 Oct, 2021 1 commit
  10. 02 Oct, 2021 1 commit
  11. 30 Sep, 2021 1 commit
  12. 28 Sep, 2021 1 commit
  13. 24 Sep, 2021 2 commits
  14. 08 Sep, 2021 1 commit
    • Masaki Kozuki's avatar
      enable ninja (#1164) · 9ce0a10f
      Masaki Kozuki authored
      - passing include directories to `CUDAExtension`'s `include_dirs` argument
      - removing `-I/path/to/dir` arguments from `extra_compile_args`
      9ce0a10f
  15. 04 Sep, 2021 1 commit
    • Burc Eryilmaz's avatar
      fix CUBLAS guards (#1162) · 54b93919
      Burc Eryilmaz authored
      
      
      * support for fused dense layer with cublasLt, fusion in both fprop and bprop
      
      * fix typo causing syntax error
      
      * add fused GEMM+gelu+GEMM modue
      
      * fix typo for workspace size
      
      * update cublas check for 11600
      
      * add tests for fused dense layer
      
      * fix CUDA 10.x path
      
      * safer guard around CUBLAS constants, remove unreferenced variable
      
      * more guard changes
      
      * guard against cublas version instead of cuda
      Co-authored-by: default avatarSukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      54b93919
  16. 02 Sep, 2021 13 commits
  17. 01 Sep, 2021 5 commits
  18. 31 Aug, 2021 3 commits
  19. 30 Aug, 2021 1 commit