1. 25 Mar, 2022 2 commits
    • [transformer] Format & Test Refactoring (#1325) · a0ed4151
      Masaki Kozuki authored
      * try PyTorch custom TestCase class
      
      * revert
      
      * initial working example
      
      * update
      
      * data utils
      
      * fix imports
      
      * hardcode backend to nccl
      
      * fix signature
      
      * fix typo
      
      * mapping
      
      * set device
      
      * init
      
      * refactor x entropy
      
      * remove unused import & destroy model parallel
      
      * refactor random
      
      * fix test
      
      * remove migrated tests
      
      * refactor
      
      * init
      
      * separate affine weight init
      
      * init model parallel
      
      * split more
      
      * weight init fix part 1
      
      * use CPU init for consistency between native and tensor parallel
      
      * black
      
      * add col parallel
      
      * use a 3D tensor of square matrix for column parallel linear
      
      * skip the failing cases
      
      * migrate layers test
      
      * pipeline parallel forward/backward
      
      * fix typo
      
      * fix typo
      
      * fix
      
      * fix pipeline world size
      
      * black
      
      * rm `run_pipeline_parallel_test` in favor of test_pipeline_parallel_fwd_bwd.py
      
      * stop logging
      
      * set log level
      
      * black
      
      * license and format
      
      * fix
      
      * skip tf32 as matrices are small
      
      * remove potentially inappropriate license
      
      * Apply suggestions from code review
      
      * remove `TODO` comment
      
      * `torch.testing.assert_allclose` -> `torch.testing.assert_close`
      
      * remove comment-outs
      
      * remove unused import
      
      * minor fix
      a0ed4151
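The commit above mentions initialising weights on the CPU so that the native and tensor-parallel layers stay consistent, and adding a column-parallel case. The following is a minimal, framework-free sketch of the property such a test can check, not the test code from the commit. It assumes a seeded full weight is built once and then sliced per rank; every name here (`init_weight`, `shard_rows`, `column_parallel_linear`) is hypothetical, and plain Python lists stand in for tensors.

```python
import random


def init_weight(out_features, in_features, seed=0):
    """Build the full (out x in) weight from a fixed seed, as a CPU-side init would."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_features)]
            for _ in range(out_features)]


def linear(x, weight):
    """Compute y = x @ weight^T for x of shape (batch x in)."""
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in weight]
            for row in x]


def shard_rows(weight, rank, world_size):
    """Return the slice of output rows owned by `rank` (column-parallel split)."""
    per_rank = len(weight) // world_size
    return weight[rank * per_rank:(rank + 1) * per_rank]


def column_parallel_linear(x, weight, world_size):
    """Simulate every rank in one process and concatenate the partial outputs."""
    parts = [linear(x, shard_rows(weight, r, world_size))
             for r in range(world_size)]
    return [sum((p[i] for p in parts), []) for i in range(len(x))]


weight = init_weight(out_features=4, in_features=4, seed=1234)
x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]
native = linear(x, weight)
parallel = column_parallel_linear(x, weight, world_size=2)
```

Because every shard is cut from the same seeded weight, the concatenated output should match the unsharded layer exactly in this toy setting; a real GPU test would compare with a tolerance instead.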
    • [transformer] `parallel_state`: Position Embedding (#1343) · f10b4b89
      Masaki Kozuki authored
      * update
      
      * Add comment to `destroy_model_parallel`
      f10b4b89
  2. 24 Mar, 2022 1 commit
    • Add CUDA Focal Loss Implementation (#1337) · 28f8539c
      Masaki Kozuki authored
      
      
      Take-over of #1097
      
      * Add fast CUDA focal loss implementation.
      
      * Enable fast math for CUDA focal loss.
      
      * Correct typo.
      
      * replace deprecated macros
      
      * TORCH_CUDA_CHECK -> AT_CUDA_CHECK
      
      The former is defined in torch/csrc/profiler/cuda.cpp, so it is usually not available.
      The latter, however, is defined in ATen/cuda/Exceptions.h as an alias of C10_CUDA_CHECK.
      
      * add test
      
      * clean up
      
      * guard for torchvision
      Co-authored-by: Wil Kong <alpha0422@gmail.com>
      28f8539c
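For readers unfamiliar with the loss this commit accelerates, here is a scalar reference sketch of the commonly used sigmoid focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not the CUDA kernel from the commit. The function name, the sigmoid variant, and the default `alpha`/`gamma` values are assumptions made for illustration.

```python
import math


def sigmoid_focal_loss_ref(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction (hypothetical reference helper).

    logit:  raw score before the sigmoid
    target: 1 for a positive example, 0 for a negative one
    """
    p = 1.0 / (1.0 + math.exp(-logit))
    # p_t is the probability assigned to the true class.
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = sigmoid_focal_loss_ref(4.0, 1)    # confident and correct
hard = sigmoid_focal_loss_ref(-4.0, 1)   # confident and wrong
```

With `gamma=0` the expression reduces to alpha-weighted binary cross entropy, which is a convenient sanity check when comparing a fast kernel against a slow reference.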
  3. 18 Mar, 2022 1 commit
  4. 16 Mar, 2022 1 commit
  5. 15 Mar, 2022 4 commits
  6. 11 Mar, 2022 1 commit
  7. 08 Mar, 2022 4 commits
  8. 01 Mar, 2022 1 commit
  9. 27 Feb, 2022 1 commit
  10. 26 Feb, 2022 1 commit
  11. 25 Feb, 2022 3 commits
  12. 23 Feb, 2022 4 commits
  13. 15 Feb, 2022 1 commit
  14. 12 Feb, 2022 1 commit
  15. 11 Feb, 2022 1 commit
  16. 10 Feb, 2022 1 commit
  17. 07 Feb, 2022 1 commit
  18. 04 Feb, 2022 1 commit
  19. 01 Feb, 2022 2 commits
    • Add the permutation related support as the extension for asp lib. (#1194) · 89edb819
      ChongyuNVIDIA authored
      * Add the permutation related support as the extension for asp lib.
      
      * [Fix] Track the permutation sequence for progressive channel swap strategy.
      
      * Fix the corner case where a layer is not sparse but still needs a permutation applied because of its siblings.
      
      * Fix the deprecated functions in ASP unit tests.
      
      * Fix the sparsity info typo in ASP lib.
      
      * [Enhancement] Set an identical random seed on all GPUs so that the permutation search generates the same results.
      
      * Update the README.md with identical random seed setting and NeurIPS info.
      
      * Integrate the Pybind11 enhancement of permutation search into ASP lib.
      89edb819
    • transformer: Allows for custom sync context in no pipelining forward backward function (#1281) · 79c01877
      Masaki Kozuki authored
      * add kwarg of `custom_sync_context_handler`
      
      * add kwargs to ignore `custom_sync_context_handler` when it is mistakenly passed to fwd/bwd funcs
      79c01877
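As a rough illustration of what a custom sync context enables, the sketch below runs every microbatch except the last inside a caller-supplied context manager, which is the usual way to defer gradient synchronisation until the final backward pass. It is a simplified stand-in, not the function from the commit. Only the `custom_sync_context_handler` keyword comes from the commit message; `run_no_pipelining`, `step_fn`, and `fake_no_sync` are hypothetical names.

```python
from contextlib import contextmanager, nullcontext


def run_no_pipelining(step_fn, microbatches, custom_sync_context_handler=None):
    """Run step_fn on each microbatch; assumes at least one microbatch.

    All but the last microbatch run inside the sync context.
    """
    handler = (custom_sync_context_handler
               if custom_sync_context_handler is not None else nullcontext)
    results = []
    with handler():
        for mb in microbatches[:-1]:
            results.append(step_fn(mb))
    # The last microbatch runs outside the context so a sync can happen here.
    results.append(step_fn(microbatches[-1]))
    return results


events = []


@contextmanager
def fake_no_sync():
    """Stand-in for a 'skip gradient sync' context; it only records entry and exit."""
    events.append("enter")
    yield
    events.append("exit")


def step(mb):
    events.append(("step", mb))
    return mb * 2


outputs = run_no_pipelining(step, [1, 2, 3],
                            custom_sync_context_handler=fake_no_sync)
```

Passing `None` falls back to `contextlib.nullcontext`, so callers that do not need special synchronisation behaviour are unaffected.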
  20. 31 Jan, 2022 2 commits
  21. 29 Jan, 2022 1 commit
  22. 28 Jan, 2022 2 commits
    • small changes in test and logger format (#1278) · b1c75f6f
      Masaki Kozuki authored
      * cosmetic refactor in test
      
      * log with PID
      
      * log more info: rank, pid, filename, lineNo
      b1c75f6f
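A log format carrying rank, PID, filename, and line number can be assembled with the standard `logging` module, as sketched below. This is an illustration rather than the formatter from the commit. It assumes the rank is read from a `RANK` environment variable, and `RankFilter` and `make_logger` are hypothetical names.

```python
import io
import logging
import os


class RankFilter(logging.Filter):
    """Attach a `rank` attribute to every record (assumption: taken from $RANK)."""

    def filter(self, record):
        record.rank = os.environ.get("RANK", "?")
        return True


def make_logger(name, stream):
    """Create a logger whose lines include rank, pid, filename and line number."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [rank %(rank)s pid %(process)d] "
        "%(filename)s:%(lineno)d %(levelname)s %(message)s"
    ))
    handler.addFilter(RankFilter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the sketch's output out of the root logger
    return logger


os.environ["RANK"] = "3"
buffer = io.StringIO()
log = make_logger("logger_format_sketch", buffer)
log.info("hello from worker")
line = buffer.getvalue()
```

Having the PID and rank on every line makes it easier to untangle interleaved output when several processes write to the same terminal.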
    • allow for `None` batch (#1280) · a960fe8c
      Masaki Kozuki authored
      * have get_kth_microbatch deal with None batch
      
      * broadcast based on tensor parallel rank
      
      * dtype
      
      * remove unnecessary .cuda()
      
      Processes with tensor parallel rank != 0 don't need to prepare any `torch.utils.data.DataLoader` instances, so the `batch` argument of the `get_kth_microbatch` function can be `None`, but the current implementation doesn't allow for it.
      a960fe8c
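The behaviour described in this commit can be sketched as follows. This is a simplified illustration, not the function from the repository: the name `get_kth_microbatch_sketch`, its explicit `micro_batch_size` parameter, and the use of plain lists in place of tensors are assumptions.

```python
from typing import List, Optional, Sequence


def get_kth_microbatch_sketch(
    batch: Optional[List[Sequence]],
    k: int,
    micro_batch_size: int,
) -> Optional[List[Sequence]]:
    """Return the k-th microbatch of every item in `batch`.

    Ranks that hold no data pass `batch=None` and get `None` back.
    """
    if batch is None:
        return None
    start = k * micro_batch_size
    end = start + micro_batch_size
    return [item[start:end] for item in batch]


inputs = [0, 1, 2, 3, 4, 5]
labels = ["a", "b", "c", "d", "e", "f"]
second = get_kth_microbatch_sketch([inputs, labels], k=1, micro_batch_size=2)
empty = get_kth_microbatch_sketch(None, k=1, micro_batch_size=2)
```

A rank that receives `None` would then typically obtain its data through a broadcast from tensor parallel rank 0, as the commit's "broadcast based on tensor parallel rank" bullet suggests.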
  23. 21 Jan, 2022 2 commits
  24. 19 Jan, 2022 1 commit