1. 25 Feb, 2022 1 commit
  2. 23 Feb, 2022 1 commit
  3. 04 Feb, 2022 1 commit
  4. 31 Jan, 2022 1 commit
  5. 28 Jan, 2022 2 commits
    • small changes in test and logger format (#1278) · b1c75f6f
      Masaki Kozuki authored
      * cosmetic refactor in test
      
      * log with PID
      
      * log more info: rank, pid, filename, lineNo
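
A minimal sketch of a log format that carries rank, PID, file name, and line number, using only Python's standard `logging` module. The names `RankFilter` and `FORMATTER`, and reading the rank from the `RANK` environment variable, are assumptions for illustration; the format string actually used in this commit is not shown in the log.

```python
import logging
import os


class RankFilter(logging.Filter):
    """Attach a `rank` attribute to every record (assumption: rank comes from $RANK)."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.rank = int(os.environ.get("RANK", "0"))
        return True  # never drop the record, only annotate it


# %(process)d, %(filename)s and %(lineno)d are standard LogRecord attributes;
# %(rank)d is the extra attribute added by RankFilter above.
FORMATTER = logging.Formatter(
    "[rank %(rank)d pid %(process)d %(filename)s:%(lineno)d] %(levelname)s %(message)s"
)


def build_logger(name: str = "demo") -> logging.Logger:
    """Return a logger whose handler annotates and formats records as above."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.addFilter(RankFilter())
    handler.setFormatter(FORMATTER)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Attaching the filter to the handler rather than the logger means the rank is filled in for every record that handler emits, whichever logger produced it.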
    • allow for `None` batch (#1280) · a960fe8c
      Masaki Kozuki authored
      * have get_kth_microbatch deal with None batch
      
      * broadcast based on tensor parallel rank
      
      * dtype
      
      * remove unnecessary .cuda()
      
      Processes with tensor parallel rank != 0 don't need to prepare any `torch.utils.data.DataLoader` instances, so the `batch` argument of `get_kth_microbatch` can be `None`, but the current implementation doesn't allow for it.
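
The idea can be sketched in plain Python as follows. This is not the apex implementation: `get_kth_microbatch_sketch`, its signature, and the use of nested lists instead of tensors are assumptions made to keep the example self-contained.

```python
from typing import List, Optional, Sequence


def get_kth_microbatch_sketch(
    batch: Optional[Sequence[Sequence[int]]],
    k: int,
    micro_batch_size: int,
) -> Optional[List[Sequence[int]]]:
    """Return the k-th microbatch slice of every field in `batch`.

    A `None` batch (e.g. on ranks that did not build a data loader) is
    passed through unchanged instead of raising.
    """
    if batch is None:
        return None
    start = k * micro_batch_size
    end = start + micro_batch_size
    # Each field (e.g. tokens, labels) is sliced along its first dimension.
    return [field[start:end] for field in batch]
```

Ranks that receive `None` would then rely on a later broadcast from tensor parallel rank 0 to obtain the actual data, as the bullet list above suggests.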
  6. 21 Jan, 2022 1 commit
  7. 17 Dec, 2021 1 commit
    • Add an argument of `dtype` to forward_backward functions to specify the dtype used in p2p comm (#1249) · b88c507e
      Masaki Kozuki authored
      
      * let users specify dtype for p2p comm, taking the possibility of O2 style AMP into account
      
      * add `dtype` argument to forward_backward functions
      
      * fix
      
      * better message
      
      * add docstring of dtype
      
      * add a link to dtype logic of p2p comm
  8. 16 Dec, 2021 1 commit
  9. 14 Dec, 2021 1 commit
  10. 10 Dec, 2021 2 commits
    • Cherry-pick Megatron-LM's changes in pipeline model parallel for T5 (#1232) · 0e25fcc4
      Masaki Kozuki authored
      * update parallel_state
      
      * update pipeline common funcs - forward_step and backward_step
      
      * update pipelining w/o interleaving
      
      * type hint
      
      * merge utils into without_interleaving
      
      Motivation: functions in utils are only used by
      forward_backward_pipelining_without_interleaving
      
      * fix handling of `model_type`
      
      * fix import of DDP
      
      * update set_input_tensor method
      
      * fix
      
      * cosmetic
      
      * update model
      
      * refactor pipeline test scripts
    • Minimal gpt pipeline parallel (builds off of minimal_bert_pipeline_parallel) including cpu-offloading (#1222) · ab7af058
      Rishi Puri authored
      
      * minimal bert pipeline parallel test
      
      * first draft of gpt minimal test
      
      * framework to scale up the gpt2 test for variety of distributed setups
      
      * adding gpt_minimal_test to list of multigpu tests
      Co-authored-by: Eddie Yan <eddiey@nvidia.com>
      Co-authored-by: riship <riship@nvidia.com>
  11. 09 Dec, 2021 1 commit
  12. 19 Nov, 2021 2 commits
    • minimal bert pipeline parallel test (#1216) · aa756cec
      eqy authored
      * minimal bert pipeline parallel test
      
      * fix global and cleanup
      
      * use get_forward_backward_func
      
      * cleanup and fix some tests
    • [POC] Support Megatron-LM's `rampup_batch_size` argument (#1212) · 35336133
      Masaki Kozuki authored
      * init logging use
      
      * fix
      
      * clean up
      
      * fp32 p2p comm
      
      * init
      
      * Dynamic global batch size with `MegatronPretrainingSampler`
      
      I couldn't make this script work with `MegatronPretrainingRandomSampler` because the random sampler seems to have some requirements on
      global batch size, total number of samples, local minibatch size, etc., which I'm not yet familiar with
      
      * revive original pipeline parallel test
      
      * update MULTIGPU_TEST: add dynamic batchsize test
      
      * run MegatronPretrainingRandomSampler
      
      * fix comment
      
      * fix
      
      * update
      
      * cosmetic
      
      * add note
      
      * Apply 2 suggestion(s) to 2 file(s)
      
      * change following https://github.com/NVIDIA/apex/pull/1210
      
      * fix
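
A hedged sketch of what a batch-size ramp-up schedule like the `rampup_batch_size` support above might compute. It assumes the argument supplies a start batch size, a batch size increment, and a number of ramp-up samples, and that the global batch size grows in equal steps over those samples; the function name and parameters are hypothetical and this is not the apex or Megatron-LM code.

```python
def rampup_global_batch_size(
    consumed_samples: int,
    start_batch_size: int,
    batch_size_increment: int,
    rampup_samples: int,
    global_batch_size: int,
) -> int:
    """Return the global batch size to use after `consumed_samples` samples."""
    if batch_size_increment <= 0:
        raise ValueError("batch_size_increment must be positive")
    diff = global_batch_size - start_batch_size
    if diff < 0 or diff % batch_size_increment != 0:
        raise ValueError("target must be start + a whole number of increments")
    # Ramp-up finished (or nothing to ramp): use the final batch size.
    if diff == 0 or consumed_samples >= rampup_samples:
        return global_batch_size
    num_increments = diff // batch_size_increment
    samples_per_increment = rampup_samples / num_increments
    steps = int(consumed_samples / samples_per_increment)
    return min(start_batch_size + steps * batch_size_increment, global_batch_size)
```

With a start of 4, an increment of 4, 100 ramp-up samples, and a target of 16, the batch size would step through 4, 8, and 12 before settling at 16.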
  13. 10 Nov, 2021 1 commit
  14. 27 Oct, 2021 1 commit
    • Pipeline Model Parallel (#1202) · 63d5dd63
      Masaki Kozuki authored
      
      
      * Init apex.ppu (pipeline model parallel utility)
      
      Reference commit:
      
      ```
      commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
      Merge: 14f2c684 7b293d9b
      Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
      Date:   Wed Sep 22 22:57:54 2021 -0700
      
          Merge branch 'add_BOS' into 'main'
      
          Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives
      
          See merge request ADLR/megatron-lm!328
      ```
      
      * removing get_args and replace import - phase 1
      
      * removing get_args and replace import - phase 2
      
      * move ppu to apex.transformer.pipeline_parallel
      
      * update two __init__.py
      
      * update READMEs
      
      * mpu -> parallel_state & tensor_parallel
      
      * fix
      
      * remove not pipeline files
      
      * separate schedules.py - phase 1
      
      * dissect schedules.py
      
      * data_iterators -> batch
      
      * remove optimizer from forward_backward_step funcs
      
      * init test
      
      * Apply 2 suggestion(s) to 2 file(s)
      
      * fix cyclic import
      
      * fix syntax of Callable
      
      * fix - 1
      
      * move directory as testing used for pp test as well
      
      * add some functions for num microbatches calculator
      
      * model is a list in pipeline parallel
      
      * skip build num microbatch calculator
      
      * fix test
      
      * assert -> raise
      
      * skip args printing
      
      * specify tensor shape everywhere even if None - phase 1
      
      * private timers
      
      * passing tensor shape & dtype around
      
      * update dtype handling by introducing helper func
      
      * write helper func to reduce cyclomatic complexity
      
      * remove duplicate
      
      * update
      
      * move split_tensor_into_1d_equal_chunks to avoid cyclic import
      
      * tmp
      
      * cosmetic
      
      * move gather_split_1d_tensor to avoid cyclic imports
      
      * remove debug print
      
      * add outer loop
      
      * early return if possible
      
      * cosmetic
      
      * passing around tensor shape
      
      * refactor test
      
      * add script to learn batch sampler behavior
      
      * update
      
      * minibatch splitter
      
      * add minibatch splitter
      
      * split minibatch into microbatches
      
      * minor changes
      
      * uncomment split batch for test sake
      
      * set as attribute
      
      * study the behavior of no pipelining
      
      * debug 1
      
      * reflect test util namespace change
      
      * update readme
      
      * cosmetic in test
      
      * add model build helper func for interleaving sched
      
      * adding model builder from megatron
      
      * can be cyclic import
      
      * fix
      
      * enable interleaving test, but failing even if forward only
      
      * fix batch preparation
      
      * add explanation
      
      * print data parallel size
      
      * fix typo
      
      * Add Megatron style GPT model by Rishi
      Co-authored-by: Rishi Puri <riship@nvidia.com>
      
      * update
      
      * type hint for jit
      
      * fix forward_backward_no_pipelining test
      
      * pipeline forward backward seem to hang if not forward only
      
      * fix typo
      
      * debug
      
      * add p2p test
      
      * simplify
      
      * fix
      
      * tentative
      
      * set both tmp and pmp to 1
      
      * init
      
      * fix typo
      
      * fix
      
      * fix path of divide
      
      * set seed for tmp
      
      * update upon Eddie comment
      
      * fix typo
      
      * adding failing data loader test
      
      * fix
      
      * megatron still failing
      
      * check in
      
      * with the nested loop of new order, interleaving seems fine
      
      * cosmetic change
      
      * make `forward_backward_pipelining_with_interleaving` private
      
      * warn users that interleaving sched is unstable
      
      * move noop handler to no pipelining
      
      * comment out rank_print
      
      * make `build_model` more flexible
      
      * skip megatron test tentatively
      
      * correctly comment out rank_print
      
      * skip appropriately
      
      * remove wip p2p comm test
      
      * update type hint of model_provider_func
      
      * disable tf32 in each test script
      
      * skip interleaving w/ backward
      
      * rename as mpu is the old name
      
      * remove broken case
      
      * expose build_model func
      
      * delete `dist.ring_exchange` func call and `use_ring_exchange` argument
      
      * nit fixes
      
      * check in
      
      * remove unused file
      
      * update the list
      
      * update tensor shape
      
      * remove mixed dtype case
      
      * use torch.distributed.run
      
      * 2020 -> 2021
      
      * another 2020 -> 2021
      
      * docstring & type hint
      
      * fix teardown
      
      * update
      
      * change to experimental
      
      * check if warned
      Co-authored-by: Rishi Puri <riship@nvidia.com>
      Co-authored-by: Eddie Yan <eddiey@nvidia.com>
  15. 23 Oct, 2021 1 commit
  16. 08 Oct, 2021 1 commit
  17. 06 Oct, 2021 1 commit
  18. 02 Oct, 2021 1 commit
  19. 15 Apr, 2021 1 commit
    • Add unit tests for Fused NovoGrad (#1065) · 59d2f7ac
      Sudhakar Singh authored
      * Add unit tests for fused-novograd
      
      * Fix: tensors should reside on the same device
      
      * Fix: the CUDA stream should be called on the same device on which the tensors reside. Found this while debugging the fused novograd multi-device unit test
      
      * fixed issues mentioned in the comments
  20. 01 Dec, 2020 1 commit
  21. 05 Aug, 2020 1 commit
  22. 23 Jun, 2020 3 commits
  23. 14 May, 2020 1 commit
  24. 30 Apr, 2020 1 commit
    • Improvements to apex.mlp (#804) · 31aceeaa
      Deyu Fu authored
      * update fused bias relu backward kernel
      
      * add support for not requiring first layer dgrad
      
      * fix bug: wrong layer in requires grad
      
      * add infrastructure for optional bias and activation; currently only supports no bias and no relu
      
      * make bias and relu optional separately
      
      * add sigmoid activation option
  25. 22 Apr, 2020 2 commits
    • Deyu Fu's avatar
    • Fix LARC with mixed precision (#793) · 2ec84ebd
      Vinicius Reis authored
      The LARC optimizer wraps an underlying optimizer and then needs to be passed
      to amp.initialize for mixed precision. There were 3 different crashes happening
      in this situation; this fixes all of them and adds a unit test.
      
      I don't know if the 'LARC' in sys.modules check ever worked. In my setup, the
      entry in sys.modules is 'apex.parallel.LARC'. Checking if the variable is
      defined seems more reliable though.
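
The observation about `sys.modules` can be reproduced without apex. The sketch below registers a hypothetical module under the dotted name `demo_pkg.parallel.LARC` and shows that a lookup by the short name `'LARC'` misses it. The `demo_pkg` name and the helper `variable_is_defined` are assumptions for illustration; the check the commit actually introduced is not shown in this log.

```python
import sys
import types

# Modules are keyed in sys.modules by their fully qualified (dotted) name.
FULL_NAME = "demo_pkg.parallel.LARC"  # hypothetical stand-in for 'apex.parallel.LARC'
module = types.ModuleType(FULL_NAME)
sys.modules[FULL_NAME] = module


class LARC:
    """Stand-in for the optimizer wrapper class."""


module.LARC = LARC

# The short name is not a key, so this style of check does not fire.
found_by_short_name = "LARC" in sys.modules
found_by_full_name = FULL_NAME in sys.modules


def variable_is_defined(name: str, namespace: dict) -> bool:
    """Check whether `name` is bound in the given namespace."""
    return name in namespace


# Looking the name up in a namespace works regardless of the import path.
defined_in_module = variable_is_defined("LARC", vars(module))
```

Here `found_by_short_name` is false while `found_by_full_name` and `defined_in_module` are true, which matches the behaviour described in the commit message.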
  26. 31 Mar, 2020 1 commit
  27. 27 Feb, 2020 1 commit
  28. 03 Oct, 2019 1 commit
  29. 03 Sep, 2019 1 commit
    • Fix issues in fused_adam (#469) · 7fa74925
      Deyu Fu authored
      * move import of amp_C to __init__()
      
      * make fp16/32 separate lists to support mixed param types, disable double test
      
      * make zero_grad consistent between adam/novograd/lamb
  30. 27 Aug, 2019 1 commit
    • Enable Checkpointing (#420) · dec4fdd6
      ptrblck authored
      * add state_dict, load_state_dict
      
      * add test_restoring, test_loss_scale_decrease
      
      * disable amp outputs for checkpoint tests
      
      * add test for amp.state_dict, cleanup
      
      * add state_dict patch, add test
      
      * fixed testing, cleanup
      
      * add readme for checkpointing
      
      * add docs to source/amp
      
      * add review changes to doc
  31. 17 Aug, 2019 1 commit
  32. 15 Aug, 2019 1 commit
  33. 13 Aug, 2019 2 commits