"src/vscode:/vscode.git/clone" did not exist on "e02ec27e51751a28a4fa1e0d95537f824d523bbe"
Pipeline Model Parallel (#1202)
* Init apex.ppu (pipeline model parallel utility)
Reference commit:
```
commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
Merge: 14f2c684 7b293d9b
Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
Date:   Wed Sep 22 22:57:54 2021 -0700

    Merge branch 'add_BOS' into 'main'

    Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives

    See merge request ADLR/megatron-lm!328
```
* remove get_args and replace imports - phase 1
* remove get_args and replace imports - phase 2
* move ppu to apex.transformer.pipeline_parallel
* update two __init__.py
* update READMEs
* mpu -> parallel_state & tensor_parallel
* fix
* remove non-pipeline files
* separate schedules.py - phase 1
* dissect schedules.py
* data_iterators -> batch
* remove optimizer from forward_backward_step funcs
* init test
* Apply 2 suggestion(s) to 2 file(s)
* fix cyclic import
* fix syntax of Callable
* fix - 1
* move directory since `testing` is used for the pipeline parallel tests as well
* add some functions for num microbatches calculator
* model is a list in pipeline parallel
* skip build num microbatch calculator
* fix test
* assert -> raise
* skip args printing
* specify tensor shape everywhere even if None - phase 1
* private timers
* passing tensor shape & dtype around
* update dtype handling by introducing helper func
* write helper func to reduce cyclomatic complexity
* remove duplicate
* update
* move split_tensor_into_1d_equal_chunks to avoid cyclic import
* tmp
* cosmetic
* move gather_split_1d_tensor to avoid cyclic imports
* remove debug print
* add outer loop
* early return if possible
* cosmetic
* passing around tensor shape
* refactor test
* add script to learn batch sampler behavior
* update
* minibatch splitter
* add minibatch splitter
* split minibatch into microbatches (see the sketch after this log)
* minor changes
* uncomment split batch for the sake of the test
* set as attribute
* study the behavior of no pipelining
* debug 1
* reflect test util namespace change
* update readme
* cosmetic in test
* add model build helper func for interleaved sched
* adding model builder from megatron
* potential cyclic import
* fix
* enable interleaving test, but it fails even in forward-only mode
* fix batch preparation
* add explanation
* print data parallel size
* fix typo
* Add Megatron-style GPT model by Rishi
Co-authored-by: Rishi Puri <riship@nvidia.com>
* update
* type hint for jit
* fix forward_backward_no_pipelining test
* pipeline forward-backward seems to hang if not forward only
* fix typo
* debug
* add p2p test
* simplify
* fix
* tentative
* set both tensor and pipeline model parallel sizes to 1
* init
* fix typo
* fix
* fix path of divide
* set seed for tensor model parallel
* update upon Eddie comment
* fix typo
* adding failing data loader test
* fix
* megatron still failing
* check in
* with the new order of nested loops, interleaving seems fine
* cosmetic change
* make `forward_backward_pipelining_with_interleaving` private
* warn users that interleaving sched is unstable
* move noop handler to no pipelining
* comment out rank_print
* make `build_model` more flexible
* skip megatron test tentatively
* correctly comment out rank_print
* correctly comment out rank_print
* correctly comment out rank_print
* skip appropriately
* remove wip p2p comm test
* update type hint of model_provider_func
* disable tf32 in each test script
* skip interleaving w/ backward
* rename as mpu is the old name
* remove broken case
* expose build_model func
* delete `dist.ring_exchange` func call and `use_ring_exchange` argument
* nit fixes
* check in
* remove unused file
* update the list
* update tensor shape
* remove mixed dtype case
* use torch.distributed.run
* 2020 -> 2021
* another 2020 -> 2021
* docstring & type hint
* fix teardown
* update
* change to experimental
* check if warned
Co-authored-by: Rishi Puri <riship@nvidia.com>
Co-authored-by: Eddie Yan <eddiey@nvidia.com>
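
For orientation, the sketch below illustrates in plain PyTorch the idea behind the minibatch splitter and the no-pipelining forward/backward schedule referenced in the log above: split a minibatch into microbatches, run forward and backward per microbatch with gradient accumulation, and keep the optimizer step outside the forward/backward function. This is a conceptual sketch only, not the `apex.transformer.pipeline_parallel` API; the helper names here are hypothetical.

```python
# Conceptual sketch only -- plain PyTorch, not the apex.transformer API.
# Mirrors "split minibatch into microbatches" and the no-pipelining schedule:
# forward/backward per microbatch, gradients accumulate, and the optimizer
# step stays outside the step function
# ("remove optimizer from forward_backward_step funcs").
import torch


def split_into_microbatches(batch, num_microbatches):
    # Hypothetical helper; assumes the batch dimension is divisible
    # by num_microbatches.
    return torch.chunk(batch, num_microbatches, dim=0)


def forward_backward_no_pipelining_sketch(model, batch, num_microbatches):
    losses = []
    for microbatch in split_into_microbatches(batch, num_microbatches):
        # Scale so the accumulated gradient matches a full-minibatch backward.
        loss = model(microbatch).pow(2).mean() / num_microbatches
        loss.backward()  # gradients accumulate across microbatches
        losses.append(loss.detach())
    return losses


model = torch.nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batch = torch.randn(8, 16)

optimizer.zero_grad()
forward_backward_no_pipelining_sketch(model, batch, num_microbatches=4)
optimizer.step()  # one optimizer step per minibatch
```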