- 25 Jul, 2022 1 commit
-
-
Aidyn-A authored
-
- 25 Mar, 2022 1 commit
-
-
Masaki Kozuki authored
* try PyTorch custom TestCase class * revert * initial working example * update * data utils * fix imports * hardcode backend to nccl * fix signature * fix typo * mapping * set device * init * refactor x entropy * remove unused import & destroy model parallel * refactor random * fix test * remove migrated tests * refactor * init * separate affine weight init * init model parallel * split more * weight init fix part 1 * use cpu init for consistency btwn native and tensor parallel * black * add col parallel * use a 3D tensor of square matrix for column parallel linear * skip the failing cases * migrate layers test * pipeline parallel forward/backward * fix typo * fix typo * fix * fix pipeline world size * black * rm `run_pipeline_parallel_test` in favor of test_pipeline_parallel_fwd_bwd.py * stop logging * set log level * black * license and format * fix * skip tf32 as matrices are small * remove potentially inappropriate license * Apply suggestions from code review * remove `TODO` comment * `torch.testing.assert_allclose` -> `torch.testing.assert_close` * remove comment-outs * remote unused import * minor fix
-
- 27 Oct, 2021 1 commit
-
-
Masaki Kozuki authored
* Init apex.ppu (pipeline model parallel utility) Reference commit: ``` commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main) Merge: 14f2c684 7b293d9b Author: Mohammad Shoeybi <mshoeybi@nvidia.com> Date: Wed Sep 22 22:57:54 2021 -0700 Merge branch 'add_BOS' into 'main' Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives See merge request ADLR/megatron-lm!328 ``` * removing get_args and replace import - phase 1 * removing get_args and replace import - phase 2 * move ppu to apex.transformer.pipeline_parallel * update two __init__.py * update READMEs * mpu -> parallel_state & tensor_parallel * fix * remove not pipeline files * separate schedules.py - phase 1 * dissect schedules.py * data_iterators -> batch * remove optimizer from forward_backward_step funcs * init test * Apply 2 suggestion(s...
-