[POC] Support Megatron-LM's `rampup_batch_size` argument (#1212)
* init logging use
* fix
* clean up
* fp32 p2p comm
* init
* Dynamic global batch size with `MegatronPretrainingSampler`

  I couldn't make this script work with `MegatronPretrainingRandomSampler` because the random sampler seems to have some requirement for global batch size, total number of samples, local minibatch size, etc. which I'm not familiar with for now
* revive original pipeline parallel test
* update MULTIGPU_TEST: add dynamic batchsize test
* run MegatronPretrainingRandomSampler
* fix comment
* fix
* update
* cosmetic
* add note
* Apply 2 suggestion(s) to 2 file(s)
* change following https://github.com/NVIDIA/apex/pull/1210
* fix
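
For context, here is a rough sketch of what a `rampup_batch_size`-style schedule computes: the global batch size starts small and grows in fixed increments until a target size is reached. This is not the code added by this commit. The function name `rampup_global_batch_size` and its parameter names are hypothetical, and the linear, step-wise schedule is an assumption modelled on the three values (start batch size, batch size increment, ramp-up samples) that Megatron-LM's argument takes.

```python
def rampup_global_batch_size(
    consumed_samples: int,
    start_batch_size: int,
    batch_size_increment: int,
    rampup_samples: int,
    global_batch_size: int,
) -> int:
    """Return the global batch size to use after `consumed_samples` samples.

    Hypothetical helper: assumes a linear, step-wise ramp from
    `start_batch_size` up to `global_batch_size`.
    """
    diff = global_batch_size - start_batch_size
    if diff < 0 or batch_size_increment <= 0 or diff % batch_size_increment != 0:
        raise ValueError(
            "global_batch_size - start_batch_size must be a non-negative "
            "multiple of batch_size_increment"
        )

    num_increments = diff // batch_size_increment
    # Nothing to ramp, or the ramp-up phase is already over.
    if num_increments == 0 or consumed_samples >= rampup_samples:
        return global_batch_size

    # Each increment is held for an equal share of the ramp-up samples.
    samples_per_increment = rampup_samples / num_increments
    steps = int(consumed_samples / samples_per_increment)
    return start_batch_size + steps * batch_size_increment
```

With a start size of 16, an increment of 8, 1000 ramp-up samples and a target of 32, this sketch holds 16 for the first 500 samples, 24 for the next 500, and 32 from then on. A sampler that supports a dynamic global batch size would need to re-read this value as training progresses.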
apex/transformer/log_util.py
0 → 100644