    [POC] Support Megatron-LM's `rampup_batch_size` argument (#1212) · 35336133
    Masaki Kozuki authored
    * init logging use
    
    * fix
    
    * clean up
    
    * fp32 p2p comm
    
    * init
    
    * Dynamic global batch size with `MegatronPretrainingSampler`
    
    I couldn't make this script work with `MegatronPretrainingRandomSampler`: the random sampler seems to impose requirements on the
    global batch size, total number of samples, local minibatch size, etc. that I'm not yet familiar with.
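    The dynamic global batch size this PR supports follows Megatron-LM's `rampup_batch_size` argument, which grows the global batch size from a start value to the final value in fixed increments as training samples are consumed. A minimal sketch of that schedule is below; the helper name and signature are assumptions for illustration, not code from this PR:

    ```python
    def rampup_global_batch_size(consumed_samples: int,
                                 start: int,
                                 increment: int,
                                 ramp_up_samples: int,
                                 final_batch_size: int) -> int:
        """Hypothetical helper mirroring Megatron-LM's
        `--rampup-batch-size <start> <increment> <ramp-up-samples>` semantics:
        the global batch size begins at `start` and grows by `increment`
        each time an equal share of `ramp_up_samples` has been consumed,
        until it reaches `final_batch_size`."""
        assert (final_batch_size - start) % increment == 0
        num_increments = (final_batch_size - start) // increment
        samples_per_increment = ramp_up_samples / num_increments
        # Number of completed ramp-up steps, capped at the final batch size.
        steps = min(int(consumed_samples / samples_per_increment), num_increments)
        return start + steps * increment
    ```

    For example, with a start of 16, an increment of 8, 300 ramp-up samples, and a final global batch size of 40, the batch size steps through 16, 24, 32, 40 as every 100 samples are consumed.
    
    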
    
    * revive original pipeline parallel test
    
    * update MULTIGPU_TEST: add dynamic batch size test
    
    * run MegatronPretrainingRandomSampler
    
    * fix comment
    
    * fix
    
    * update
    
    * cosmetic
    
    * add note
    
    * Apply 2 suggestion(s) to 2 file(s)
    
    * change following https://github.com/NVIDIA/apex/pull/1210
    
    * fix