    [feat] AdaScale: Gradient Accumulation and Add PyTest unit tests (#202) · ce5860ea
    Min Xu authored
    * added AdaScale to README
    
    * [adascale] added gradient accumulation
    
    - added gradient accumulation
    - tested with full CIFAR training runs at different accumulation values
    and verified that full accuracy is obtained
    - also removed the patch optimize flag until we need it
    
    * [adascale] adding pytest
    
    - added basic, DDP, and grad_accum tests
    - closes #195
    
    * added changelog
    
    * added ddp grad_accum test
    
    * moved ddp and non-ddp tests into separate files
    
    * added checkpoint test
    
    * more doc
    
    * addressed Mike's comments
README.md 5.82 KB