    [gemini] support gradient accumulation (#4869) · 21ba89ca
    Baizhou Zhang authored
    * add test
    
    * fix no_sync bug in low level zero plugin
    
    * fix test
    
    * add argument for grad accum
    
    * add grad accum in backward hook for gemini
    
    * finish implementation, rewrite tests
    
    * fix test
    
    * skip stuck model in low level zero test
    
    * update doc
    
    * optimize communication & fix gradient checkpoint
    
    * modify doc
    
    * cleaning codes
    
    * update cpu adam fp16 case
gemini_optimizer.py 31.8 KB