    [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540) · c9625dbb
    Baizhou Zhang authored
    * implement sharded optimizer saving
    
    * add more param info
    
    * finish implementation of sharded optimizer saving
    
    * fix bugs in optimizer sharded saving
    
    * add pp+zero test
    
    * param group loading
    
    * greedy loading of optimizer
    
    * fix bug when loading
    
    * implement optimizer sharded saving
    
    * add optimizer test & arrange checkpointIO utils
    
    * fix gemini sharding state_dict
    
    * add verbose option
    
    * add loading of master params
    
    * fix typehint
    
    * fix master/working mapping in fp16 amp
hybrid_parallel_checkpoint_io.py 34.9 KB