    Feature/zero (#279) · 5a560a06
    Jiarui Fang authored
    
    
    * add zero1 (#209)
    
    * add zero1
    
    * add test zero1
    
    * update zero stage 1 develop (#212)
    
    * Implement naive zero3 (#240)
    
    * naive zero3 works well
    
    * add zero3 param manager
    
    * add TODOs in comments
    
    * add gather full param ctx
    
    * fix sub module streams
    
    * add offload
    
    * fix bugs of hook and add unit tests
    
    * fix bugs of hook and add unit tests (#252)
    
    * add gather full param ctx
    
    * fix sub module streams
    
    * add offload
    
    * fix bugs of hook and add unit tests
    
    * polish code and add state dict hook
    
    * fix bug
    
    * update unit test
    
    * refactor reconstructed zero code
    
    * clip_grad support zero3 and add unit test
    
    * add unit test for Zero3ParameterManager
    
    * [WIP] initialize the shard param class
    
    * [WIP] Yet another sharded model implementation (#274)
    
    * [WIP] initialize the shard param class
    
    * [WIP] Yet another implementation of shardModel, using a better hook method.
    
    * torch.concat -> torch.cat
    
    * fix test_zero_level_1.py::test_zero_level_1 unit test
    
    * remove deepspeed implementation and refactor for the reconstructed zero module
    
    * polish zero dp unittests
    Co-authored-by: ver217 <lhx0217@gmail.com>
    Co-authored-by: Frank Lee <somerlee.9@gmail.com>
common.py 2.21 KB