    [feat]: prepare FSDP to handle multiple flatten params and fix metadata saving for MoE (#746) · 83b0b49e
    Min Xu authored
    
    
    * [feat] FSDP: supporting multiple flatten parameter groups
    
    - step 3: make FSDP use FlattenParamModule unconditionally
    
    * fixing the auto_wrap tests
    
    * minor
    
    * rewrite local_metadata_dict
    
    - updated FlattenParamsWrapper (FPW) so that a custom flat param name is also supported
    
    * bug fix
    
    * mypy
    
    * rewrote consolidate_shard_weights
    
    - test_consolidate passes
    
    * comments
    
    * fixing pickling
    
    * Fix shared params and MoE logic (#749)
    
    * add strict kwarg to support fairseq:gshard MoE saving logic
    
    * Test fairseq style shard
    
    * style
    
    * formatting and address comments
    
    * added changelog
    
    * fixing a test after padding renaming

    Co-authored-by: Min Xu <min.xu.public@gmail.com>
    Co-authored-by: Sam Shleifer <sshleifer@gmail.com>