  • [fix] better handling non-flatten in FSDP (#1072) · 429f3d31
    Min Xu authored
    
    
    * [fix] better handling non-flatten in FSDP
    
    - see the detailed comment about that backward firing case
    - also minor debugging help in FSDP
    - also minor fix in FPW's state dict
    
    * [feat] disallow reset_parameters by default
    
    * [feat] adding fsdp_instances API - useful for checking wrapping in user code
    
    * [fix] one line fix but more than a day of debugging
    
    * fixed the case of loading combined check with empty fsdp instances
    
    * fixed another bug in state loading related to root/nonroot module full param caching, caused by not resharding after forward
    
    * [feat] support .half and .float better
    
    * fixed a bug where gathering optim state loses extra keys from the original state_dict
    
    * fixed a test failure in mixed precision
    
    * fixed another bug affecting no_sync grad acc
    
    * fixed a bug and a test in fsdp optim state
    
    * fixed another corner case
    
    * added a comment
    
    * skip ssd offload tests
    
    * skip fsdp one for ssd offload

    Co-authored-by: Min Xu <min.xu.public@gmail.com>
test_fsdp_optimizer_utils.py 12.7 KB