    Integrate PyTorch Fully Sharded Data Parallel (FSDP) · 02625ff8
    Anthony Chen authored
    Summary:
    Pull Request resolved: https://github.com/facebookresearch/d2go/pull/396
    
Integrate PyTorch FSDP, which supports two sharding modes: (1) gradient + optimizer state sharding; (2) full model sharding (parameters + gradients + optimizer state). The feature is enabled in the train_net.py code path; a sketch of the two modes follows.
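
As a rough illustration of the two modes (not the d2go wrapping code itself, which lives in d2go/trainer/fsdp.py), PyTorch's FSDP wrapper selects between them via its `sharding_strategy` argument. The helper name below is hypothetical:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Hypothetical helper illustrating the two sharding modes; the actual
# d2go wrapping logic lives in d2go/trainer/fsdp.py and may differ.
# Assumes a torch.distributed process group is already initialized.
def wrap_with_fsdp(model: torch.nn.Module, full_model_sharding: bool) -> FSDP:
    strategy = (
        # Mode 2: shard parameters + gradients + optimizer state.
        ShardingStrategy.FULL_SHARD
        if full_model_sharding
        # Mode 1: shard gradients + optimizer state only.
        else ShardingStrategy.SHARD_GRAD_OP
    )
    return FSDP(model, sharding_strategy=strategy)
```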
    
    Sources
    * Integration follows this tutorial: https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html
    
    API changes
* Add new config keys to support the new feature; refer to mobile-vision/d2go/d2go/trainer/fsdp.py for the full list of config options
* Add `FSDPCheckpointer` as a subclass of `QATCheckpointer` to support the special loading/saving logic that FSDP models require (see the sketch after this list)
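
A minimal sketch of the kind of saving logic an FSDP-aware checkpointer needs, assuming PyTorch's `FSDP.state_dict_type` context manager; the real `FSDPCheckpointer` in d2go/trainer/fsdp.py may differ:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

# Hypothetical saving helper, not the actual FSDPCheckpointer code.
def save_full_checkpoint(model: FSDP, path: str) -> None:
    # Gather the sharded state dict into a full one on rank 0,
    # offloading to CPU to avoid GPU OOM on large models.
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state = model.state_dict()
    if dist.get_rank() == 0:
        torch.save(state, path)
```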
    
    Reviewed By: wat3rBro
    
    Differential Revision: D39228316
    
    fbshipit-source-id: 342ecb3bcbce748453c3fba2d6e1b7b7e478473c