17 Nov, 2022 (1 commit)
      Integrate PyTorch Fully Sharded Data Parallel (FSDP) · 02625ff8
      Anthony Chen authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/d2go/pull/396
      
      Integrate PyTorch FSDP, which supports two sharding modes: 1. gradient + optimizer state sharding; 2. full model sharding (parameters + gradients + optimizer state). The feature is enabled in the train_net.py code path.
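      A minimal sketch of the two sharding modes, assuming a torchrun launch (e.g. torchrun --nproc_per_node=2 train.py); the toy model and process-group setup are illustrative, not d2go's actual wrapping code:
      
      ```python
      import os
      
      import torch
      import torch.distributed as dist
      import torch.nn as nn
      from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
      from torch.distributed.fsdp import ShardingStrategy
      
      dist.init_process_group("nccl")
      torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
      
      model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
      
      # Mode 1: shard gradients + optimizer state only; parameters stay
      # replicated on every rank (ZeRO-2 style).
      sharded = FSDP(model, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)
      
      # Mode 2: additionally shard the parameters themselves (ZeRO-3 style).
      # A module can only be wrapped once, so this is the alternative choice:
      # sharded = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)
      
      optimizer = torch.optim.SGD(sharded.parameters(), lr=0.01)
      loss = sharded(torch.randn(8, 1024).cuda()).sum()
      loss.backward()
      optimizer.step()
      ```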
      
      Sources
      * Integration follows this tutorial: https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html
      
      API changes
      * Add new config keys to support the new feature; refer to mobile-vision/d2go/d2go/trainer/fsdp.py for the full list of config options.
      * Add `FSDPCheckpointer` as a subclass of `QATCheckpointer` to support the special loading/saving logic FSDP models need (see the sketch after this list).
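      Why a dedicated checkpointer: with FSDP each rank holds only a shard of the parameters, so saving and loading require gather/scatter handling that a standard checkpointer does not do. A hypothetical sketch of the gather-then-save step such a checkpointer needs (the torch.distributed.fsdp APIs are real; save_full_checkpoint is an invented name, not the actual FSDPCheckpointer interface):
      
      ```python
      import torch
      import torch.distributed as dist
      from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
      from torch.distributed.fsdp import FullStateDictConfig, StateDictType
      
      def save_full_checkpoint(fsdp_model: FSDP, path: str) -> None:
          # Gather the sharded parameters into one full state dict,
          # offloaded to CPU and materialized on rank 0 only, so no single
          # GPU has to hold the entire model at once.
          cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
          with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, cfg):
              state = fsdp_model.state_dict()
          if dist.get_rank() == 0:
              torch.save(state, path)
      ```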
      
      Reviewed By: wat3rBro
      
      Differential Revision: D39228316
      
      fbshipit-source-id: 342ecb3bcbce748453c3fba2d6e1b7b7e478473c