23 Feb, 2021 (2 commits)
    • [docs] fsdp changelog and doc (#414) · 2b15720b
      Min Xu authored
    • Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e
      Myle Ott authored
      Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper (a minimal usage sketch follows the comparison below).
      
      Compared to PyTorch DDP:
      * FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
      * FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
      * FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
          * all-gather parameters at start of forward pass and start of backward pass
          * reduce-scatter grads at end of backward pass
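
      As a rough illustration, here is a minimal sketch of wrapping a model in FSDP, assuming a fairscale build that exposes `fairscale.nn.data_parallel.FullyShardedDataParallel`, an already-initialized `torch.distributed` process group, and illustrative model/optimizer choices:

      ```python
      import torch
      from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

      # Assumes torch.distributed.init_process_group(...) has already run
      # and this process has a GPU assigned (toy model for illustration).
      model = torch.nn.Linear(1024, 1024).cuda()

      # reshard_after_forward=True frees the full parameters after the
      # forward pass and re-gathers them for backward (ZeRO-3-like, ~50%
      # more communication); False keeps them resident after forward
      # (ZeRO-2-like, DDP-equivalent communication cost).
      model = FSDP(model, reshard_after_forward=True)

      optim = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative
      x = torch.randn(8, 1024).cuda()
      loss = model(x).sum()
      loss.backward()   # gradients are reduce-scattered across workers
      optim.step()      # each worker updates only its own parameter shard
      ```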
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>