1. 23 Feb, 2021 1 commit
    • Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e
      Myle Ott authored
      Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper.
      
      Compared to PyTorch DDP:
      * FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
      * FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
      * FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
          * all-gather parameters at start of forward pass and start of backward pass
          * reduce-scatter grads at end of backward pass
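
      As an illustration, here is a minimal training-step sketch (assuming a `torch.distributed` process group is already initialized, e.g. via `torchrun`; the model, sizes, and optimizer are placeholders):

      ```python
      import torch
      import torch.nn as nn
      from fairscale.nn import FullyShardedDataParallel as FSDP

      # Assumes torch.distributed.init_process_group() has already run
      # and this rank is bound to one GPU.
      model = nn.Sequential(
          nn.Linear(1024, 1024),
          nn.ReLU(),
          nn.Linear(1024, 1024),
      ).cuda()

      # reshard_after_forward=True trades ~50% more communication for lower
      # peak memory (ZeRO-3-like); False matches DDP's comm cost (ZeRO-2-like).
      model = FSDP(model, reshard_after_forward=True)

      # Build the optimizer after wrapping, over the flattened, sharded params.
      optim = torch.optim.Adam(model.parameters(), lr=1e-4)

      x = torch.randn(8, 1024, device="cuda")
      loss = model(x).sum()
      loss.backward()  # grads are reduce-scattered at the end of backward
      optim.step()     # each rank updates only its own parameter shard
      ```
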
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
  2. 19 Feb, 2021 1 commit
  3. 18 Feb, 2021 2 commits
  4. 17 Feb, 2021 1 commit
  5. 12 Feb, 2021 1 commit
  6. 10 Feb, 2021 1 commit
  7. 09 Feb, 2021 1 commit
  8. 04 Feb, 2021 4 commits
  9. 03 Feb, 2021 2 commits
  10. 02 Feb, 2021 1 commit
  11. 30 Jan, 2021 1 commit
  12. 29 Jan, 2021 1 commit
  13. 27 Jan, 2021 1 commit
  14. 23 Jan, 2021 1 commit
  15. 21 Jan, 2021 3 commits
  16. 15 Jan, 2021 1 commit
  17. 11 Jan, 2021 1 commit
  18. 05 Jan, 2021 1 commit
    • [fix] Flaky tests (#283) · 79365ee6
      Benjamin Lefaudeux authored
      * adding the pytest-timeout plugin to properly root out hanging tests
      * removing redundant code and using a slightly more reasonable timeout; works on a single CUDA device
      * finding the root cause of some of the CPU hangs: RPC init
      * propagating all the RPC init test changes to the pipe and model parallel tests
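
      For context, a minimal sketch of how a pytest-timeout marker bounds a potentially hanging test (the test name, body, and timeout value here are illustrative, not from this commit):

      ```python
      import pytest

      # With the pytest-timeout plugin installed, a hanging test is killed
      # and reported as a failure instead of stalling the whole CI run.
      @pytest.mark.timeout(60)
      def test_rpc_init_does_not_hang():
          # ... initialize RPC / process groups and exercise the code path ...
          assert True
      ```
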
  19. 02 Jan, 2021 1 commit
  20. 30 Dec, 2020 1 commit
  21. 29 Dec, 2020 1 commit
  22. 28 Dec, 2020 1 commit
  23. 19 Dec, 2020 1 commit
  24. 10 Dec, 2020 1 commit
  25. 04 Dec, 2020 1 commit
  26. 01 Dec, 2020 2 commits
  27. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewrite using autograd and the Variable execution queue to make the reduce automatic
      * share buckets with OSS to remove duplication
      * some speed is likely still on the table, since bucketing does not yet deliver the expected speedup; could be a follow-up
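
      A minimal sketch of pairing ShardedDataParallel with the OSS optimizer wrapper (assumes an initialized process group; the model and hyperparameters are placeholders):

      ```python
      import torch
      import torch.nn as nn
      from fairscale.nn.data_parallel import ShardedDataParallel
      from fairscale.optim.oss import OSS

      # Assumes torch.distributed.init_process_group() has already run.
      model = nn.Linear(1024, 1024).cuda()

      # OSS shards optimizer state across ranks; ShardedDataParallel shares
      # its buckets and reduces gradients automatically during backward.
      optim = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=0.1)
      model = ShardedDataParallel(model, optim)

      loss = model(torch.randn(8, 1024, device="cuda")).sum()
      loss.backward()  # each grad is reduced to the rank that owns its state
      optim.step()
      ```
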
  28. 18 Nov, 2020 1 commit
  29. 11 Nov, 2020 2 commits
  30. 10 Nov, 2020 1 commit
    • Single-process control via PipeRPCWrapper (#156) · 5d4f50fb
      Tom Birch authored
      Adds support for:
      * Reused layers (e.g. for weight sharing)
      * Lazily-constructed layers
      * Single-process control via PipeRPCWrapper
      * PipelineStyle.AsyncSchedule, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive
      
      Also added examples for multi-process pipelines and PipeRPCWrapper.
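
      For context, a minimal sketch of the single-process `fairscale.nn.Pipe` API that this work extends (assumes two CUDA devices; layer sizes and chunk count are placeholders; see the added examples for actual PipeRPCWrapper usage):

      ```python
      import torch
      import torch.nn as nn
      from fairscale.nn import Pipe

      # A toy two-stage pipeline: balance=[2, 2] puts two layers on each device.
      model = nn.Sequential(
          nn.Linear(1024, 1024),
          nn.ReLU(),
          nn.Linear(1024, 1024),
          nn.ReLU(),
      )
      model = Pipe(model, balance=[2, 2], chunks=4)  # split each batch into 4 micro-batches

      x = torch.randn(32, 1024, device="cuda:0")  # input lives on the first stage's device
      out = model(x)
      ```
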
  31. 30 Oct, 2020 1 commit