1. 05 May, 2021 1 commit
    • Min Xu's avatar
      [fix] add clear_autocast_cache flag (#650) · 861b5ce2
      Min Xu authored
      
      
      * [fix] add clear_autocast_cache flag
      
      - when training in AMP model with weight dtype32, FSDP may need to
        optionally clear the autocast cache to avoid GPU OOM
      - this flag is default false, automatically doing it is a future TODO
      - also added a verbose flag to make print(fsdp_model) a bit shorter
      - updated the memory test to cover those new code
      - added a couple of useful functions in parallel.py and testing.py
      
      * minor
      
      * address comments
      
      * format
      
      * improve the test
      Co-authored-by: default avatarMin Xu <min.xu@acm.org>
      861b5ce2
  2. 26 Feb, 2021 1 commit
  3. 01 Dec, 2020 1 commit
  4. 10 Nov, 2020 1 commit
    • Tom Birch's avatar
      Single-process control via PipeRPCWrapper (#156) · 5d4f50fb
      Tom Birch authored
      Adds support for:
      * Reused layers (e.g. for weight sharing)
      * Lazily-constructed layers
      * Single-process control via PipeRPCWrapper
      * PipelineStyle.AsyncScheudle, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive
      
      Also added examples for multi-process and PipeRPCWrapper
      5d4f50fb
  5. 17 Sep, 2020 1 commit
    • Tom Birch's avatar
      Multi-process pipe (#90) · 63f7796a
      Tom Birch authored
      Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines)
      * Adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess, but also supporting PipelineStyle.MultiProcess
      * Added support for lazy construction of modules (see lazy_construction for an example)
      * Added two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv
      * Copied all the relevant tests from tests/pipe to tests/pipe_process and modified them to exercise PipelineStyle.MultiProcess
      63f7796a
  6. 16 Sep, 2020 1 commit
  7. 31 Jul, 2020 1 commit
  8. 08 Jul, 2020 1 commit