1. 12 Nov, 2021 1 commit
    • Setup pre-commit github action and apply pre-commit to all files (#849) · 7d7edf6d
      Anupam Bhatnagar authored
      * adding pre-commit files
      
      * applying pre-commit to all files
      
      * adding no-strict-optional argument to mypy in circle ci config
      
      * fix typo
      
      * updating python versions
      
      * [skip ci] remove extra args
      
      * adding python 3.9
      
      * [skip ci] set pre-commit version in requirements-dev.txt
      
      * set CACHE_VERSION
      
      * move linters from circleci to github actions
      
      * update python version
      
      * update python version in benchmarks_2
      
      * moving to python 3.9.7
  2. 05 Nov, 2021 1 commit
    • [feat] experimental MEVO layer (#840) · 8347c1a2
      Min Xu authored
      
      
      * [feat] MEVO kernel
      
      - initial import from min/softmax and min/testing branches
      - need to rename and further cleanup
      
      * only test with newer pytorch
      
      * renamed and added comments and code cleanup
      
      * rename and reduce test memory
      
      * testing
      
      * minor fixing
      
      * fixing
      
      * more fix
      
      * changelog
      
      * more 1.7 and 1.8 paper cuts
      
      * remove dead code
      
      * addressed Benjamin's comments
      
      * addressed more comments
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
  3. 12 Sep, 2021 1 commit
    • [fix] FSDP intra-backwards gradient accumulation. (#784) · 4fa2ab9b
      Darryl Barnhart authored
      * [fix] FSDP intra-backwards gradient accumulation.
      
      Ensure gradient reduction accumulates into the unsharded gradient tensor
      within a backwards pass. This matters when an FSDP module is called
      multiple times within a forward pass, and reduction is _not_ deferred
      using activation checkpoint forward counters, bucketing or some other
      mechanism.
      
      Closes #780
      
      * [refactor] Remove forward counters. Comments.
      
      Removed forward counters from the activation checkpointing utility, now
      that FSDP does not require them for correct operation. Add more detailed
      comment about memory usage behaviour with gradient reduction.
      
      * [refactor] Delete deprecated forward counter usage.
      
      * [refactor] Add state assertion at end of pre-backward hook.
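The accumulation issue described in #784 can be illustrated without any distributed machinery. The following is a minimal sketch in plain Python, not fairscale's implementation: the helper names (`reduce_overwrite`, `reduce_accumulate`, `run_backward`) are hypothetical, and lists of floats stand in for gradient tensors. It assumes a module that is called twice in one forward pass, so its parameters receive two gradient contributions within a single backward pass.

```python
from typing import Callable, List, Optional

Grad = List[float]


def reduce_overwrite(buffer: Optional[Grad], new_grad: Grad) -> Grad:
    """Buggy behaviour: each reduction replaces whatever was stored before."""
    return list(new_grad)


def reduce_accumulate(buffer: Optional[Grad], new_grad: Grad) -> Grad:
    """Intended behaviour: each reduction adds into the existing gradient."""
    if buffer is None:
        return list(new_grad)
    return [b + n for b, n in zip(buffer, new_grad)]


def run_backward(
    contributions: List[Grad],
    reduce_fn: Callable[[Optional[Grad], Grad], Grad],
) -> Optional[Grad]:
    """Apply one reduction per call of the module, all within one backward pass."""
    buffer: Optional[Grad] = None
    for grad in contributions:
        buffer = reduce_fn(buffer, grad)
    return buffer


# Two gradient contributions, one per call of the same module in the forward pass.
contributions = [[1.0, 2.0], [0.5, 0.5]]
accumulated = run_backward(contributions, reduce_accumulate)
overwritten = run_backward(contributions, reduce_overwrite)
```

With accumulation the stored gradient is the sum of both contributions; with overwriting, only the last contribution survives, which is the failure mode the commit message describes.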
  4. 17 May, 2021 1 commit
    • [fix] auto_wrap: support wrapping based on wrapper_config (#685) · 9d2bbcf2
      Min Xu authored
      
      
      * [fix] auto_wrap: support wrapping based on wrapper_config
      
      - user can use this to avoid assert if auto_wrap is used multiple times on a module
      - user can traverse the modules multiple times and assign a wrapper_config
        to the module and then use auto_wrap once to wrap them
      
      fix #649
      fix #585
      
      * added changelog
      
      * fix tests
      
      * fix a test
      
      * added an optional assert for collision based on discussions with Quentin
      
      * added config_auto_wrap_policy
      
      * lint
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
  5. 14 May, 2021 1 commit
  6. 26 Apr, 2021 1 commit
  7. 04 Mar, 2021 1 commit
    • [feat]: checkpoint and normalization (#457) · 5e64d6a7
      Min Xu authored
      * [feat]: checkpoint and normalization
      
      - added special handling of BN for track_running_stats and checkpointing
      - we test BN/LN and checkpointing
      - we test them with mixed precision
  8. 02 Mar, 2021 1 commit
    • [feat] Add context manager to FSDP for easier child module wrapping (#446) · f3359550
      Sean Naren authored
      This adds a context manager that makes it easier to wrap child modules with shared default arguments.
      Usage:
      ```
      import torch
      from fairscale.nn.misc import enable_wrap, wrap

      # handful_of_important_params: dict of FSDP keyword arguments shared by the wrapped layers
      with enable_wrap(**handful_of_important_params):
          layer_1 = wrap(torch.nn.Linear(5, 5))
          layer_2 = wrap(torch.nn.Linear(5, 5), flatten_parameters=True) # Override parameters if you'd like

      # Outside the context manager, wrap() returns the plain Linear layer
      layer_1 = wrap(torch.nn.Linear(5, 5))
      ```
      Outside the FSDP context, `wrap` is a no-op. This makes it easier to annotate layers without having to repeat parameter changes at every call site.
  9. 23 Feb, 2021 1 commit
    • Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e
      Myle Ott authored
      Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper.
      
      Compared to PyTorch DDP:
      * FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
      * FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
      * FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
          * all-gather parameters at start of forward pass and start of backward pass
          * reduce-scatter grads at end of backward pass
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
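The communication figures in the FSDP comparison above can be checked with simple counting. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes that an all-reduce costs the same as one reduce-scatter followed by one all-gather (as in a ring implementation) and that each of those two collectives costs one unit over the full parameter set. The function names are illustrative.

```python
from fractions import Fraction

# One unit = one reduce-scatter or one all-gather over all parameters.
ALL_GATHER = Fraction(1)
REDUCE_SCATTER = Fraction(1)


def ddp_cost() -> Fraction:
    """DDP: one gradient all-reduce per step (reduce-scatter + all-gather)."""
    return REDUCE_SCATTER + ALL_GATHER


def fsdp_cost(reshard_after_forward: bool) -> Fraction:
    """FSDP: all-gather params for forward, reduce-scatter grads after backward."""
    cost = ALL_GATHER + REDUCE_SCATTER
    if reshard_after_forward:
        # Parameters were freed after forward, so they are gathered again for backward.
        cost += ALL_GATHER
    return cost


def relative_increase(reshard_after_forward: bool) -> Fraction:
    """Extra communication relative to DDP, as a fraction of the DDP cost."""
    return fsdp_cost(reshard_after_forward) / ddp_cost() - 1


zero2_like = relative_increase(reshard_after_forward=False)
zero3_like = relative_increase(reshard_after_forward=True)
```

Under these assumptions, `reshard_after_forward=False` matches DDP exactly, and `reshard_after_forward=True` adds one extra all-gather on top of two units, which is where the 50% figure comes from.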
  10. 17 Sep, 2020 1 commit
    • Multi-process pipe (#90) · 63f7796a
      Tom Birch authored
      Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines):
      * Adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess, but also supporting PipelineStyle.MultiProcess
      * Adds support for lazy construction of modules (see lazy_construction for an example)
      * Adds two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv
      * Copies all the relevant tests from tests/pipe to tests/pipe_process and modifies them to exercise PipelineStyle.MultiProcess
  11. 31 Jul, 2020 2 commits
  12. 08 Jul, 2020 1 commit