"torchvision/csrc/cpu/roi_align_kernel.cpp" did not exist on "74679cc566f98398db13df0312cc11188733f1f3"
  1. 05 Mar, 2021 2 commits
  2. 04 Mar, 2021 2 commits
  3. 03 Mar, 2021 2 commits
  4. 02 Mar, 2021 2 commits
    • Myle Ott · d2924670
    • [feat] Add context manager to FSDP for easier child module wrapping (#446) · f3359550
      Sean Naren authored
      This adds a context manager that makes it easier to wrap child modules with shared defaults.
      Usage:
      ```
      from fairscale.nn.misc import enable_wrap, wrap
      
      with enable_wrap(**handleful_of_important_params):
          layer_1 = wrap(torch.nn.Linear(5, 5))
          layer_2 = wrap(torch.nn.Linear(5, 5), flatten_parameters=True) # Override parameters if you'd like
      
      # outside the context manager, wrap() is a no-op and returns the plain Linear layer
      layer_1 = wrap(torch.nn.Linear(5, 5))
      ```
      Outside the enable_wrap context, wrap() is a no-op, so layers can be annotated once without duplicating parameter changes at every call site. A sketch of how such a context manager might work follows.
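      For illustration, a minimal sketch of this pattern built on a module-level default store; `_wrap_defaults` is a hypothetical helper, and only the `FullyShardedDataParallel` import reflects fairscale's real API. This is an assumption for exposition, not fairscale's actual implementation:
      ```
      import contextlib
      from typing import Any, Dict, Optional
      
      import torch.nn as nn
      from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
      
      # Hypothetical module-level store; None means "outside enable_wrap".
      _wrap_defaults: Optional[Dict[str, Any]] = None
      
      @contextlib.contextmanager
      def enable_wrap(**defaults: Any):
          # Every wrap() call inside this block sees the shared defaults.
          global _wrap_defaults
          _wrap_defaults = defaults
          try:
              yield
          finally:
              _wrap_defaults = None
      
      def wrap(module: nn.Module, **overrides: Any) -> nn.Module:
          # No-op outside the context; otherwise wrap with merged params,
          # letting per-call overrides win over the shared defaults.
          if _wrap_defaults is None:
              return module
          return FSDP(module, **{**_wrap_defaults, **overrides})
      ```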
  5. 01 Mar, 2021 2 commits
    • [chores]: make CI more efficient and update py39 env a bit (#447) · 5eb6b8c7
      Min Xu authored
      * [chores]: CI py39 on GPU and more efficiency
      
      * add test list files
      
      * fix
      
      * add test list files
      
      * split benchmark run into 2 runs
      
      * fix 1.8 version and balance benchmarks
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * recording tests
      
      * py39 install fix
      
      * test again
      
      * move tests
      
      * reorg tests
      
      * skip tests for torch 1.8 due to an upstream bug
      
      * removed __init__.py from tests since it confuses pytest
      
      * Revert "removed __init__.py from tests since it confuses pytest"
      
      This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
      
      * don't include __init__ in file list
      
      * notes on __init__.py and added missing ones
      
      * fixed mypy in a test file
      
      * balance test runtime
      
      * better pip install
      
      * balance more
      
      * pip fix
      
      * balance
      
      * balance more, all test should finish within 20m now
      
      * minor license update
      
      * trying cu102
      
      * more doc and addressed Ben's comments
      
      * debugging
      
      * debugging...
    • [test] FSDP: add the failing test for #421 (#453) · 5ecac15a
      Min Xu authored
      * [test] FSDP: add the failing test for #421
      
      * skip on 1.5
      
      * better skipping
      
      * Update tests/nn/data_parallel/test_fsdp_grad_scaler.py
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
  6. 27 Feb, 2021 1 commit
  7. 26 Feb, 2021 3 commits
  8. 25 Feb, 2021 2 commits
  9. 24 Feb, 2021 1 commit
  10. 23 Feb, 2021 4 commits
    • [test]: add peak mem in checkpoint test (#415) · 4b5b4d3d
      Min Xu authored
      * [test]: add peak mem in checkpoint test
      
      * more debugging
      
      * new test
      
      * more fix
      
      * better collection of debug in case of future failures
      
      * update the comment
      
      * typo
      
      * comment
      
      * clarify
      
      * better wording
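      For reference, a minimal sketch of the measurement pattern such a test can use; the model, sizes, and memory budget below are illustrative placeholders, not the test's actual values:
      ```
      import torch
      
      def check_peak_memory():
          # Clear the high-water mark before the measured region.
          torch.cuda.reset_peak_memory_stats()
          model = torch.nn.Linear(1024, 1024).cuda()
          model(torch.randn(64, 1024, device="cuda")).sum().backward()
          peak = torch.cuda.max_memory_allocated()
          # Compare against an expected budget (number is illustrative).
          assert peak < 100 * 1024 * 1024, f"peak {peak} bytes over budget"
      ```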
    • [perf][ShardedDDP] fp16 gradient reduce (#411) · d52d2186
      Benjamin Lefaudeux authored
      * POC, testing against the DDP comm hook when available
      * docs, adding a reference to DDP's compress hook
      * updating changelog, prep for v0.1.8 release
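      For comparison, this is how the referenced fp16 compression hook is registered on plain PyTorch DDP (available in PyTorch >= 1.8); it is shown as context, not ShardedDDP's own code path:
      ```
      import torch
      from torch.nn.parallel import DistributedDataParallel as DDP
      from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
      
      # Assumes torch.distributed.init_process_group(...) was already called.
      model = DDP(torch.nn.Linear(10, 10).cuda(), device_ids=[0])
      # Gradients are cast to fp16 before all-reduce, halving comm volume.
      model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)
      ```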
    • [bug]: not all CUDA memory is freed when model is deleted (#412) · e3035933
      Min Xu authored
      * [bug]: not all CUDA memory is freed when model is deleted
      
      * fixed memory leak
      
      - without this, peak memory will be high when more than one model
        is trained (i.e. the first model leaves state around, pushing up
        the peak memory when the second model runs); see the sketch after
        this list
      
      * addressed comments
      
      * fix
      
      * changelog
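      A minimal sketch of how such a leak can be checked (model and sizes are illustrative; a leak shows up as a non-zero residue after deletion):
      ```
      import gc
      import torch
      
      def allocated_bytes() -> int:
          gc.collect()  # drop unreferenced tensors before reading the counter
          return torch.cuda.memory_allocated()
      
      baseline = allocated_bytes()
      model = torch.nn.Linear(4096, 4096).cuda()
      model(torch.randn(8, 4096, device="cuda")).sum().backward()
      del model  # parameters and their gradients should be freed here
      assert allocated_bytes() == baseline, "CUDA memory leaked after deletion"
      ```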
    • Add FullyShardedDataParallel (FSDP) (#413) · 15512d9e
      Myle Ott authored
      Recent work by [Microsoft](https://arxiv.org/abs/1910.02054) and [Google](https://arxiv.org/abs/2004.13336) has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the new **`FullyShardedDataParallel` (FSDP)** wrapper, which is a drop-in replacement for PyTorch's `DistributedDataParallel` (DDP) wrapper.
      
      Compared to PyTorch DDP:
      * FSDP shards parameters (FP16 + FP32) and optimizer state across data parallel GPUs
      * FSDP with `reshard_after_forward=False` has the same communication cost as PyTorch DDP and is similar to ZeRO-2
      * FSDP with `reshard_after_forward=True` increases total communication by 50% and is similar to ZeRO-3:
          * all-gather parameters at start of forward pass and start of backward pass
          * reduce-scatter grads at end of backward pass
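      A minimal usage sketch (assumes `torch.distributed.init_process_group()` has been called and a CUDA device is set; the model and sizes are illustrative):
      ```
      import torch
      from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP
      
      module = torch.nn.Sequential(
          torch.nn.Linear(1024, 1024),
          torch.nn.ReLU(),
          torch.nn.Linear(1024, 1024),
      ).cuda()
      
      # reshard_after_forward=True trades ~50% extra communication (ZeRO-3
      # style) for a smaller memory footprint; False matches DDP's cost.
      model = FSDP(module, reshard_after_forward=True)
      
      # Build the optimizer after wrapping so it sees the sharded params.
      optim = torch.optim.SGD(model.parameters(), lr=0.01)
      model(torch.randn(8, 1024, device="cuda")).sum().backward()
      optim.step()
      ```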
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
  11. 19 Feb, 2021 1 commit
  12. 18 Feb, 2021 2 commits
  13. 17 Feb, 2021 1 commit
  14. 12 Feb, 2021 1 commit
  15. 10 Feb, 2021 1 commit
  16. 09 Feb, 2021 1 commit
  17. 04 Feb, 2021 4 commits
  18. 03 Feb, 2021 2 commits
  19. 02 Feb, 2021 1 commit
  20. 30 Jan, 2021 1 commit
  21. 29 Jan, 2021 1 commit
  22. 27 Jan, 2021 1 commit
  23. 23 Jan, 2021 1 commit
  24. 21 Jan, 2021 1 commit