1. 02 May, 2022 1 commit
    • [FSDP] ssd_offload fixing backward path (grad_fn) for SsdFlatParameter and... · 51b53ddb
      Paul Johnson authored
      [FSDP] ssd_offload fixing backward path (grad_fn) for SsdFlatParameter and SsdFlatParameterView (#974)
      
      * [FSDP] fixing backward path for SsdFlatParameter and SsdFlatParameterView when overriding .data
      
      * Get ssd_offload unit tests passing
      
      * [FSDP] get all test_fsdp_offload tests passing w/ ssd_offload on
      
      * Update changelog
  2. 27 Apr, 2022 1 commit
  3. 26 Apr, 2022 1 commit
  4. 25 Apr, 2022 2 commits
  5. 06 Apr, 2022 1 commit
  6. 30 Mar, 2022 1 commit
  7. 16 Mar, 2022 1 commit
  8. 09 Mar, 2022 2 commits
  9. 08 Mar, 2022 1 commit
  10. 05 Mar, 2022 1 commit
    • docs: add GH button in support of Ukraine (#949) · 2877474c
      Dmitry Vinnik authored
      * Adding ELI5 video to Fairscale
      
      * docs: add GH button in support of Ukraine
      
      ## Summary:
      Our mission at Meta Open Source is to empower communities through open source, and we believe that this means building a welcoming and safe environment for all. As part of this work, we are adding this banner in support of Ukraine during this crisis.
  11. 04 Mar, 2022 1 commit
  12. 03 Mar, 2022 1 commit
  13. 02 Mar, 2022 2 commits
  14. 23 Feb, 2022 2 commits
  15. 22 Feb, 2022 1 commit
    • [benchmarks] Add benchmarks for FSDP (#765) · f9a125db
      anj-s authored
      * add benchmarks for fsdp
      
      * fix lint errors
      
      * clean up
      
      * clean up unused flags
      
      * add the benchmarks
      
      * remove unused args
      
      * fix lint errors
      
      * fix lint errors
      
      * update command line
      
      * add support for multiple devices
      
      * try full fp16 mode
      
      * try full fp16 mode
      
      * lint errors
      
      * merge main
      
      * lint errors
      
      * lint errors
      
      * lint error
      
      * update intersphinx mapping for numpy
      
      * update intersphinx mapping for numpy
      
      * skip test
      
      * added golden configs
      
      * use synthetic benchmarks
      
      * fix fn name
      
      * fix cuda device id
      
      * fix verify
      
      * lint fix
  16. 15 Feb, 2022 2 commits
  17. 14 Feb, 2022 1 commit
    • [chore] [cleanup]: pytest, pytorch new versions, fix tests (#933) · fae29959
      Min Xu authored
      
      
      * update pytest versions
      
      * [test] test related changes
      
      - upgrade to newer pytorch versions
      - added function to make test more deterministic on A100 and TF32
      - fixed some tests so that they are correctly skipped on a single GPU system
      
      * more fixes
      
      * formatting overly long lines
      
      * format
      
      * better test without triggering a warning
      
      * fix an optim state bug with newer pytorch
      
      - the Adam optimizer seems to return "step" as a singleton tensor now in the
      nightly build
      - this fixes it, assuming a non-tensor value can still be loaded back by
      the optimizer
      
      * improve oss.py
      
      - using min_loss for regression checking is a bit more reliable
      - also increased the num epochs from 10 to 12
      
      * small oss.py fix
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
  18. 11 Feb, 2022 1 commit
  19. 08 Feb, 2022 2 commits
  20. 28 Jan, 2022 1 commit
  21. 25 Jan, 2022 2 commits
  22. 20 Jan, 2022 1 commit
  23. 18 Jan, 2022 1 commit
  24. 14 Jan, 2022 3 commits
  25. 13 Jan, 2022 3 commits
    • [skip ci] fixing typos · 39e7821a
      Anupam Bhatnagar authored
    • [feature] [experimental] Layerwise Gradient Scaler (#879) · 52d066a2
      Anupam Bhatnagar authored
      * [skip ci] first commit
      
      * [skip ci] gradient scaler example
      
      * [skip ci] adding feed forward toy example
      
      * [skip ci] adding types
      
      * [skip ci] adding backward hook
      
      * [skip ci] update
      
      * [skip ci] working feed forward example
      
      * [skip ci] working feed forward example
      
      * [skip ci] use named_modules instead of named_children
      
      * [skip ci] adding new file
      
      * [skip ci] clean up
      
      * [skip ci] implement unscale function
      
      * [skip ci] implement unscale function
      
      * [skip ci] removing old file
      
      * [skip ci] removing some more old files
      
      * [skip ci] making unscale function generic
      
      * [skip ci] adding test for vision model
      
      * [skip ci] adding identity layer
      
      * [skip ci] cleanup files
      
      * [skip ci] refactoring
      
      * [skip ci] more refactoring
      
      * [skip ci] added functionality to update scale
      
      * [skip ci] data loader clean up
      
      * [skip ci] implemented inf checks and update scale functions
      
      * [skip ci]code clean up. added...
    • [Fix][FSDP]fixed padding size of input tensor for reduce scatter (#907) · fb4eca19
      tmarkstrum authored
      
      
      * fixed padding size of input tensor for reduce scatter, and fixed an error that assigned the wrong group
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      
      * added changelog
      
      * fixed some commit.
      
      * added unit test to ensure the reduce_scatter process group size is correct in default cases, and fall back to the default process group when the reduce_scatter process group has the wrong size.
      
      * throw an error instead of rolling back to use default process group for reduce_scatter_process_group
      
      * Revert "throw an error instead of rolling back to use default process group for reduce_scatter_process_group"
      
      This reverts commit eab5620da3b726ea55d3088ae4ca10d94dcdf4d9.
      
      * added check for None to avoid unit test failure
      
      * fixed an error to avoid the unit test failure
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
  26. 12 Jan, 2022 1 commit
  27. 07 Jan, 2022 1 commit
    • [FSDP] Enable FSDP reduce scatter overlap (#897) · 0a526bcb
      tmarkstrum authored
      * enable reduce scatter overlap with other operations
      
      * fixed unit tests and added docstrings for the new parameters for fsdp
      
      * fixed more unit tests
      
      * fixed unit tests
      
      * avoided the pickle error on process_group_reduce_scatter
      
      * removed an unnecessary parameter in unit tests
      
      * remove unnecessary prints
      
      * fixed the docstring
      
      * skipped the test_offload unit test because this unit test failed in the main branch
      
      * removed the enable_reduce_scatter_overlap API parameter
      
      * added docstring for the default value of the process_group_reduce_scatter parameter
      
      * fixed a syntax bug
      
      * fixed a bug which caused a unit test failure
      
      * removed the all_gather in the ProcessGroupName enum
      
      * added more comments
      
      * changed the default value of process_group_reduce_scatter from None to ProcessGroupName.reduce_scatter
  28. 06 Jan, 2022 2 commits