1. 06 Apr, 2022 1 commit
  2. 30 Mar, 2022 1 commit
  3. 16 Mar, 2022 1 commit
  4. 09 Mar, 2022 2 commits
  5. 08 Mar, 2022 1 commit
  6. 05 Mar, 2022 1 commit
    • docs: add GH button in support of Ukraine (#949) · 2877474c
      Dmitry Vinnik authored
      * Adding ELI5 video to Fairscale
      
      * docs: add GH button in support of Ukraine
      
      ## Summary:
      Our mission at Meta Open Source is to empower communities through open source, and we believe that it means building a welcoming and safe environment for all. As a part of this work, we are adding this banner in support of Ukraine during this crisis.
      2877474c
  7. 04 Mar, 2022 1 commit
  8. 03 Mar, 2022 1 commit
  9. 02 Mar, 2022 2 commits
  10. 23 Feb, 2022 2 commits
  11. 22 Feb, 2022 1 commit
    • [benchmarks] Add benchmarks for FSDP (#765) · f9a125db
      anj-s authored
      * add benchmarks for fsdp
      
      * fix lint errors
      
      * clean up
      
      * clean up unused flags
      
      * add the benchmarks
      
      * remove unused args
      
      * fix lint errors
      
      * fix lint errors
      
      * update command line
      
      * add support for multiple devices
      
      * try full fp16 mode
      
      * try full fp16 mode
      
      * lint errors
      
      * merge main
      
      * lint errors
      
      * lint errors
      
      * lint error
      
      * update intersphinx mapping for numpy
      
      * update intersphinx mapping for numpy
      
      * skip test
      
      * added golden configs
      
      * use synthetic benchmarks
      
      * fix fn name
      
      * fix cuda device id
      
      * fix verify
      
      * lint fix
      f9a125db
  12. 15 Feb, 2022 2 commits
  13. 14 Feb, 2022 1 commit
    • [chore] [cleanup]: pytest, pytorch new versions, fix tests (#933) · fae29959
      Min Xu authored
      
      
      * update pytest versions
      
      * [test] test related changes
      
      - upgrade to newer pytorch versions
      - added function to make test more deterministic on A100 and TF32
      - fixed some tests so that they are correctly skipped on a single GPU system
      
      * more fixes
      
      * formatting overly long lines
      
      * format
      
      * better test without triggering a warning
      
      * fix an optim state bug with newer pytorch
      
      - the adam optimizer seems to return "step" as a singleton tensor now in the
      nightly build
      - this fixes it, assuming a non-tensor value can still be loaded back by
      the optimizer
      
      * improve oss.py
      
      - using min_loss for regression checking is a bit more reliable
      - also increased the num epochs from 10 to 12
      
      * small oss.py fix
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
      fae29959
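The optimizer state fix described in this commit (a "step" entry that may arrive as a singleton tensor) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `normalize_step` and `FakeSingletonTensor` are hypothetical names, and the stand-in class exists only so the sketch runs without PyTorch. The only property assumed of the real tensor is that it exposes `.item()`, as `torch.Tensor` does.

```python
from typing import Any, Dict


def normalize_step(param_state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one parameter's optimizer state with "step" as a plain number.

    Hypothetical helper for illustration; not part of fairscale.
    """
    normalized = dict(param_state)
    step = normalized.get("step")
    # A one-element tensor can be converted to a Python number with .item();
    # plain ints and floats have no .item() and pass through unchanged.
    if step is not None and hasattr(step, "item"):
        normalized["step"] = step.item()
    return normalized


class FakeSingletonTensor:
    """Hypothetical stand-in for a one-element tensor (assumption: only .item() is needed)."""

    def __init__(self, value: float) -> None:
        self._value = value

    def item(self) -> float:
        return self._value


# Older behaviour: "step" is already a plain number.
old_style = normalize_step({"step": 12, "exp_avg": [0.1]})
# Newer behaviour: "step" arrives wrapped in a singleton tensor.
new_style = normalize_step({"step": FakeSingletonTensor(12.0), "exp_avg": [0.1]})
```

Working on a copy keeps the caller's state dict untouched, which matters if the same state is also being saved elsewhere.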
  14. 11 Feb, 2022 1 commit
  15. 08 Feb, 2022 2 commits
  16. 28 Jan, 2022 1 commit
  17. 25 Jan, 2022 2 commits
  18. 20 Jan, 2022 1 commit
  19. 18 Jan, 2022 1 commit
  20. 14 Jan, 2022 3 commits
  21. 13 Jan, 2022 3 commits
    • [skip ci] fixing typos · 39e7821a
      Anupam Bhatnagar authored
      39e7821a
    • [feature] [experimental] Layerwise Gradient Scaler (#879) · 52d066a2
      Anupam Bhatnagar authored
      * [skip ci] first commit
      
      * [skip ci] gradient scaler example
      
      * [skip ci] adding feed forward toy example
      
      * [skip ci] adding types
      
      * [skip ci] adding backward hook
      
      * [skip ci] update
      
      * [skip ci] working feed forward example
      
      * [skip ci] working feed forward example
      
      * [skip ci] use named_modules instead of named_children
      
      * [skip ci] adding new file
      
      * [skip ci] clean up
      
      * [skip ci] implement unscale function
      
      * [skip ci] implement unscale function
      
      * [skip ci] removing old file
      
      * [skip ci] removing some more old files
      
      * [skip ci] making unscale function generic
      
      * [skip ci] adding test for vision model
      
      * [skip ci] adding identity layer
      
      * [skip ci] cleanup files
      
      * [skip ci] refactoring
      
      * [skip ci] more refactoring
      
      * [skip ci] added functionality to update scale
      
      * [skip ci] data loader clean up
      
      * [skip ci] implemented inf checks and update scale functions
      
      * [skip ci] code clean up. added...
      52d066a2
    • [Fix][FSDP] fixed padding size of input tensor for reduce scatter (#907) · fb4eca19
      tmarkstrum authored
      
      
      * fixed padding size of input tensor for reduce scatter, and fixed an error that assigned the wrong group
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      
      * added changelog
      
      * fixed some commit.
      
      * added unit test to ensure the reduce_scatter process group size is correct in default cases, and fall back to the default process group when the reduce_scatter process group has the wrong size.
      
      * throw an error instead of rolling back to use default process group for reduce_scatter_process_group
      
      * Revert "throw an error instead of rolling back to use default process group for reduce_scatter_process_group"
      
      This reverts commit eab5620da3b726ea55d3088ae4ca10d94dcdf4d9.
      
      * added check for None to avoid unit test failure
      
      * fixed an error to avoid the unit test failure
      Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
      fb4eca19
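As background for the padding fix in this commit: a reduce-scatter hands each rank one equally sized chunk of the input, so the flattened input length has to be a multiple of the process group size. The sketch below shows that arithmetic in plain Python. It illustrates the general idea only and is not the fairscale implementation; `pad_for_reduce_scatter` and `split_into_chunks` are hypothetical names, and lists stand in for tensors.

```python
from typing import List


def pad_for_reduce_scatter(flat: List[float], world_size: int) -> List[float]:
    """Zero-pad `flat` so its length is a multiple of `world_size` (hypothetical helper)."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    remainder = len(flat) % world_size
    # The outer modulo gives zero padding when the length already divides evenly.
    pad = (world_size - remainder) % world_size
    return flat + [0.0] * pad


def split_into_chunks(flat: List[float], world_size: int) -> List[List[float]]:
    """Split an already padded list into `world_size` equal chunks, one per rank."""
    chunk_len = len(flat) // world_size
    return [flat[i * chunk_len:(i + 1) * chunk_len] for i in range(world_size)]


# 10 elements across 4 ranks: pad to 12 so each rank gets 3 elements.
padded = pad_for_reduce_scatter([1.0] * 10, world_size=4)
chunks = split_into_chunks(padded, world_size=4)
```

The padding depends on the size of the group that actually performs the reduce-scatter, which is why using a group of the wrong size would yield the wrong padded length.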
  22. 12 Jan, 2022 1 commit
  23. 07 Jan, 2022 1 commit
    • [FSDP] Enable FSDP reduce scatter overlap (#897) · 0a526bcb
      tmarkstrum authored
      * enable reduce scatter overlap with other operations
      
      * fixed unit tests and added docstrings for the new parameters for fsdp
      
      * fixed more unit tests
      
      * fixed unit tests
      
      * avoided the pickle error on process_group_reduce_scatter
      
      * removed an unnecessary parameter in unit tests
      
      * remove unnecessary prints
      
      * fixed the docstring
      
      * skipped the test_offload unit test because this unit test failed in the main branch
      
      * removed the enable_reduce_scatter_overlap API parameter
      
      * added docstring for the default value of process_group_reduce_scatter parameter
      
      * fixed a syntax bug
      
      * fixed a bug which caused a unit test failure
      
      * removed the all_gather in the ProcessGroupName enum
      
      * added more comments
      
      * changed the default value of process_group_reduce_scatter from None to ProcessGroupName.reduce_scatter
      0a526bcb
  24. 06 Jan, 2022 2 commits
  25. 05 Jan, 2022 1 commit
    • Enabling ssd_offload training basic tests. (#887) · c5e471bc
      Paul Johnson authored
      * Enabling ssd_offload training and test via tests/nn/data_parallel/test_fsdp_offload.py.
      * Removed unused classes: SsdBuffer, SsdTensorHandleView, SsdParameter, SsdTensor
      * Enhance test coverage of test_ssd_offloading_train_flatten_params_wrapper
      * Modifications from PR #887 review comments.
      * Update Changelog
      c5e471bc
  26. 24 Dec, 2021 1 commit
  27. 21 Dec, 2021 3 commits