1. 14 Feb, 2022 1 commit
    • [chore] [cleanup]: pytest, pytorch new versions, fix tests (#933) · fae29959
      Min Xu authored
      
      
      * update pytest versions
      
      * [test] test related changes
      
      - upgrade to newer pytorch versions
      - added function to make test more deterministic on A100 and TF32
      - fixed some tests so that they are correctly skipped on a single GPU system
      
      * more fixes
      
      * formatting overly long lines
      
      * format
      
      * better test that does not trigger a warning
      
      * fix an optim state bug with newer pytorch
      
      - the adam optimizer now seems to return "step" as a singleton tensor in the
      nightly build
      - this fixes it, assuming a non-tensor value can still be loaded back by
      the optimizer
      
      * improve oss.py
      
      - using min_loss for regression checking is a bit more reliable
      - also increased the num epochs from 10 to 12
      
      * small oss.py fix
      
      * Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
      Co-authored-by: Min Xu <min.xu.public@gmail.com>
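The optimizer-state item in the commit above (Adam returning "step" as a singleton tensor) can be illustrated with a minimal sketch. This is not the code from the commit: `normalize_step` and `FakeScalarTensor` are hypothetical names, and the stand-in class replaces a real one-element tensor only so that the example runs without PyTorch. It assumes, as the commit message does, that the optimizer can still load a plain non-tensor value back.

```python
from typing import Any, Dict


class FakeScalarTensor:
    """Hypothetical stand-in for a one-element tensor (keeps the sketch dependency-free)."""

    def __init__(self, value: float) -> None:
        self._value = value

    def numel(self) -> int:
        return 1

    def item(self) -> float:
        return self._value


def normalize_step(param_state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one parameter's optimizer state with "step" as a plain number.

    Only the "step" entry is touched; other entries (for example moment
    buffers) are passed through unchanged, even if they hold one element.
    """
    out = dict(param_state)
    step = out.get("step")
    looks_like_tensor = hasattr(step, "numel") and hasattr(step, "item")
    if looks_like_tensor and step.numel() == 1:
        out["step"] = step.item()  # unwrap the singleton tensor
    return out


state = {"step": FakeScalarTensor(3), "exp_avg": [0.1, 0.2]}
clean = normalize_step(state)
print(clean["step"])
```

Restricting the conversion to the "step" key is a deliberate choice in this sketch: unwrapping every one-element value would also alter the state of parameters that happen to have a single element.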
  2. 12 Nov, 2021 1 commit
    • Setup pre-commit github action and apply pre-commit to all files (#849) · 7d7edf6d
      Anupam Bhatnagar authored
      * adding pre-commit files
      
      * applying pre-commit to all files
      
      * adding no-strict-optional argument to mypy in circle ci config
      
      * fix typo
      
      * updating python versions
      
      * [skip ci] remove extra args
      
      * adding python 3.9
      
      * [skip ci] set pre-commit version in requirements-dev.txt
      
      * set CACHE_VERSION
      
      * move linters from circleci to github actions
      
      * update python version
      
      * update python version in benchmarks_2
      
      * moving to python 3.9.7
  3. 05 May, 2021 1 commit
  4. 03 May, 2021 1 commit
  5. 29 Apr, 2021 2 commits
  6. 22 Apr, 2021 1 commit
  7. 07 Apr, 2021 1 commit
  8. 06 Apr, 2021 1 commit
  9. 30 Mar, 2021 1 commit
  10. 17 Mar, 2021 1 commit
  11. 11 Mar, 2021 1 commit
  12. 05 Mar, 2021 1 commit
  13. 25 Feb, 2021 1 commit
  14. 23 Feb, 2021 1 commit
  15. 19 Feb, 2021 1 commit
  16. 18 Feb, 2021 2 commits
  17. 17 Feb, 2021 1 commit
  18. 12 Feb, 2021 1 commit
  19. 04 Feb, 2021 1 commit
  20. 03 Feb, 2021 1 commit
  21. 02 Feb, 2021 1 commit
  22. 15 Jan, 2021 1 commit
  23. 05 Jan, 2021 1 commit
    • [fix] Flaky tests (#283) · 79365ee6
      Benjamin Lefaudeux authored
      * adding the pytest timeout plugin to properly root out hanging tests
      * removing redundant code, slightly more reasonable timeout, works on a single CUDA device
      * finding the root bug for some of the CPU hangs: rpc init
      * propagating all the rpc init test changes to the pipe and model parallel tests
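The idea behind using a timeout to root out hanging tests can be sketched with a small watchdog. This is an illustration only, under the assumption that a hung test should be reported rather than block the whole run; it is not the pytest timeout plugin and not code from the commit, and `run_with_timeout` is a hypothetical helper name.

```python
import threading
import time
from typing import Any, Callable, Tuple


def run_with_timeout(fn: Callable[[], Any], timeout_s: float) -> Tuple[str, Any]:
    """Run fn() in a daemon thread and report "passed", "failed", or "timeout"."""
    outcome = {}

    def target() -> None:
        try:
            outcome["result"] = fn()
        except Exception as exc:  # record the failure instead of losing it
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)  # wait at most timeout_s seconds

    if worker.is_alive():
        # The callable is still running: treat it as a hang.
        return "timeout", None
    if "error" in outcome:
        return "failed", outcome["error"]
    return "passed", outcome.get("result")


print(run_with_timeout(lambda: 1 + 1, timeout_s=1.0)[0])
print(run_with_timeout(lambda: time.sleep(5), timeout_s=0.05)[0])
```

A thread cannot be killed from the outside, so this sketch only detects the hang; the daemon flag keeps a stuck worker from blocking interpreter exit.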
  24. 02 Jan, 2021 1 commit
  25. 30 Dec, 2020 1 commit
  26. 19 Dec, 2020 1 commit
  27. 10 Dec, 2020 1 commit
  28. 04 Dec, 2020 1 commit
  29. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewrite using autograd and Variable execution queue to make the reduce automatic
      * share buckets with OSS to remove duplication
      * some speedup is likely still on the table, since the speed vs. bucketing trade-off does not match expectations; this could be a follow-up
  30. 06 Oct, 2020 1 commit
    • [feat] OSS/SDP : bucketing (#122) · 341d8b2b
      Benjamin Lefaudeux authored
      Same bucketing strategy for OSS and SDP:
      sort everything ahead of time, per rank and per size, smallest tensors first. Bucket the smallest elements in a fixed buffer and send it asynchronously, then send all the other tensors asynchronously, and get back to the bucket. Once done, scatter the contents if needed.
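The bucketing strategy described in the commit above can be sketched as a planning step for a single destination rank. This is a simplified illustration, not the fairscale implementation: `plan_buckets` and `bucket_capacity` are hypothetical names, sizes are plain element counts, and the actual asynchronous sends and the final scatter are left out.

```python
from typing import List, Tuple


def plan_buckets(sizes: List[int], bucket_capacity: int) -> Tuple[List[int], List[int]]:
    """Split tensor indices into (bucketed, direct) for one destination rank.

    Tensors are visited smallest first. The smallest ones are packed into a
    fixed-size buffer; everything that no longer fits is sent on its own.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])

    bucketed: List[int] = []
    direct: List[int] = []
    used = 0
    for pos, idx in enumerate(order):
        if used + sizes[idx] <= bucket_capacity:
            bucketed.append(idx)
            used += sizes[idx]
        else:
            # Sizes are ascending, so no later tensor can fit either.
            direct = order[pos:]
            break
    return bucketed, direct


bucketed, direct = plan_buckets([100, 4, 16, 2, 64], bucket_capacity=32)
print(bucketed, direct)
```

Because the order is computed once ahead of time, the same plan can be reused at every step, which matches the "sort everything ahead of time" part of the description.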
  31. 29 Sep, 2020 1 commit
  32. 17 Sep, 2020 1 commit
  33. 28 Aug, 2020 1 commit
  34. 06 Aug, 2020 1 commit