1. 26 Feb, 2021 1 commit
  2. 04 Feb, 2021 1 commit
  3. 03 Feb, 2021 2 commits
  4. 29 Jan, 2021 1 commit
    • [test]: test with py39 + torch 1.8 nightly (#339) · e348806b
      Min Xu authored
      * [test]: test with py39 + torch 1.8 nightly
      
      * version fix
      
      * more fix
      
      * fix version function for nightly version
      
      * fix torch_pg build
      
      * invalidate cache
      
      * separate benchmark requirements
      
      * comment
      
      * fixed mypy
      
      * fixed a test
      e348806b
  5. 27 Jan, 2021 1 commit
  6. 25 Jan, 2021 1 commit
    • [test] cover python 3.7 to 3.9 on CPU (#303) · 8459634f
      Min Xu authored
      * [test] cover python 3.7 to 3.9 on CPU
      
      - covering common python versions on CPU tests
      - added doc build test
      
      * add doc build test
      
      * skipping failing tests on py39
      
      * catching doc build warnings
      
      * add doc build to py38 and py39
      
      * minor fix
      
      * fix doc build for adascale
      
      * removed dead code
      
      * fix the skipping
      
      * skip unit test for py39
      
      * add failing example
      
      * no more py39 skipping the tests
      8459634f
  7. 16 Jan, 2021 1 commit
  8. 15 Jan, 2021 1 commit
  9. 11 Jan, 2021 1 commit
  10. 05 Jan, 2021 1 commit
    • [fix] Flaky tests (#283) · 79365ee6
      Benjamin Lefaudeux authored
      * adding the pytest timeout plugin to properly root out hanging tests
      * removing redundant code, setting a slightly more reasonable timeout; works on a single CUDA device
      * finding the root bug for some of the cpu hangs, rpc init
      * propagating all the rpc init test changes to the pipe and model parallel tests
      79365ee6
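The pytest-timeout plugin mentioned in the first bullet above is typically enabled through a small config fragment; a sketch (the timeout value here is an assumption, not necessarily the one used in the repo):

```ini
# pytest.ini (or the [pytest] section of setup.cfg)
[pytest]
# fail any test that runs longer than 60 seconds instead of hanging CI
timeout = 60
```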
  11. 30 Dec, 2020 1 commit
  12. 22 Dec, 2020 1 commit
  13. 30 Nov, 2020 1 commit
  14. 22 Nov, 2020 1 commit
  15. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewrite using autograd and Variable execution queue to make the reduce automatic
      * share buckets with OSS to remove duplication
      * some speed is likely still on the table, since the speed with bucketing does not match expectations; could be a follow-up
      ad933b34
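The "autoreduce" idea above can be sketched with plain PyTorch autograd hooks (an illustration only, not fairscale's actual implementation): register a gradient hook on every parameter so that the reduction fires automatically during `backward()`, instead of being called manually afterwards.

```python
import torch

def attach_autoreduce(model, reduce_fn):
    for p in model.parameters():
        if p.requires_grad:
            # The hook runs as soon as this parameter's gradient is ready;
            # the value it returns replaces the original gradient.
            p.register_hook(lambda grad, fn=reduce_fn: fn(grad))

model = torch.nn.Linear(4, 2)
reduced = []
# A stand-in reduce_fn that just records which gradients it saw; a real
# ShardedDataParallel would launch an all-reduce here instead.
attach_autoreduce(model, lambda g: (reduced.append(g.shape), g)[1])
model(torch.randn(3, 4)).sum().backward()
# `reduced` now holds one entry per parameter gradient (weight and bias)
```

This is the same mechanism that lets the reduce overlap with the rest of the backward pass, since each hook fires as soon as its gradient is produced.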
  16. 20 Nov, 2020 1 commit
  17. 19 Nov, 2020 1 commit
  18. 06 Nov, 2020 1 commit
  19. 30 Oct, 2020 1 commit
  20. 29 Oct, 2020 1 commit
  21. 28 Oct, 2020 1 commit
  22. 23 Oct, 2020 1 commit
  23. 22 Oct, 2020 1 commit
  24. 21 Oct, 2020 1 commit
  25. 17 Oct, 2020 1 commit
  26. 16 Oct, 2020 1 commit
  27. 14 Oct, 2020 1 commit
  28. 10 Oct, 2020 1 commit
  29. 09 Oct, 2020 1 commit
  30. 08 Oct, 2020 1 commit
  31. 01 Oct, 2020 1 commit
  32. 24 Sep, 2020 1 commit
  33. 22 Sep, 2020 1 commit
  34. 17 Sep, 2020 2 commits
    • Multi-process pipe (#90) · 63f7796a
      Tom Birch authored
      Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines)
      * Adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess, but also supporting PipelineStyle.MultiProcess
      * Added support for lazy construction of modules (see lazy_construction for an example)
      * Added two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv
      * Copied all the relevant tests from tests/pipe to tests/pipe_process and modified them to exercise PipelineStyle.MultiProcess
      63f7796a
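The pipelining idea the commit describes can be illustrated with a toy single-process sketch: the model is split into stages and the input is fed through them in micro-batches. The real Pipe places stages on different devices (or, with PipelineStyle.MultiProcess, in different processes); everything below stays on CPU for illustration and is not fairscale's actual API.

```python
import torch

# The model split into sequential stages.
stages = torch.nn.ModuleList([
    torch.nn.Linear(8, 8),   # stage 0
    torch.nn.ReLU(),         # stage 1
    torch.nn.Linear(8, 2),   # stage 2
])

def pipe_forward(x, chunks=4):
    outputs = []
    for micro in x.chunk(chunks):      # split the batch into micro-batches
        for stage in stages:           # each micro-batch flows stage by stage
            micro = stage(micro)
        outputs.append(micro)
    return torch.cat(outputs)          # reassemble the full batch

y = pipe_forward(torch.randn(16, 8))
# y has the same shape as running the stages over the whole batch at once
```

In the multi-process version, the per-stage loop is replaced by inter-process communication (the rpc-queue or send/recv implementations mentioned above) so that different stages run concurrently on different micro-batches.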
    • [feat] Sharded DDP - small refactor and new features (#97) · 49a198c9
      Benjamin Lefaudeux authored
      - rename oss_ddp to ShardedDataParallel
      - some refactoring
      - ShardedDataParallel owns the sharded optimizer, exposed if need be
      - some small perf bumps
      49a198c9
  35. 03 Sep, 2020 1 commit
    • Add grad scaler (#48) · b6a5e634
      Jun Ru Anderson authored
      Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed there, it is a good example of how to apply gradient scaling for larger models that do require it in order to converge.
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
      b6a5e634
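The subclassing pattern this commit describes can be sketched as follows; the subclass name and override below are hypothetical, not fairscale's actual implementation.

```python
import torch

class VerboseGradScaler(torch.cuda.amp.GradScaler):
    """Hypothetical subclass of PyTorch's GradScaler."""
    def scale(self, outputs):
        # a natural hook point for sharded or pipelined training setups
        return super().scale(outputs)

# With enabled=False the scaler is a no-op, so the sketch also runs on CPU.
scaler = VerboseGradScaler(enabled=False)
loss = torch.tensor(2.0)
scaled = scaler.scale(loss)   # returned unchanged when scaling is disabled
```

In real mixed-precision training the scaler would be enabled, and the usual `scaler.scale(loss).backward()` / `scaler.step(optimizer)` / `scaler.update()` loop applies.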
  36. 21 Aug, 2020 1 commit
    • [feat] Simple macro OSS benchmark (#47) · 46c3776b
      Benjamin Lefaudeux authored
      * initial commit, dummy training loop, pure pytorch but not DDP
      
      * probably slightly broken, but rough DDP benchmark run
      
      * adding the torchvision requirement for testing
      
      * brainfart
      
      * reduce the loss, do something slightly distributed
      
      * Some cleanup, distributing the training on two GPUs
      
      * some cleanup + adding a vanilla run, still not good to go
      
      * less silly defaults, good to go for a start I think
      
      * smaller batch to fit the smaller gpus used in the circleci rigs
      
      * Adding some options for the benchmark, and regression testing
      
      * [test] set torch seed for Adam tests (#49)
      
      Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
      
      * linting, I really need to automate this isort insanity
      Co-authored-by: default avatarJun Ru Anderson <33384298+andersonic@users.noreply.github.com>
      Co-authored-by: default avatarJun Ru Anderson <andersonic@fb.com>
      46c3776b
  37. 14 Aug, 2020 1 commit
  38. 13 Aug, 2020 1 commit