  1. 22 Jun, 2021 1 commit
    • Update torch to 1.9.0 release (#717) · 1cc4c837
      Pavel Belevich authored
      * Update torch to 1.9.0.dev20210614+cu102
      
      * Update config.yml
      
      * Update config.yml
      
      * Update setup.py
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
  2. 01 Jun, 2021 1 commit
  3. 15 Apr, 2021 1 commit
  4. 05 Apr, 2021 1 commit
  5. 02 Apr, 2021 1 commit
  6. 01 Apr, 2021 1 commit
  7. 31 Mar, 2021 1 commit
  8. 29 Mar, 2021 2 commits
  9. 12 Mar, 2021 1 commit
  10. 05 Mar, 2021 2 commits
  11. 04 Mar, 2021 2 commits
  12. 01 Mar, 2021 1 commit
    • [chores]: make CI more efficient and update py39 env a bit (#447) · 5eb6b8c7
      Min Xu authored
      * [chores]: CI py39 on GPU and more efficiency
      
      * add test list files
      
      * fix
      
      * add test list files
      
      * split benchmark run into 2 runs
      
      * fix 1.8 version and balance benchmarks
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * recording tests
      
      * py39 install fix
      
      * test again
      
      * move tests
      
      * reorg tests
      
      * skip tests for torch 1.8 due to an upstream bug
      
      * removed __init__.py from tests since it confuses pytest
      
      * Revert "removed __init__.py from tests since it confuses pytest"
      
      This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
      
      * don't include __init__ in file list
      
      * notes on __init__.py and added missing ones
      
      * fixed mypy in a test file
      
      * balance test runtime
      
      * better pip install
      
      * balance more
      
      * pip fix
      
      * balance
      
      * balance more, all tests should finish within 20m now
      
      * minor license update
      
      * trying cu102
      
      * more doc and addressed Ben's comments
      
      * debugging
      
      * debugging
      
      * better capture the errors
      
      * debugging
      
      * fix pyenv command
      
      * add universe repo
      
      * update to cuda 11 for 171
      
      * add a test file, improved the checking script
  13. 26 Feb, 2021 1 commit
  14. 04 Feb, 2021 1 commit
  15. 03 Feb, 2021 2 commits
  16. 29 Jan, 2021 1 commit
    • [test]: test with py39 + torch 1.8 nightly (#339) · e348806b
      Min Xu authored
      * [test]: test with py39 + torch 1.8 nightly
      
      * version fix
      
      * more fix
      
      * fix version function for nightly version
      
      * fix torch_pg build
      
      * invalidate cache
      
      * separate benchmark requirements
      
      * comment
      
      * fixed mypy
      
      * fixed a test
  17. 27 Jan, 2021 1 commit
  18. 25 Jan, 2021 1 commit
    • [test] cover python 3.7 to 3.9 on CPU (#303) · 8459634f
      Min Xu authored
      * [test] cover python 3.7 to 3.9 on CPU
      
      - covering common python versions on CPU tests
      - added doc build test
      
      * add doc build test
      
      * skipping failing tests on py39
      
      * catching doc build warnings
      
      * add doc build to py38 and py39
      
      * minor fix
      
      * fix doc build for adascale
      
      * removed dead code
      
      * fix the skipping
      
      * skip unit test for py39
      
      * add failing example
      
      * no more py39 skipping the tests
  19. 16 Jan, 2021 1 commit
  20. 15 Jan, 2021 1 commit
  21. 11 Jan, 2021 1 commit
  22. 05 Jan, 2021 1 commit
    • [fix] Flaky tests (#283) · 79365ee6
      Benjamin Lefaudeux authored
      * adding the pytest timeout plugin to properly root out hanging tests
      * removing redundant code, slightly more reasonable timeout, works on single cuda
      * finding the root bug for some of the cpu hangs, rpc init
      * propagating all the rpc init test changes to the pipe and model parallel tests
  23. 30 Dec, 2020 1 commit
  24. 22 Dec, 2020 1 commit
  25. 30 Nov, 2020 1 commit
  26. 22 Nov, 2020 1 commit
  27. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewrite using autograd and Variable execution queue to make the reduce automatic
      * share buckets with OSS to remove duplication
      * some speed still likely on the table since the speed vs. bucketing does not match expectations, could be a follow up
  28. 20 Nov, 2020 1 commit
  29. 19 Nov, 2020 1 commit
  30. 06 Nov, 2020 1 commit
  31. 30 Oct, 2020 1 commit
  32. 29 Oct, 2020 1 commit
  33. 28 Oct, 2020 1 commit
  34. 23 Oct, 2020 1 commit
  35. 22 Oct, 2020 1 commit
  36. 21 Oct, 2020 1 commit