1. 06 Apr, 2021 1 commit
  2. 05 Apr, 2021 1 commit
  3. 30 Mar, 2021 1 commit
  4. 25 Mar, 2021 1 commit
  5. 18 Mar, 2021 2 commits
  6. 17 Mar, 2021 1 commit
  7. 09 Mar, 2021 1 commit
  8. 05 Mar, 2021 1 commit
  9. 04 Mar, 2021 1 commit
    • [test] AdaScale & SDP/FSDP (#468) · efed9cee
      Min Xu authored
      - covers them in terms of code path only
      - numerically, AdaScale behaves differently on SDP/FSDP than on DDP,
        mainly because each rank has only a partial view of the gradients
      - this doesn't mean it is definitely not useful, but it has yet to
        be validated
      - not going to spend too much time on it until we have a real use case
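The "partial view of the gradients" point can be illustrated with a minimal sketch in plain Python. The names here (`sq_norm`, `full_grad`, `shards`) are illustrative, not fairscale's API: under SDP/FSDP each rank holds only a shard of the gradient, so a statistic computed locally differs from the full-gradient statistic that DDP-based AdaScale sees.

```python
# Hypothetical sketch: why AdaScale statistics differ when each rank only
# sees a shard of the gradient (SDP/FSDP) vs. the full gradient (DDP).

def sq_norm(vec):
    """Squared L2 norm of a flat gradient vector."""
    return sum(x * x for x in vec)

full_grad = [0.5, -1.0, 2.0, 0.25]        # full gradient, as DDP sees it
shards = [full_grad[:2], full_grad[2:]]   # each rank's partial view when sharded

full_stat = sq_norm(full_grad)
shard_stats = [sq_norm(s) for s in shards]

# The per-rank statistic is only a piece of the full one; without an extra
# reduction, gain estimates derived from it diverge from the DDP values.
assert abs(sum(shard_stats) - full_stat) < 1e-12
assert all(s != full_stat for s in shard_stats)
```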
  10. 25 Feb, 2021 1 commit
  11. 23 Feb, 2021 2 commits
  12. 19 Feb, 2021 1 commit
  13. 18 Feb, 2021 2 commits
  14. 17 Feb, 2021 1 commit
  15. 12 Feb, 2021 1 commit
  16. 05 Feb, 2021 1 commit
  17. 03 Feb, 2021 1 commit
  18. 29 Jan, 2021 1 commit
  19. 21 Jan, 2021 1 commit
  20. 15 Jan, 2021 1 commit
  21. 08 Jan, 2021 2 commits
  22. 02 Jan, 2021 1 commit
  23. 30 Dec, 2020 1 commit
  24. 19 Dec, 2020 1 commit
  25. 16 Dec, 2020 2 commits
  26. 15 Dec, 2020 1 commit
  27. 04 Dec, 2020 1 commit
  28. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewritten using autograd hooks and the Variable execution queue to make the reduce automatic
      * shares buckets with OSS to remove duplication
      * some speed is likely still on the table, since the speedup from bucketing does not match expectations; could be a follow-up
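The "autoreduce" idea above can be sketched without torch: instead of calling a reduce step manually after backward, a hook attached to each parameter fires as soon as its gradient is produced and triggers the reduction. All names here (`Param`, `register_grad_hook`, `set_grad`) are stand-ins for illustration; the real implementation uses PyTorch autograd hooks and the Variable execution queue.

```python
# Hypothetical sketch of hook-driven automatic reduction.

class Param:
    """Stand-in for a parameter whose gradient arrives asynchronously."""

    def __init__(self, name):
        self.name = name
        self.grad = None
        self._hooks = []

    def register_grad_hook(self, fn):
        self._hooks.append(fn)

    def set_grad(self, g):
        # Stands in for autograd writing .grad during backward;
        # every registered hook fires immediately.
        self.grad = g
        for fn in self._hooks:
            fn(self)

reduced = []
params = [Param("w"), Param("b")]
for p in params:
    # In the real system this hook would enqueue an async all_reduce.
    p.register_grad_hook(lambda param: reduced.append(param.name))

# "backward": gradients arrive in reverse order, reductions fire automatically
for p in [params[1], params[0]]:
    p.set_grad(1.0)

assert reduced == ["b", "w"]  # reductions happen in grad-arrival order
```

The point of the design is that no explicit `reduce()` call is needed after the backward pass: the reductions overlap with gradient computation.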
  29. 21 Oct, 2020 1 commit
    • [fix] fixing adascale all_reduce (#155) · 6802ad49
      Min Xu authored
      - Aurick noticed this bug and I ran into it yesterday
      - after the fix, our CIFAR training now shows the same gain values
        across different replicas:
      
      ```
      20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.3512124098087777
      20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.3512124098087777
      20-Oct-20 16:00:19 - DEBUG - rank1 - timing: data 0:00:00.000600 fwd 0:00:00.003678 loss 0:00:00.000086 bwd 0:00:00.314158 update 0:00:00.002132 rest 0:00:00.000399
      20-Oct-20 16:00:19 - DEBUG - rank0 - timing: data 0:00:00.000643 fwd 0:00:00.003460 loss 0:00:00.000084 bwd 0:00:00.314678 update 0:00:00.002001 rest 0:00:00.000408
      20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.3514997779980324
      20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.3514997779980324
      20-Oct-20 16:00:19 - DEBUG - rank1 - timing: data 0:00:00.000732 fwd 0:00:00.003689 loss 0:00:00.000086 bwd 0:00:00.314176 update 0:00:00.002146 rest 0:00:00.000397
      20-Oct-20 16:00:19 - DEBUG - rank0 - timing: data 0:00:00.000646 fwd 0:00:00.003542 loss 0:00:00.000089 bwd 0:00:00.314549 update 0:00:00.001956 rest 0:00:00.000392
      20-Oct-20 16:00:19 - DEBUG - rank1 - scale 2, gain ratio 1.352149646693932
      20-Oct-20 16:00:19 - DEBUG - rank0 - scale 2, gain ratio 1.352149646693932
      ```
  30. 06 Oct, 2020 1 commit
    • [feat] OSS/SDP : bucketing (#122) · 341d8b2b
      Benjamin Lefaudeux authored
      Same bucketing strategy for OSS and SDP:
      sort everything ahead of time, per rank and per size, smallest tensors first. Pack the smallest elements into a fixed buffer and send it asynchronously, then send all the other tensors asynchronously and come back to the bucket. Once done, scatter the contents if needed.
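The ordering described above can be sketched as a small planning function. `BUCKET_CAP`, `plan_sends`, and the tensor representation as `(dest_rank, size)` pairs are all illustrative assumptions, not fairscale's real data structures: the sketch only shows the sort-then-pack order, with one fixed-size bucket per destination rank and everything that does not fit sent directly.

```python
# Hypothetical sketch of the bucketing order: sort per rank and per size
# (smallest first), pack small tensors into a fixed buffer per rank, and
# send everything else individually.
from collections import defaultdict

BUCKET_CAP = 8  # elements; stand-in for the fixed buffer size

def plan_sends(tensors):
    """tensors: list of (dest_rank, size) pairs.

    Returns (buckets, direct): per-rank lists of bucketed tensor sizes,
    and the tensors that are sent individually.
    """
    ordered = sorted(tensors, key=lambda t: (t[0], t[1]))
    buckets = defaultdict(list)   # one fixed buffer per destination rank
    used = defaultdict(int)
    direct = []
    for rank, size in ordered:
        if used[rank] + size <= BUCKET_CAP:
            buckets[rank].append(size)
            used[rank] += size
        else:
            direct.append((rank, size))
    return dict(buckets), direct

buckets, direct = plan_sends([(0, 6), (0, 3), (1, 2), (1, 1), (1, 9)])
assert buckets == {0: [3], 1: [1, 2]}   # small tensors packed, per rank
assert direct == [(0, 6), (1, 9)]       # oversize tensors sent directly
```

Sorting smallest-first means the fixed buffer absorbs as many tensors as possible, so the largest number of small messages gets coalesced into one send per rank.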
  31. 29 Sep, 2020 1 commit
  32. 17 Sep, 2020 1 commit
  33. 28 Aug, 2020 1 commit
  34. 06 Aug, 2020 1 commit