1. 21 Jan, 2021 1 commit
  2. 30 Dec, 2020 1 commit
  3. 22 Nov, 2020 1 commit
  4. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewrite using autograd and Variable execution queue to make the reduce automatic
      * share buckets with OSS to remove duplication
      * some speedup is likely still on the table, since the measured speed with bucketing does not match expectations; this could be a follow-up
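The "autoreduce" idea above can be sketched in plain Python: each parameter fires a hook as soon as its gradient is produced, so reduction can start immediately and overlap with the rest of the backward pass. This is a minimal illustration of the hook pattern, not the fairscale implementation; `Param`, `register_hook`, and `make_reduce_hook` are illustrative stand-ins.

```python
# Hook-driven "autoreduce" sketch: a callback runs the moment a
# gradient is ready, instead of reducing everything at the end.
# Param / make_reduce_hook are toy stand-ins, not fairscale APIs.

class Param:
    def __init__(self, name):
        self.name = name
        self.grad = None
        self._hooks = []

    def register_hook(self, fn):
        # fn is called with the gradient as soon as it is produced
        self._hooks.append(fn)

    def accumulate_grad(self, grad):
        self.grad = grad
        for fn in self._hooks:
            fn(grad)

reduced = []

def make_reduce_hook(param):
    def hook(grad):
        # in SDP this would be an async reduce to the owning rank
        reduced.append((param.name, grad))
    return hook

params = [Param("w"), Param("b")]
for p in params:
    p.register_hook(make_reduce_hook(p))

# simulate a backward pass producing gradients one by one
for p, g in zip(params, [0.5, -1.0]):
    p.accumulate_grad(g)

# each gradient was handed to the reducer immediately, in order
assert reduced == [("w", 0.5), ("b", -1.0)]
```

The point of the pattern is that no explicit "reduce all gradients now" step is needed; the hooks make it automatic.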
  5. 18 Nov, 2020 1 commit
  6. 16 Nov, 2020 1 commit
  7. 12 Nov, 2020 1 commit
  8. 06 Nov, 2020 1 commit
  9. 23 Oct, 2020 1 commit
  10. 21 Oct, 2020 1 commit
  11. 20 Oct, 2020 1 commit
  12. 17 Oct, 2020 1 commit
  13. 14 Oct, 2020 1 commit
  14. 10 Oct, 2020 1 commit
  15. 09 Oct, 2020 1 commit
  16. 06 Oct, 2020 1 commit
    • [feat] OSS/SDP: bucketing (#122) · 341d8b2b
      Benjamin Lefaudeux authored
      Same bucketing strategy for OSS and SDP:
      sort all tensors ahead of time, per rank and per size, smallest first. Pack the smallest elements into a fixed-size buffer, send it asynchronously, then send all the remaining tensors asynchronously, and come back to the bucket. Once everything has completed, scatter the bucket contents if needed.
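The packing step described above can be sketched in plain Python: per destination rank, sort by size, fill a fixed-capacity bucket with the smallest tensors (sent once as a single buffer), and send everything that does not fit individually. Sizes stand in for tensors, and `BUCKET_CAPACITY` and `plan_sends` are illustrative names, not the fairscale API.

```python
# Bucketing sketch: smallest tensors are packed into one fixed buffer
# per rank, larger ones are sent on their own. Integer sizes stand in
# for tensors; the capacity is an illustrative value.

BUCKET_CAPACITY = 8

def plan_sends(sizes_per_rank):
    """Return a {rank: (bucketed_sizes, individual_sizes)} plan."""
    plan = {}
    for rank, sizes in sizes_per_rank.items():
        bucketed, individual, used = [], [], 0
        for size in sorted(sizes):          # smallest tensors first
            if used + size <= BUCKET_CAPACITY:
                bucketed.append(size)       # packed into the fixed buffer
                used += size
            else:
                individual.append(size)     # sent on its own, async
        plan[rank] = (bucketed, individual)
    return plan

plan = plan_sends({0: [5, 1, 2, 9], 1: [3, 3, 3]})
assert plan[0] == ([1, 2, 5], [9])   # 1+2+5 = 8 fills the bucket; 9 does not fit
assert plan[1] == ([3, 3], [3])      # 3+3 = 6 fits; the next 3 would overflow
```

Sorting smallest-first maximizes how many tensors share the single bucketed send, which is where the bandwidth win over many tiny messages comes from.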
  17. 29 Sep, 2020 1 commit
  18. 24 Sep, 2020 1 commit
  19. 22 Sep, 2020 2 commits
  20. 17 Sep, 2020 1 commit
  21. 16 Sep, 2020 1 commit
  22. 09 Sep, 2020 1 commit
    • [feat] OSS flatten state dict (#65) · 4f597233
      Benjamin Lefaudeux authored
      Changes the structure of the returned state dict, with respect to param_groups, to be closer to what a vanilla optimizer would return (the shards are merged back together). The state is sharded again when loading.
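The round trip described above — merge per-rank shards into a vanilla-looking state dict, split again on load — can be sketched with plain dicts. `consolidate`, `reshard`, and the round-robin partitioning are toy illustrations, not the OSS partitioning scheme.

```python
# Un-sharding sketch: each rank holds optimizer state for its own
# parameter shard; consolidation merges the shards into one dict shaped
# like a vanilla optimizer's, and loading partitions it again.
# The modulo partitioning below is a toy scheme, not OSS's.

def consolidate(shards):
    """Merge per-rank {"state": {param_idx: state}} dicts into one."""
    full = {"state": {}, "param_groups": [{"params": []}]}
    for shard in shards:
        full["state"].update(shard["state"])
    full["param_groups"][0]["params"] = sorted(full["state"])
    return full

def reshard(full, world_size):
    """Split a consolidated state dict back into per-rank shards."""
    shards = [{"state": {}} for _ in range(world_size)]
    for idx, st in full["state"].items():
        shards[idx % world_size]["state"][idx] = st   # toy partitioning
    return shards

rank0 = {"state": {0: {"step": 3}, 2: {"step": 3}}}
rank1 = {"state": {1: {"step": 3}}}
full = consolidate([rank0, rank1])
assert sorted(full["state"]) == [0, 1, 2]
assert full["param_groups"][0]["params"] == [0, 1, 2]
assert reshard(full, 2)[1] == {"state": {1: {"step": 3}}}
```

The consolidated dict exposes the same top-level shape a vanilla optimizer would, which is what lets existing checkpointing code handle it unchanged.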
  23. 03 Sep, 2020 2 commits
    • [feat] Add a memory usage regression test to the OSS benchmark (#62) · ee38e1e0
      Benjamin Lefaudeux authored
      * Aligning the optimizer state dict with what PyTorch expects
      
      * Adding a check on the dict keys, ensure that `state` and `param_groups` are there
      
      * after installing the specific isort, black and all, a one-liner to please the linter
      
      * Adding some measurement of the memory consumption while training + checkpointing
      
      * mandatory lintfix commit
      
      * brainfart, reset the memory use counter at the beginning of the training in case two of them are run in a row
      
      * move reset stats call, hotfix
      
      * move the optimizer to rmsprop, more stateful and still used in CV
      
      * trying to figure out a SIGSEGV in CircleCI
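The shape of such a regression check — reset the peak counter before the run (so back-to-back runs don't pollute each other, as one of the fixes above notes), measure, then assert against a budget — can be sketched with the stdlib. `tracemalloc` stands in for CUDA memory counters here, and the workload and budget are illustrative.

```python
# Memory-regression sketch: reset the peak counter at the start of the
# run, execute the workload, then assert the peak stays under a
# reference budget. tracemalloc is a stdlib stand-in for GPU memory
# counters; run_training_step and MEMORY_BUDGET are illustrative.
import tracemalloc

MEMORY_BUDGET = 5_000_000  # bytes, illustrative reference value

def run_training_step():
    # stand-in workload that allocates some memory
    buf = [0.0] * 100_000
    return sum(buf)

tracemalloc.start()
tracemalloc.reset_peak()          # reset first, in case of back-to-back runs
run_training_step()
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert 0 < peak <= MEMORY_BUDGET, f"peak {peak} bytes exceeds budget"
```

Resetting before (not after) the measured region is the important detail: otherwise a previous run's peak leaks into the next assertion.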
    • [fix] OSS pytorch-compliant state dict (#61) · 1d1d15ea
      Benjamin Lefaudeux authored
      * Aligning the optimizer state dict with what PyTorch expects
      
      * Adding a check on the dict keys, ensure that `state` and `param_groups` are there
      
      * after installing the specific isort, black and all, a one-liner to please the linter
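The key check mentioned above is small enough to sketch directly: a PyTorch-compliant optimizer state dict exposes `state` and `param_groups` at the top level. `validate_state_dict` is an illustrative helper, not a fairscale function.

```python
# Sketch of the compliance check: fail loudly if the two top-level
# keys PyTorch expects are missing. validate_state_dict is illustrative.

def validate_state_dict(sd):
    missing = {"state", "param_groups"} - sd.keys()
    if missing:
        raise KeyError(f"optimizer state dict missing keys: {sorted(missing)}")
    return sd

ok = validate_state_dict({"state": {}, "param_groups": []})
assert ok == {"state": {}, "param_groups": []}

err = ""
try:
    validate_state_dict({"state": {}})
except KeyError as e:
    err = str(e)
assert "param_groups" in err
```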
  24. 21 Aug, 2020 1 commit
    • [feat] Simple macro OSS benchmark (#47) · 46c3776b
      Benjamin Lefaudeux authored
      * initial commit, dummy training loop, pure pytorch but not DDP
      
      * probably slightly broken, but rough DDP benchmark run
      
      * adding the torchvision requirement for testing
      
      * brainfart
      
      * reduce the loss, do something slightly distributed
      
      * Some cleanup, distributing the training on two GPUs
      
      * some cleanup + adding a vanilla run, still not good to go
      
      * less silly defaults, gtg for a start I think
      
      * smaller batch to fit the smaller gpus used in the circleci rigs
      
      * Adding some options for the benchmark, and regression testing
      
      * [test] set torch seed for Adam tests (#49)
      
      Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
      
      * linting, I really need to automate this isort insanity
      Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
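The "[test] set torch seed" change folded in above relies on a simple idea worth a sketch: fixing the RNG seed before a stochastic test makes its results identical across runs, so regression thresholds stay stable. The stdlib `random` module stands in for torch's RNG, and `noisy_loss` is an illustrative stand-in for a training-loop metric.

```python
# Seeding sketch: a stochastic "loss" becomes reproducible once the
# seed is fixed first. random stands in for torch's RNG; noisy_loss
# is an illustrative workload, not benchmark code.
import random

def noisy_loss(seed):
    random.seed(seed)   # fix the seed before the stochastic work
    return sum(random.random() for _ in range(10))

# same seed -> identical value, so a regression check can compare runs
assert noisy_loss(1234) == noisy_loss(1234)
assert noisy_loss(1234) != noisy_loss(4321)
```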