  1. 24 Oct, 2021 1 commit
  2. 21 Oct, 2021 1 commit
  3. 08 May, 2021 1 commit
  4. 08 Mar, 2021 1 commit
  5. 04 Mar, 2021 1 commit
  6. 01 Mar, 2021 1 commit
    • [chores]: make CI more efficient and update py39 env a bit (#447) · 5eb6b8c7
      Min Xu authored
      * [chores]: CI py39 on GPU and more efficiency
      
      * add test list files
      
      * fix
      
      * add test list files
      
      * split benchmark run into 2 runs
      
      * fix 1.8 version and balance benchmarks
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * recording tests
      
      * py39 install fix
      
      * test again
      
      * move tests
      
      * reorg tests
      
      * skip tests for torch 1.8 due to an upstream bug
      
      * removed __init__.py from tests since it confuses pytest
      
      * Revert "removed __init__.py from tests since it confuses pytest"
      
      This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
      
      * don't include __init__ in file list
      
      * notes on __init__.py and added missing ones
      
      * fixed mypy in a test file
      
      * balance test runtime
      
      * better pip install
      
      * balance more
      
      * pip fix
      
      * balance
      
      * balance more, all tests should finish within 20m now
      
      * minor license update
      
      * trying cu102
      
      * more doc and addressed Ben's comments
      
      * debugging
      
      * debugging...
  7. 03 Feb, 2021 1 commit
  8. 25 Jan, 2021 1 commit
    • [refactor] Add benchmark config object and validation function (#314) · 331aed2c
      anj-s authored
      * [refactor] Remove unused variables and refactor common configurations
      
      * move helper function to call site
      
      * fixed lint errors
      
      * fix lint errors
      
      * fix lint errors
      
      * fix lint errors
      
      * fix import order
      
      * format files
      
      * remove unused imports
      
      * fix lint errors
      
      * fix lint errors
      
      * refactor common utilities
      
      * address PR comments
      
      * sorted imports
      
      * add space
      
      * modify comment
      
      * added doc strings and addressed PR comments.
      
      * addressed PR comments
      
      * added another comment to clarify.
      
      * fixing lint errors
      
      * addressed PR comments
      
      * addressed PR comments
      
      * fixed typos
      
      * initialize var
      
      * rename seq_pred to lm
      
      * fix lint errors
      
      * move datasets and models into separate folders
      
      * add the folders created
      
      * fix lint errors
      
      * create golden config to stats mapping
      
      * add common batching for both synthetic and real data
      
      * fixed lint errors
      
      * enable real pipe benchmarks with new golden data
      
      * reduce seq len to avoid OOM
      
      * updated golden data
      
      * add logging
      
      * add golden data
      
      * add golden data
      
      * fix lint errors
      
      * add doc string
      
      * remove unused class
      
      * add seq len and batch size to the config
      
      * remove commented out line
      
      * address comments
      
      * rename imports
      
      * refactor common logic in dataloaders
      
      * add golden configs
      
      * lint changes
      
      * merge latest changes
      
      * lint errors
      
      * address PR comments
      
      * initial refactoring
      
      * lint fixes
      
      * fix lint errors
      
      * update comment
      Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
  9. 21 Jan, 2021 1 commit
  10. 30 Dec, 2020 1 commit
  11. 22 Nov, 2020 1 commit
  12. 21 Nov, 2020 1 commit
    • [feat] ShardedDataParallel with autoreduce (#157) · ad933b34
      Benjamin Lefaudeux authored
      * rewrite using autograd and Variable execution queue to make the reduce automatic
      * share buckets with OSS to remove duplication
      * some speedup is likely still on the table, since the speed vs. bucketing results do not match expectations; could be a follow-up
  13. 18 Nov, 2020 1 commit
  14. 16 Nov, 2020 1 commit
  15. 12 Nov, 2020 1 commit
  16. 06 Nov, 2020 1 commit
  17. 23 Oct, 2020 1 commit
  18. 21 Oct, 2020 1 commit
  19. 20 Oct, 2020 1 commit
  20. 17 Oct, 2020 1 commit
  21. 14 Oct, 2020 1 commit
  22. 10 Oct, 2020 1 commit
  23. 09 Oct, 2020 1 commit
  24. 06 Oct, 2020 1 commit
    • [feat] OSS/SDP : bucketing (#122) · 341d8b2b
      Benjamin Lefaudeux authored
      Same bucketing strategy for OSS and SDP:
      sort everything ahead of time, per rank and per size, smaller tensors first. Bucket the smallest elements in a fixed buffer and send it asynchronously, then send all the other tensors asynchronously, and get back to the bucket. Once done, scatter the contents if needed.
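The planning half of the bucketing strategy described in this commit message can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the commit: `plan_bucket`, its arguments and the element-count representation of tensors are all hypothetical, and the asynchronous sends and the final scatter are left out.

```python
from typing import Dict, List, Tuple


def plan_bucket(sizes: Dict[str, int], capacity: int) -> Tuple[List[str], List[str]]:
    """Split one rank's tensors into a bucketed set and a direct-send set.

    `sizes` maps a (hypothetical) tensor name to its element count and
    `capacity` is the size of the fixed bucket buffer, in elements.
    """
    # Sort ahead of time, smaller tensors first (name breaks ties deterministically).
    ordered = sorted(sizes, key=lambda name: (sizes[name], name))

    bucketed: List[str] = []
    direct: List[str] = []
    used = 0
    for name in ordered:
        if used + sizes[name] <= capacity:
            # Small enough: pack it into the fixed buffer.
            bucketed.append(name)
            used += sizes[name]
        else:
            # Too large for the remaining space: send it on its own.
            direct.append(name)
    return bucketed, direct


bucketed, direct = plan_bucket({"a": 4, "b": 1, "c": 10, "d": 2}, capacity=8)
```

Because the tensors are visited in ascending size order, the first tensor that does not fit marks the point after which everything is sent individually.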
  25. 29 Sep, 2020 1 commit
  26. 24 Sep, 2020 1 commit
  27. 22 Sep, 2020 2 commits
  28. 17 Sep, 2020 1 commit
  29. 16 Sep, 2020 1 commit
  30. 09 Sep, 2020 1 commit
    • [feat] OSS flatten state dict (#65) · 4f597233
      Benjamin Lefaudeux authored
      Changes the structure of the returned state dict with respect to the param_groups, un-sharding them to make it closer to what a vanilla optimizer would return. They are sharded again when loading.
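The un-shard on save and re-shard on load round trip described in this commit message could look roughly like the sketch below. It assumes each rank holds a plain list of param-group dicts; the helper names `flatten_param_groups` and `reshard_param_groups`, and the `partition` bookkeeping, are hypothetical and not taken from the commit.

```python
from typing import Any, Dict, List, Tuple

ParamGroup = Dict[str, Any]


def flatten_param_groups(
    shards: List[List[ParamGroup]],
) -> Tuple[List[ParamGroup], List[Tuple[int, int]]]:
    """Concatenate per-rank param_groups into one flat list.

    Also returns, for each rank, the (start, end) slice it owns in the flat
    list so that the sharding can be restored later.
    """
    flat: List[ParamGroup] = []
    partition: List[Tuple[int, int]] = []
    for groups in shards:
        start = len(flat)
        flat.extend(groups)
        partition.append((start, len(flat)))
    return flat, partition


def reshard_param_groups(
    flat: List[ParamGroup], partition: List[Tuple[int, int]]
) -> List[List[ParamGroup]]:
    """Split a flat param_groups list back into per-rank shards."""
    return [flat[start:end] for start, end in partition]


shards = [
    [{"lr": 0.1, "params": [0, 1]}],
    [{"lr": 0.1, "params": [2]}, {"lr": 0.01, "params": [3]}],
]
flat, partition = flatten_param_groups(shards)
restored = reshard_param_groups(flat, partition)
```

Keeping the partition alongside the flat list is one way to make the load path lossless; the actual change may track ownership differently.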
  31. 03 Sep, 2020 2 commits
    • [feat] Add a memory usage regression test to the OSS benchmark (#62) · ee38e1e0
      Benjamin Lefaudeux authored
      * Aligning the optimizer state dict with what PyTorch expects
      
      * Adding a check on the dict keys, ensure that `state` and `param_groups` are there
      
      * after installing the specific isort, black and all, one liner to please the linter..
      
      * Adding some measurement of the memory consumption while training + checkpointing
      
      * mandatory lintfix commit
      
      * brainfart, reset the memory use counter at the beginning of the training in case two of them are run in a row
      
      * move reset stats call, hotfix
      
      * move the optimizer to rmsprop, more stateful and still used in CV
      
      * trying to figure out a SIGSEGV in circleci
    • [fix] OSS pytorch-compliant state dict (#61) · 1d1d15ea
      Benjamin Lefaudeux authored
      * Aligning the optimizer state dict with what PyTorch expects
      
      * Adding a check on the dict keys, ensure that `state` and `param_groups` are there
      
      * after installing the specific isort, black and all, one liner to please the linter..
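The key check mentioned in this commit message can be illustrated with a small helper. A PyTorch optimizer's `state_dict()` has the top-level keys `state` and `param_groups`; the function name `missing_state_dict_keys` is hypothetical, and the sketch works on plain dicts so that it runs without PyTorch installed.

```python
from typing import Any, List, Mapping

# Top-level keys a PyTorch optimizer state dict is expected to carry.
REQUIRED_KEYS = ("state", "param_groups")


def missing_state_dict_keys(state_dict: Mapping[str, Any]) -> List[str]:
    """Return the required top-level keys that are absent from `state_dict`."""
    return [key for key in REQUIRED_KEYS if key not in state_dict]


complete = {"state": {}, "param_groups": [{"lr": 0.1, "params": [0]}]}
incomplete = {"state": {}}

missing_ok = missing_state_dict_keys(complete)
missing_bad = missing_state_dict_keys(incomplete)
```

In a test, an empty result would pass and a non-empty one would be reported as a failure naming the missing keys.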
  32. 21 Aug, 2020 1 commit
    • [feat] Simple macro OSS benchmark (#47) · 46c3776b
      Benjamin Lefaudeux authored
      * initial commit, dummy training loop, pure pytorch but not DDP
      
      * probably slightly broken, but rough DDP benchmark run
      
      * adding the torchvision requirement for testing
      
      * brainfart
      
      * reduce the loss, do something slightly distributed
      
      * Some cleanup, distributing the training on two GPUs
      
      * some cleanup + adding a vanilla run, still not good to go
      
      * less silly defaults, gtg for a start I think
      
      * smaller batch to fit the smaller gpus used in the circleci rigs
      
      * Adding some options for the benchmark, and regression testing
      
      * [test] set torch seed for Adam tests (#49)
      
      Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
      
      * linting, I really need to automate this isort insanity
      Co-authored-by: Jun Ru Anderson <33384298+andersonic@users.noreply.github.com>
      Co-authored-by: Jun Ru Anderson <andersonic@fb.com>