1. 28 Jun, 2021 1 commit
  2. 26 Jun, 2021 1 commit
  3. 25 Jun, 2021 2 commits
  4. 22 Jun, 2021 1 commit
    • Update torch to 1.9.0 release (#717) · 1cc4c837
      Pavel Belevich authored
      * Update torch to 1.9.0.dev20210614+cu102
      
      * Update config.yml
      
      * Update config.yml
      
      * Update setup.py
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
      
      * Update config.yml
  5. 11 Jun, 2021 1 commit
    • [Offload][feature] Add auto shard functionality to remove requirement of nn.Sequential models. (#695) · cbeda830
      anj-s authored
      
      * auto wrap functionality
      
      * lint and doc strings
      
      * fix lint errors
      
      * lint errors and version skips
      
      * remove mypy checking and add conditional import
      
      * another math.prod instance
      
      * another import fix
      
      * address comments
      
      * lint errors
      
      * address comments
      
      * fix lint errors
      
      * add placeholder nodes to tracker list
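
      With auto shard, OffloadModel no longer requires the wrapped model to be
      an nn.Sequential. A minimal sketch of the intended usage, assuming the
      OffloadModel constructor from fairscale.experimental.nn.offload (the
      Branchy model and all parameter values here are illustrative):

          import torch
          import torch.nn as nn
          from fairscale.experimental.nn.offload import OffloadModel

          # A model that is not an nn.Sequential; auto shard is expected to
          # trace it and split it into slices automatically.
          class Branchy(nn.Module):
              def __init__(self):
                  super().__init__()
                  self.a = nn.Linear(128, 128)
                  self.b = nn.Linear(128, 128)

              def forward(self, x):
                  return self.b(torch.relu(self.a(x)))

          model = OffloadModel(
              model=Branchy(),                      # no nn.Sequential needed
              device=torch.device("cuda"),          # compute device
              offload_device=torch.device("cpu"),   # where idle shards live
              num_slices=2,
          )
          out = model(torch.randn(4, 128).to("cuda"))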
  6. 27 May, 2021 1 commit
  7. 14 May, 2021 1 commit
  8. 07 May, 2021 1 commit
    • [feat] experimental.nn.SyncBatchNorm: initial commit (#662) · f0a40046
      msbaines authored
      * [feat] experimental.nn.SyncBatchNorm: initial commit
      
      Fast/simple re-implementation of SyncBatchNorm.
      
      When profiling SSL Vision, I was seeing a majority of cycles spent in
      SyncBatchNorm. With this change, I see a 10% to 20% speedup on the
      model I was profiling.
      
      When running benchmarks/experimental/sync_batchnorm.py on 8 x V100,
      I get a 6x speedup:
      
      <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
      Elapsed time is  0.08709120750427246
      Elapsed time is  0.12632274627685547
      Elapsed time is  0.14095258712768555
      Elapsed time is  0.16529417037963867
      Elapsed time is  0.1419970989227295
      Elapsed time is  0.15166854858398438
      Elapsed time is  0.12000870704650879
      Elapsed time is  0.17534875869750977
      <class 'torch.nn.modules.batchnorm.SyncBatchNorm'>
      Elapsed time is  2.5087168216705322
      Elapsed time is  2.497001886367798
      Elapsed time is  2.5204885005950928
      Elapsed time is  2.526789903640747
      Elapsed time is  2.5080230236053467
      Elapsed time is  2.524489641189575
      Elapsed time is  2.513214588165283
      Elapsed time is  2.5359973907470703
      <class 'fairscale.experimental.nn.sync_batchnorm.SyncBatchNorm'>
      Elapsed time is  0.4126114845275879
      Elapsed time is  0.39051294326782227
      Elapsed time is  0.40685415267944336
      Elapsed time is  0.4159870147705078
      Elapsed time is  0.42383885383605957
      Elapsed time is  0.4080159664154053
      Elapsed time is  0.41202712059020996
      Elapsed time is  0.42400121688842773
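
      A usage sketch for the new class (import path taken from the benchmark
      output above; the surrounding model and sizes are illustrative, and it
      is assumed the class mirrors torch.nn.SyncBatchNorm's constructor):

          import torch.nn as nn
          from fairscale.experimental.nn.sync_batchnorm import SyncBatchNorm

          # Drop-in replacement for torch.nn.SyncBatchNorm inside a DDP
          # training setup. torch.distributed must be initialized before the
          # first forward pass, since batch statistics are all-reduced
          # across ranks.
          model = nn.Sequential(
              nn.Conv2d(3, 64, kernel_size=3, padding=1),
              SyncBatchNorm(64),   # instead of nn.SyncBatchNorm(64)
              nn.ReLU(),
          ).cuda()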
  9. 04 May, 2021 1 commit
  10. 28 Apr, 2021 1 commit
    • adding auto graph generation for distributed pipeline (#615) · bdc0581b
      Mehdi Mirzazadeh authored
      * adding auto graph generation for distributed pipeline
      
* ignore trace.py for mypy for now, since it needs pytorch 1.8
      
      * fixing tests
      
      * simplifying graph api
      
      * remove unused debug utilities
      
      * use inspect to find argument lists
      
      * use sharded linear layer
      
* flake8
      
      * comment
      
      * polishing
      
      * polishing
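
      The auto graph generation evidently relies on tracing (the commit notes
      trace.py needs pytorch 1.8, which introduced torch.fx). A purely
      illustrative sketch of the underlying idea, not this commit's actual
      API:

          import torch.nn as nn
          from torch.fx import symbolic_trace

          model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
          graph_module = symbolic_trace(model)

          # Each call_module node is a candidate pipeline stage; a
          # partitioner can assign consecutive nodes to different workers.
          for node in graph_module.graph.nodes:
              if node.op == "call_module":
                  print(node.target, "<-", [str(a) for a in node.args])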
  11. 15 Apr, 2021 1 commit
  12. 13 Apr, 2021 1 commit
  13. 31 Mar, 2021 2 commits
  14. 29 Mar, 2021 1 commit
  15. 28 Mar, 2021 1 commit
  16. 19 Mar, 2021 2 commits
  17. 04 Mar, 2021 1 commit
  18. 01 Mar, 2021 1 commit
    • [chores]: make CI more efficient and update py39 env a bit (#447) · 5eb6b8c7
      Min Xu authored
      * [chores]: CI py39 on GPU and more efficiency
      
      * add test list files
      
      * fix
      
      * add test list files
      
      * split benchmark run into 2 runs
      
      * fix 1.8 version and balance benchmarks
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * recording tests
      
      * py39 install fix
      
      * test again
      
      * move tests
      
      * reorg tests
      
      * skip tests for torch 1.8 due to an upstream bug
      
      * removed __init__.py from tests since it confuses pytest
      
      * Revert "removed __init__.py from tests since it confuses pytest"
      
      This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0.
      
      * don't include __init__ in file list
      
      * notes on __init__.py and added missing ones
      
      * fixed mypy in a test file
      
      * balance test runtime
      
      * better pip install
      
      * balance more
      
      * pip fix
      
      * balance
      
* balance more, all tests should finish within 20m now
      
      * minor license update
      
      * trying cu102
      
      * more doc and addressed Ben's comments
      
      * debugging
      
      * debugging...
  19. 26 Feb, 2021 1 commit
    • [feature] Add support for OffloadModel to enable training large models on 1 GPU. (#432) · f7813d6d
      anj-s authored
      * clean start
      
* removing the per-layer split strategy, probably not that useful after all
      
      * initial transformer benchmark
      
      * hack, enable testing ViT + offload, python3 benchmarks/oss.py  --epochs 2 --optim_type oss_offload_ddp --batch_size=32 --model vit_large_patch16_224
      
* proper cuda streams and device, something off in terms of memory consumption
      
      * minor, stashing
      
      * unit test fix
      
      * removing all the distributed parts
      
      * simpler test, needs debugging
      
* working OOP, running a model which does not fit in GPU memory
      
      * spring cleaning
      
      * removing the ill-advised optimizer bits, better keep that orthogonal
      
      * [offload] Add support for activation offloading + other changes (#367)
      
      * initial fwd/bwd commit
      
      * checkpoint work
      
      * modify shard loop
      
      * activation offloading and test to start with
      
      * fix lint errors
      
      * update comments
      
      * fix lint
      
      * remove unused var
      
      * remove commented out lines
      
      * modify name
      
      * remove break
      
      * remove profiler comments
      
      * avoid saving inputs
      
      * fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
      
      * [offload] Add support for fp16 training (#374)
      
      * initial fwd/bwd commit
      
      * checkpoint work
      
      * modify shard loop
      
      * activation offloading and test to start with
      
      * fix lint errors
      
      * update comments
      
      * fix lint
      
      * remove unused var
      
      * remove commented out lines
      
      * modify name
      
      * remove break
      
      * remove profiler comments
      
      * add support for fp16
      
      * add unit tests
      
      * fix lint errors
      
      * fix test failure
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
      
      * [offload] Add support for activation checkpointing for all layers. (#381)
      
      * initial fwd/bwd commit
      
      * checkpoint work
      
      * modify shard loop
      
      * activation offloading and test to start with
      
      * fix lint errors
      
      * update comments
      
      * fix lint
      
      * remove unused var
      
      * remove commented out lines
      
      * modify name
      
      * remove break
      
      * remove profiler comments
      
      * add support for fp16
      
      * add unit tests
      
      * fix lint errors
      
      * fix test failure
      
      * cp work, incorrect output dimensions still need to be fixed
      
      * fixed activation outputs
      
      * intermediate cp of work
      
      * add tests
      
      * fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
      
      * add support for microbatches
      
      * revert benchmark config changes
      
      * add parametrization
      
      * fix lint errors and tests
      
      * skip test for 1.5
      
      * fix lint errors
      
      * skip test if there are no GPUs
      
      * fix lint errors
      
      * fix lint errors
      
      * move experimental to the fairscale repo
      
      * lint error fixes
      
      * modify test imports
      
      * lint error fixes
      
      * move offload files to the experimental directory
      
* move tests and benchmarks to their folders
      
      * fix mypy errors
      
      * cp intermediate working benchmarks
      
      * more changes
      
      * split benchmark configs
      
      * remove print statements
      
      * fix lint errors
      
      * remove unused print
      
      * stress testing
      
      * remove unused file
      
* change param name
      
      * lint fixes
      
      * move file to the right folder
      
      * offload_experimental
      
      * add doc string
      
      * add error message
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@gmail.com>
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@protonmail.com>
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
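
      Taken together, the sub-PRs above suggest an interface combining CPU
      offload, activation checkpointing, and microbatching. A sketch under
      those assumptions (parameter names are inferred from the PR
      description, not a documented API; my_model is a placeholder):

          import torch
          import torch.nn as nn
          from fairscale.experimental.nn.offload import OffloadModel

          my_model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(),
                                   nn.Linear(1024, 1024))

          offloaded = OffloadModel(
              model=my_model,
              device=torch.device("cuda"),
              offload_device=torch.device("cpu"),
              num_slices=3,                 # shards moved on/off the GPU
              checkpoint_activation=True,   # activation checkpointing (#381)
              num_microbatches=4,           # microbatch support from this PR
          )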
  20. 24 Feb, 2021 1 commit