- 08 Mar, 2021 1 commit
Benjamin Lefaudeux authored
- 05 Mar, 2021 1 commit
Benjamin Lefaudeux authored
:(
- 04 Mar, 2021 1 commit
Benjamin Lefaudeux authored
- 03 Mar, 2021 1 commit
anj-s authored
[refactor] Use logging in place of print statements, remove unused functions and other minor refactoring changes. (#461) * fix pipe logging and other cleanups * more log/debug changes
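For context, a minimal sketch of the pattern this refactor describes, swapping bare print statements for the standard Python logging module (the function and messages below are illustrative, not taken from the fairscale sources):

    import logging

    # One logger per module, configured once at program start.
    logger = logging.getLogger(__name__)
    logging.basicConfig(level=logging.INFO)

    def train_step(loss_value: float) -> None:
        # Before: print(f"loss = {loss_value}")
        # After: leveled output that tests and benchmarks can filter or silence.
        logger.info("loss = %.4f", loss_value)
        logger.debug("verbose detail, only emitted when DEBUG is enabled")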
- 01 Mar, 2021 1 commit
Min Xu authored
* [chores]: CI py39 on GPU and more efficiency * add test list files * fix * add test list files * split benchmark run into 2 runs * fix 1.8 version and balance benchmarks * fix * fix * fix * fix * recording tests * py39 install fix * test again * move tests * reorg tests * skip tests for torch 1.8 due to an upstream bug * removed __init__.py from tests since it confuses pytest * Revert "removed __init__.py from tests since it confuses pytest" This reverts commit 7e156ba33dfaa5ed052031780613ec0cb57a45b0. * don't include __init__ in file list * notes on __init__.py and added missing ones * fixed mypy in a test file * balance test runtime * better pip install * balance more * pip fix * balance * balance more, all tests should finish within 20m now * minor license update * trying cu102 * more doc and addressed Ben's comments * debugging * debugging * better capture the errors * debugging * fix pyenv command * add universe repo * update to cuda 11 for 171 * add a test file, improved the checking script
- 26 Feb, 2021 1 commit
anj-s authored
* clean start * removing per layer split strategy, probably not that useful indeed * initial transformer benchmark * hack, enable testing ViT + offload, python3 benchmarks/oss.py --epochs 2 --optim_type oss_offload_ddp --batch_size=32 --model vit_large_patch16_224 * proper cuda streams and device, something off in terms of memory consumption * minor, stashing * unit test fix * removing all the distributed parts * simpler test, needs debugging * working OOP, running a model which does not fit in the gpu memory * spring cleaning * removing the ill-advised optimizer bits, better keep that orthogonal * [offload] Add support for activation offloading + other changes (#367) * initial fwd/bwd commit * checkpoint work * modify shard loop * activation offloading and test to start with * fix lint errors * update comments * fix lint * remove unused var * remove commented out lines * modify name * remove break * remove profiler comments * avoid saving inputs * fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
* [offload] Add support for fp16 training (#374) * initial fwd/bwd commit * checkpoint work * modify shard loop * activation offloading and test to start with * fix lint errors * update comments * fix lint * remove unused var * remove commented out lines * modify name * remove break * remove profiler comments * add support for fp16 * add unit tests * fix lint errors * fix test failure
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
* [offload] Add support for activation checkpointing for all layers. (#381) * initial fwd/bwd commit * checkpoint work * modify shard loop * activation offloading and test to start with * fix lint errors * update comments * fix lint * remove unused var * remove commented out lines * modify name * remove break * remove profiler comments * add support for fp16 * add unit tests * fix lint errors * fix test failure * cp work, incorrect output dimensions still need to be fixed * fixed activation outputs * intermediate cp of work * add tests * fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
* add support for microbatches * revert benchmark config changes * add parametrization * fix lint errors and tests * skip test for 1.5 * fix lint errors * skip test if there are no GPUs * fix lint errors * fix lint errors * move experimental to the fairscale repo * lint error fixes * modify test imports * lint error fixes * move offload files to the experimental directory * move tests and benchmarks to their folder * fix mypy errors * cp intermediate working benchmarks * more changes * split benchmark configs * remove print statements * fix lint errors * remove unused print * stress testing * remove unused file * change param name * lint fixes * move file to the right folder * offload_experimental * add doc string * add error message
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@gmail.com>
Co-authored-by: Benjamin Lefaudeux <benjamin.lefaudeux@protonmail.com>
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
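A rough usage sketch of the experimental offload feature this commit series builds up (model sharding on CPU, activation checkpointing, fp16, micro-batches). The import path and argument names follow my reading of the later fairscale experimental API and should be treated as assumptions, not the exact interface landed here:

    import torch
    from fairscale.experimental.nn.offload import OffloadModel  # import path is an assumption

    # A model that may not fit in GPU memory as a whole.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),
    )

    # Slice the model, keep parameters on the CPU, and stream each slice to the GPU
    # on demand; checkpoint_activation and num_microbatches mirror the features
    # described in the commit message above.
    offload_model = OffloadModel(
        model=model,
        device=torch.device("cuda"),
        offload_device=torch.device("cpu"),
        num_slices=3,
        checkpoint_activation=True,
        num_microbatches=4,
    )

    x = torch.randn(32, 1024, device="cuda")
    loss = offload_model(x).sum()
    loss.backward()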
- 24 Feb, 2021 2 commits
anj-s authored
* refactor experimental file locations * refactor fix * disable test temporarily * lint error fix * make the change in the right file * fix lint errors * skip failing tests
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
anj-s authored
- 23 Feb, 2021 1 commit
anj-s authored
* move experimental to the fairscale repo * lint error fixes * modify test imports * lint error fixes * lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
- 04 Feb, 2021 1 commit
msbaines authored
- 03 Feb, 2021 2 commits
Benjamin Lefaudeux authored
* restoring the regression test, adding a test of the for_each optims * fix the regression test on circleci * removing unused flags
anj-s authored
* mp cleanup * round of multiprocess refactoring * test golden run * print cuda stats * fix lint errors * enable multiprocess pipe benchmarks * set world size to be available gpus * more changes * use synthetic loaders for intermediate pipeline stages * merged master * fix for the devices property * dataloader fix * modify rank check * print wps stats * enable verification * fix logging * fix flag name * fix flag name * check for rank * fix indent * pass args * pass args * modify golden data * remove unused print message * fix lint errors * add comments * fix benchmarks
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
- 29 Jan, 2021 1 commit
msbaines authored
- 27 Jan, 2021 1 commit
msbaines authored
- 25 Jan, 2021 1 commit
anj-s authored
* [refactor] Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * fix lint errors * refactor common utilities * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * addressed PR comments * addressed PR comments * fixed typos * initialize var * rename seq_pred to lm * fix lint errors * move datasets and models into separate folders * add the folders created * fix lint errors * create golden config to stats mapping * add common batching for both synthetic and real data * fixed lint errors * enable real pipe benchmarks with new golden data * reduce seq len to avoid OOM * updated golden data * add logging * add golden data * add golden data * fix lint errors * add doc string * remove unused class * add seq len and batch size to the config * remove commented out line * address comments * rename imports * refactor common logic in dataloaders * add golden configs * lint changes * merge latest changes * lint errors * address PR comments * initial refactoring * lint fixes * fix lint errors * update comment
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
- 23 Jan, 2021 1 commit
Siddharth Goyal authored
* Add AMPnet implementation (clean version) * Move ampnet to experimental * Move stuff around pipeline * Address review comments and fix pre-commit errors * Refactor and modify delegate functionality * Modify header in pipe.py
- 21 Jan, 2021 2 commits
Benjamin Lefaudeux authored
anj-s authored
* [refactor] Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * fix lint errors * refactor common utilities * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * addressed PR comments * addressed PR comments * fixed typos * initialize var * rename seq_pred to lm * fix lint errors * move datasets and models into separate folders * add the folders created * fix lint errors * create golden config to stats mapping * add common batching for both synthetic and real data * fixed lint errors * enable real pipe benchmarks with new golden data * reduce seq len to avoid OOM * updated golden data * add logging * add golden data * add golden data * fix lint errors * add doc string * remove unused class * add seq len and batch size to the config * remove commented out line * address comments * rename imports * refactor common logic in dataloaders * add golden configs * lint changes * merge latest changes * lint errors * address PR comments
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
- 19 Jan, 2021 1 commit
anj-s authored
* [refactor] Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * fix lint errors * refactor common utilities * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * addressed PR comments * addressed PR comments * fixed typos * initialize var * rename seq_pred to lm * fix lint errors * move datasets and models into separate folders * add the folders created * fix lint errors * create golden config to stats mapping * add common batching for both synthetic and real data * fixed lint errors * enable real pipe benchmarks with new golden data * reduce seq len to avoid OOM * updated golden data * add logging * add golden data * add golden data * fix lint errors * add doc string * remove commented out line * address comments * rename imports * refactor common logic in dataloaders * add golden configs * lint changes
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
- 04 Jan, 2021 1 commit
anj-s authored
* [refactor] Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * fix lint errors * refactor common utilities * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * addressed PR comments * addressed PR comments * fixed typos * initialize var * rename seq_pred to lm * fix lint errors
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
- 30 Dec, 2020 2 commits
anj-s authored
[refactor] Remove unused variables, add configuration objects and basic cleanup for pipe benchmarks. (#252) * [refactor] Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * rename variable
Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
Benjamin Lefaudeux authored
* removing a dead call since ShardedDDP, small speedup * unrelated, but filling in the changelog * another nit
- 16 Dec, 2020 1 commit
jessijzhao authored
* [feat] add CPU support to tutorials in examples * now works on a machine without cuda * fixes some minor typos * [cleanup] factorize tutorials in examples * collects duplicate code across tutorials in helpers.py * [fix] getData in tutorials now returns iterable
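The CPU fallback mentioned above usually reduces to picking the device once and threading it through; a minimal device-agnostic sketch (not the tutorials' literal code):

    import torch

    # Fall back to CPU on machines without CUDA so the examples run anywhere.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(8, 2).to(device)
    batch = torch.randn(4, 8, device=device)
    print(model(batch).shape)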
- 01 Dec, 2020 1 commit
Benjamin Lefaudeux authored
- 22 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* testing median and MAD * synchronize on kernels to make sure that we're measuring the actual completion time * adjusting the circleci threshold, not because the speed has regressed but because we now measure proper cuda execution time
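A sketch of the measurement style described here: synchronize around the timed region so queued CUDA kernels are actually counted, then report the median and the median absolute deviation (MAD) over repeats; the helper and its names are mine, not the benchmark's code:

    import time
    from statistics import median

    import torch

    def timed_run(fn, repeats: int = 20):
        durations = []
        for _ in range(repeats):
            torch.cuda.synchronize()              # wait for previously queued kernels
            start = time.monotonic()
            fn()
            torch.cuda.synchronize()              # wait for the kernels launched by fn
            durations.append(time.monotonic() - start)
        med = median(durations)
        mad = median(abs(d - med) for d in durations)  # robust spread, less noisy than stddev
        return med, mad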
- 21 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* rewrite using autograd and the Variable execution queue to make the reduce automatic * share buckets with OSS to remove duplication * some speed is likely still on the table since the speed vs. bucketing does not match expectations; could be a follow-up
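The "automatic reduce" idea can be pictured with per-parameter autograd hooks that fire as each gradient becomes ready; this is a simplified illustration of the general approach, not ShardedDDP's actual bucketed implementation:

    import torch
    import torch.distributed as dist

    def attach_auto_reduce_hooks(model: torch.nn.Module, world_size: int):
        """Average each gradient across ranks as soon as autograd produces it.

        Assumes torch.distributed has already been initialized via init_process_group.
        """
        handles = []
        for p in model.parameters():
            if not p.requires_grad:
                continue

            def hook(grad):
                # Called by autograd the moment this parameter's gradient is computed,
                # so reductions are interleaved with backward instead of done in a
                # separate loop afterwards.
                reduced = grad.detach().clone()
                dist.all_reduce(reduced, op=dist.ReduceOp.SUM)  # blocking, for clarity
                return reduced / world_size

            handles.append(p.register_hook(hook))
        return handles  # keep these alive; call handle.remove() to detach a hook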
- 19 Nov, 2020 2 commits
Benjamin Lefaudeux authored
* reverting a change which slipped in #188
Yuanyuan (Ana) Shen authored
* Add CPU support for pipe.py benchmarks, CUDA-free
- 18 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* adding a shard-aware GradScaler wrap, credits to Sean Naren for the idea * adding stubs & explanations in the documentation
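A usage sketch of the shard-aware scaler: it is meant as a drop-in for torch.cuda.amp.GradScaler when optimizer state is sharded with OSS. The import path below reflects my understanding of the fairscale API and is an assumption:

    import torch
    from fairscale.optim import OSS
    from fairscale.optim.grad_scaler import ShardedGradScaler  # import path is an assumption

    # Assumes torch.distributed is already initialized and the model lives on this rank's GPU.
    model = torch.nn.Linear(16, 4).cuda()
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)
    scaler = ShardedGradScaler()

    for _ in range(10):
        data = torch.randn(8, 16, device="cuda")
        target = torch.randint(0, 4, (8,), device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(data), target)
        scaler.scale(loss).backward()   # same flow as the stock torch.cuda.amp.GradScaler
        scaler.step(optimizer)          # unscales the sharded grads, skips the step on inf/nan
        scaler.update()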
- 16 Nov, 2020 1 commit
Benjamin Lefaudeux authored
add a clip gradients util, equivalent to torch's but aware of the sharded states. Add a corresponding unit test
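Sharded state is also why a dedicated clipping utility is needed: each rank only sees part of the gradients, so the global norm must be computed collectively. A usage sketch, where the clip_grad_norm method on the OSS optimizer is my assumption of the interface, not a quote from this commit:

    import torch
    from fairscale.optim import OSS

    # Assumes torch.distributed is already initialized.
    model = torch.nn.Linear(16, 4).cuda()
    optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)

    loss = model(torch.randn(8, 16, device="cuda")).sum()
    loss.backward()

    # Equivalent in spirit to torch.nn.utils.clip_grad_norm_, except the total norm
    # is computed across all ranks so every shard applies the same clipping factor.
    total_norm = optimizer.clip_grad_norm(max_norm=1.0)
    optimizer.step()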
- 12 Nov, 2020 1 commit
Yuanyuan (Ana) Shen authored
* now works on a machine without cuda; easier to debug and quicker to test
- 10 Nov, 2020 1 commit
Tom Birch authored
Adds support for:
* Reused layers (e.g. for weight sharing)
* Lazily-constructed layers
* Single-process control via PipeRPCWrapper
* PipelineStyle.AsyncSchedule, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive
Also added examples for multi-process and PipeRPCWrapper
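For readers unfamiliar with the first item, "reused layers" means the same module instance appears more than once in the sequential model that Pipe partitions, as in this plain-PyTorch sketch (the Pipe wrapping itself is omitted since its construction details vary by version):

    import torch

    shared = torch.nn.Linear(32, 32)    # a single instance holds the tied weights
    model = torch.nn.Sequential(
        shared,
        torch.nn.ReLU(),
        shared,                         # the same layer, reused later in the sequence
        torch.nn.Linear(32, 10),
    )

    out = model(torch.randn(16, 32))
    # Partitioning such a model is what the "reused layers" support above enables:
    # the second reference may live on a different pipeline stage than the first.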
- 06 Nov, 2020 1 commit
Benjamin Lefaudeux authored
* oss benchmark: add an --amp option * add a circleCI test
- 28 Oct, 2020 1 commit
msbaines authored
- 23 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* Some ease-of-use improvements in the benchmark tool; add a debug option
- 21 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* switching to MNIST * updating the reference values, should be good to go * download dataset once for all processes
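"Download the dataset once for all processes" typically means letting one rank fetch the files while the others wait at a barrier; a common pattern (paths and transform are illustrative):

    import torch.distributed as dist
    from torchvision import datasets, transforms

    def get_mnist(rank: int, root: str = "/tmp/mnist"):
        # Only rank 0 downloads; the other processes block at the barrier until it is done.
        # Assumes torch.distributed is already initialized.
        if rank == 0:
            datasets.MNIST(root, train=True, download=True)
        dist.barrier()
        return datasets.MNIST(root, train=True, download=False,
                              transform=transforms.ToTensor())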
- 20 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* Minor ease-of-life change: easier to debug, and makes it possible to test a host of models with the same code
- 17 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* adding a cpu option * adjust the reference loss
- 14 Oct, 2020 1 commit
Benjamin Lefaudeux authored
- 10 Oct, 2020 1 commit
Benjamin Lefaudeux authored
* bugfix * adjust default non-regression loss, not all_reduced now