- 17 Mar, 2021 1 commit
-
-
anj-s authored
* debugging statements * fix index inputs and streams * fix lint errors * remove print * lint errors * address comments * lint error Co-authored-by:Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 09 Mar, 2021 1 commit
-
-
anj-s authored
* small fix, remove unused flags * remove unused flag * add back max_batch flag * adding back lazy_construction * adding back lazy_construction * add missing device arg Co-authored-by:Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 03 Mar, 2021 1 commit
-
-
anj-s authored
[refactor] Use logging in place of print statements, remove unused functions and other minor refactoring changes. (#461) * fix pipe logging and other cleanups * more log/debug changes
-
- 26 Feb, 2021 1 commit
-
-
anj-s authored
* clean start * removing per layer split strategy, probably not that useful indeed * initial transformer benchmark * hack, enable testing ViT + offload, python3 benchmarks/oss.py --epochs 2 --optim_type oss_offload_ddp --batch_size=32 --model vit_large_patch16_224 * proper cuda streams and device, something off in terms of mems consumption * minor, stashing * unit test fix * removing all the distributed parts * simpler test, needs debugging * working OOP, running a model which does not fit on the gpu memory * spring cleaning * removing the ill-advised optimizer bits, better keep that orthogonal * [offload] Add support for activation offloading + other changes (#367) * initial fwd/bwd commit * checkpoint work * modify shard loop * activation offloading and test to start with * fix lint errors * update comments * fix lint * remove unused var * remove commented out lines * modify name * remove break * remove profiler comments * avoid saving inputs * fix lint errors Co-authored-by:
Anjali Sridhar <anj@devfair0443.h2.fair> * [offload] Add support for fp16 training (#374) * initial fwd/bwd commit * checkpoint work * modify shard loop * activation offloading and test to start with * fix lint errors * update comments * fix lint * remove unused var * remove commented out lines * modify name * remove break * remove profiler comments * add support for fp16 * add unit tests * fix lint errors * fix test failure Co-authored-by:
Anjali Sridhar <anj@devfair0443.h2.fair> * [offload] Add support for activation checkpointing for all layers. (#381) * initial fwd/bwd commit * checkpoint work * modify shard loop * activation offloading and test to start with * fix lint errors * update comments * fix lint * remove unused var * remove commented out lines * modify name * remove break * remove profiler comments * add support for fp16 * add unit tests * fix lint errors * fix test failure * cp work, incorrect output dimensions still need to be fixed * fixed activation outputs * intermediate cp of work * add tests * fix lint errors Co-authored-by:
Anjali Sridhar <anj@devfair0443.h2.fair> * add support for microbatches * revert benchmark config changes * add parametrization * fix lint errors and tests * skip test for 1.5 * fix lint errors * skip test if there are no GPUs * fix lint errors * fix lint errors * move experimental to the fairscale repo * lint error fixes * modify test imports * lint error fixes * move offload files to the experimental directory * move tests and benchmarks to their folder * fix mypy errors * cp intermediate working benchmarks * more changes * split benchmark configs * remove print statements * fix lint errors * remove unused print * stress testing * remove unused file * change param name * lint fixes * move file to the right folder * offload_experimental * add doc string * add error message Co-authored-by:
Benjamin Lefaudeux <benjamin.lefaudeux@gmail.com> Co-authored-by:
Benjamin Lefaudeux <benjamin.lefaudeux@protonmail.com> Co-authored-by:
Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 24 Feb, 2021 1 commit
-
-
anj-s authored
-
- 04 Feb, 2021 1 commit
-
-
msbaines authored
-
- 03 Feb, 2021 1 commit
-
-
anj-s authored
* mp cleanup * round of multiprocess refactoring * test golden run * print cuda stats * fix lint errors * enable multiprocess pipe benchmarks * set world size to be available gpus * more changes * use synthetic loaders for intermediate pipeline stages * merged master * fix for the devices property * dataloader fix * modify rank check * print wps stats * enable verification * fix logging * fix flag name * fix flag name * check for rank * fix indent * pass args * pass args * modify golden data * remove unused print message * fix lint errors * add comments * fix benchmarks Co-authored-by:Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 29 Jan, 2021 1 commit
-
-
msbaines authored
-
- 27 Jan, 2021 1 commit
-
-
msbaines authored
-
- 21 Jan, 2021 1 commit
-
-
anj-s authored
* [refactor]Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * fix lint errors * refactor common utilities * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * addressed PR comments * addressed PR comments * fixed typos * initialize var * rename seq_pred to lm * fix lint errors * move datasets and models into separate folders * add the folders created * fix lint errors * create golden config to stats mapping * add common batching for both synthetic and real data * fixed lint errors * enable real pipe benchmarks with new golden data * reduce seq len to avoid OOM * updated golden data * add logging * add golden data * add golden data * fix lint errors * add doc string * remove unused class * add seq len and batch size to the config * remove commented out line * address comments * rename imports * refactor common logic in dataloaders * add golden configs * lint changes * merge latest changes * lint errors * address PR comments Co-authored-by:Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 19 Jan, 2021 1 commit
-
-
anj-s authored
* [refactor]Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * fix lint errors * refactor common utilities * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * addressed PR comments * addressed PR comments * fixed typos * initialize var * rename seq_pred to lm * fix lint errors * move datasets and models into separate folders * add the folders created * fix lint errors * create golden config to stats mapping * add common batching for both synthetic and real data * fixed lint errors * enable real pipe benchmarks with new golden data * reduce seq len to avoid OOM * updated golden data * add logging * add golden data * add golden data * fix lint errors * add doc string * remove commented out line * address comments * rename imports * refactor common logic in dataloaders * add golden configs * lint changes Co-authored-by:Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 04 Jan, 2021 1 commit
-
-
anj-s authored
* [refactor]Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * fix lint errors * refactor common utilities * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * addressed PR comments * addressed PR comments * fixed typos * initialize var * rename seq_pred to lm * fix lint errors Co-authored-by:Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 30 Dec, 2020 1 commit
-
-
anj-s authored
[refactor] Remove unused variables, add configuration objects and basic cleanup for pipe benchmarks. (#252) * [refactor]Remove unused variables and refactor common configurations * move helper function to call site * fixed lint errors * fix lint errors * fix lint errors * fix lint errors * fix import order * format files * remove unused imports * fix lint errors * address PR comments * sorted imports * add space * modify comment * added doc strings and addressed PR comments. * addressed PR comments * added another comment to clarify. * fixing lint errors * rename variable Co-authored-by:Anjali Sridhar <anj@devfair0443.h2.fair>
-
- 16 Dec, 2020 1 commit
-
-
jessijzhao authored
* [feat] add CPU support to tutorials in examples * now works on a machine without cuda * fixes some minor typos * [cleanup] factorize tutorials in examples * collects duplicate code across tutorials in helpers.py * [fix] getData in tutorials now returns iterable
-
- 01 Dec, 2020 1 commit
-
-
Benjamin Lefaudeux authored
-
- 19 Nov, 2020 2 commits
-
-
Benjamin Lefaudeux authored
* reverting a change which slipped in #188
-
Yuanyuan (Ana) Shen authored
* Add CPU support for pipe.py benchmarks, CUDA-free
-
- 10 Nov, 2020 1 commit
-
-
Tom Birch authored
Adds support for: * Reused layers (e.g. for weight sharing) * Lazily-constructed layers * Single-process control via PipeRPCWrapper * PipelineStyle.AsyncSchedule, which lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive. Also adds examples for multi-process and PipeRPCWrapper.
-
- 28 Oct, 2020 1 commit
-
-
msbaines authored
-
- 17 Sep, 2020 1 commit
-
-
Tom Birch authored
Adds support for distributing pipeline stages across multiple processes (and therefore multiple machines) * Adds a style argument to the Pipe constructor, defaulting to PipelineStyle.SingleProcess, but also supporting PipelineStyle.MultiProcess * Added support for lazy construction of modules (see lazy_construction for an example) * Added two implementations of inter-process communication: one based on rpc with globally visible queues, one based on send/recv * Copied all the relevant tests from tests/pipe to tests/pipe_process and modified them to exercise PipelineStyle.MultiProcess
-
- 03 Sep, 2020 1 commit
-
-
Jun Ru Anderson authored
Add GradScaler to Fairscale, subclassing PyTorch's GradScaler. Use GradScaler in the pipe benchmark; though it is not needed in this case, it is a good example of how to use gradient scaling for larger models that do require gradient scaling in order to converge. Co-authored-by:Jun Ru Anderson <andersonic@fb.com>
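The idea behind gradient scaling can be sketched in plain Python. This is a toy illustration only, not Fairscale's GradScaler or PyTorch's `torch.cuda.amp.GradScaler`: the class name `ToyLossScaler`, its method names, and its default constants are hypothetical. It shows the usual pattern of multiplying the loss by a scale, skipping the update and shrinking the scale when a gradient overflows, and growing the scale again after a run of clean steps.

```python
import math


class ToyLossScaler:
    """Toy dynamic loss scaler; names and defaults are hypothetical."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        """Multiply the loss so that small gradients survive in half precision."""
        return loss * self.scale

    def unscale(self, scaled_grads):
        """Return unscaled gradients, or None if the step should be skipped."""
        current = self.scale
        if not all(math.isfinite(g) for g in scaled_grads):
            # Overflow: shrink the scale and skip this optimizer step.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps >= self.growth_interval:
            # A run of clean steps: try a larger scale from the next step on.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return [g / current for g in scaled_grads]


scaler = ToyLossScaler(init_scale=4.0, growth_interval=2)
first = scaler.unscale([8.0, -2.0])            # finite, so gradients come back unscaled
skipped = scaler.unscale([float("inf"), 1.0])  # overflow, so the step is skipped
```

In a real training loop the scaled loss would be backpropagated and the unscaled gradients handed to the optimizer; the sketch leaves both out to stay self-contained.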
-
- 28 Aug, 2020 1 commit
-
-
Jun Ru Anderson authored
* specify chunks for pipe/transformer benchmark Set chunks to be equal to len(balance) for pipe/transformer benchmark. Will update words per second and memory usage checks in next commit (must test on CircleCI to find appropriate values) * change benchmark words per second and memory usage Did six runs for words-per-second, with results: 9144.40, 9163.91, 9993.01, 9082.82, 9155.09, 9000.67 Peak allocated bytes per device (which does not change between runs) were 193206272, 645632, 562688, 92688384 for devices 0, 1, 2 and 3, respectively * increase batch size batch size was small enough that the GPU's computing power was not the bottleneck, slowing training and specifically making more chunks slower. Increasing batch size has therefore increased training speed * update benchmark numbers ran six times, with wps 36917.44, 36797.65, 37006.03, 36872.84, 37129.31, 37003.31 and peak allocated bytes 4061909504, 4050944, 10427392, 2031824896 for devices 0,1,2 and 3 respectively. Co-authored-by:Jun Ru Anderson <andersonic@fb.com>
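The two sets of six runs quoted above can be summarized with a few lines of Python. This is only an arithmetic sketch over the figures in the commit message; the variable and function names are made up for illustration, and it makes no claim about how the benchmark itself turns these runs into pass/fail thresholds.

```python
from statistics import mean

# Words-per-second figures copied from the commit message above.
wps_small_batch = [9144.40, 9163.91, 9993.01, 9082.82, 9155.09, 9000.67]
wps_large_batch = [36917.44, 36797.65, 37006.03, 36872.84, 37129.31, 37003.31]


def summarize(runs):
    """Return (mean, min, max) for a list of throughput samples."""
    return mean(runs), min(runs), max(runs)


mean_small, low_small, high_small = summarize(wps_small_batch)
mean_large, low_large, high_large = summarize(wps_large_batch)
speedup = mean_large / mean_small

print(f"small batch: mean {mean_small:.2f} (range {low_small:.2f} to {high_small:.2f})")
print(f"large batch: mean {mean_large:.2f} (range {low_large:.2f} to {high_large:.2f})")
print(f"speedup from the larger batch size: {speedup:.2f}x")
```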
-
- 22 Aug, 2020 1 commit
-
-
Jun Ru Anderson authored
Implement scaling of optimizer state when using pure-fp16 training to avoid underflow. Update benchmark to use pure-fp16. Modify state_dict methods to store and load the optimizer state scale. Co-authored-by:Jun Ru Anderson <andersonic@fb.com>
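Why scaling helps can be shown with the standard library alone, since `struct` supports the IEEE 754 half-precision format (`"e"`). The sketch below is not the optimizer's code: the scale factor `SCALE` and the sample value are assumptions picked for illustration. It stores a small number in fp16 directly and then in scaled form.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


SCALE = 2.0 ** 16  # hypothetical scale factor, chosen for illustration

tiny = 1e-8  # stand-in for a small optimizer-state entry

# Stored directly, the value is far below fp16's smallest positive number
# (about 6e-8), so it rounds to zero.
stored_plain = to_fp16(tiny)

# Stored scaled, it lands in fp16's normal range; divide to read it back.
stored_scaled = to_fp16(tiny * SCALE)
recovered = stored_scaled / SCALE

print(stored_plain)
print(recovered)
```

Saving the scale alongside the state, as the commit describes for `state_dict`, is what lets a later load interpret the stored values correctly.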
-
- 21 Aug, 2020 1 commit
-
-
Jun Ru Anderson authored
Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict. Co-authored-by:Jun Ru Anderson <andersonic@fb.com>
-
- 18 Aug, 2020 1 commit
-
-
Jun Ru Anderson authored
Allow training with optimizer state in fp16. Use an enum to select from full precision, mixed precision, memory-efficient mixed precision, and pure fp16. Improve clarity of testing code. Co-authored-by:Jun Ru Anderson <andersonic@fb.com>
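A minimal sketch of such an enum follows. The name `Precision`, its members, and the `DTYPES` table are hypothetical and are not taken from the Fairscale source. The parameter and gradient dtypes for the two mixed modes follow the 14 Aug, 2020 entry in this log; the remaining cells (fp32 optimizer state outside pure fp16, fp16 everywhere in pure fp16) are assumptions.

```python
from enum import Enum, auto


class Precision(Enum):  # hypothetical name, for illustration only
    FULL_PRECISION = auto()
    MIXED_PRECISION = auto()
    MEMORY_EFFICIENT_MIXED_PRECISION = auto()
    PURE_FP16 = auto()


# Maps each mode to (param dtype, grad dtype, optimizer-state dtype).
# The optimizer-state column is an assumption except for PURE_FP16.
DTYPES = {
    Precision.FULL_PRECISION: ("fp32", "fp32", "fp32"),
    Precision.MIXED_PRECISION: ("fp16", "fp32", "fp32"),
    Precision.MEMORY_EFFICIENT_MIXED_PRECISION: ("fp16", "fp16", "fp32"),
    Precision.PURE_FP16: ("fp16", "fp16", "fp16"),
}


def describe(mode: Precision) -> str:
    """Render one mode's dtype choices as a single readable line."""
    params, grads, state = DTYPES[mode]
    return f"{mode.name}: params={params}, grads={grads}, state={state}"


for mode in Precision:
    print(describe(mode))
```

An enum keeps the four modes mutually exclusive, which a set of independent boolean flags would not.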
-
- 14 Aug, 2020 1 commit
-
-
Jun Ru Anderson authored
Add support for mixed-precision (half precision params, full precision gradients) and memory-efficient (half precision params and half precision gradients) training with Adam Co-authored-by:Jun Ru Anderson <andersonic@fb.com>
-
- 31 Jul, 2020 3 commits
-
-
Jun Ru Anderson authored
Add FusedAdam, update benchmark and add tests. Co-authored-by:Jun Ru Anderson <andersonic@fb.com>
-
Jun Ru Anderson authored
-
Jun Ru Anderson authored
-
- 08 Jul, 2020 1 commit
-
-
Mandeep Singh Baines authored
-