- 12 Jun, 2022 1 commit
Crutcher Dunnavant authored
- 14 Feb, 2022 1 commit
Min Xu authored
* update pytest versions
* [test] test-related changes
  - upgrade to newer pytorch versions
  - added a function to make tests more deterministic on A100 and TF32
  - fixed some tests so that they are correctly skipped on a single-GPU system
* more fixes
* formatting overly long lines
* format
* better test without triggering a warning
* fix an optim state bug with newer pytorch (see the sketch after this message)
  - the adam optimizer now seems to return "step" as a singleton tensor in the nightly build
  - this fixes it, assuming a non-tensor value can still be loaded back by the optimizer
* improve oss.py
  - using min_loss for regression checking is a bit more reliable
  - also increased the number of epochs from 10 to 12
* small oss.py fix
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

Co-authored-by: Min Xu <min.xu.public@gmail.com>
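To make the optim-state fix concrete, here is a minimal sketch of normalizing the "step" entry before the state is saved or reloaded. The helper name `untensorize_step` is an assumption for illustration, not fairscale's actual code; it only relies on the standard optimizer state_dict layout.

```python
import torch


def untensorize_step(optim_state):
    """Convert singleton-tensor 'step' values back to plain Python numbers.

    NOTE: illustrative helper, not fairscale's code. Newer PyTorch Adam builds
    store 'step' as a 0-dim/1-element tensor; a plain number can still be
    loaded back by the optimizer, so we flatten it here.
    """
    for param_state in optim_state.get("state", {}).values():
        step = param_state.get("step")
        if isinstance(step, torch.Tensor):
            param_state["step"] = step.item()
    return optim_state
```

Calling something like this on `optimizer.state_dict()` before consolidating or saving keeps the saved state loadable across older and newer PyTorch versions.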
- 26 Jun, 2021 1 commit
Pavel Belevich authored
- 13 May, 2021 1 commit
Min Xu authored
* [fix] add and use get_process_group_cached (a caching sketch follows this message)
  - This commit makes FSDP avoid creating too many process groups by default
  - Extra process groups are bad for GPU memory and init time
* add changelog
* lint
* note on speed
* add better assert output; the test seems to be flaky: https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps
* update test reference memory values
  - With cached process groups, the memory reported by pytorch is reduced as well (due to the bucket buffer memory for the reduction buffer)
  - The effect on memory is actually larger for the SMI memory, which is not reported by pytorch and is what this test checks
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py
* Update CHANGELOG.md
* Update fairscale/utils/parallel.py
* Update fairscale/utils/parallel.py
* Update fairscale/utils/parallel.py
* Update fairscale/utils/parallel.py
* improved changelog
* better handling of underscores in the md file

Co-authored-by: Min Xu <min.xu@acm.org>
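The caching idea behind get_process_group_cached can be sketched roughly as follows: key process groups by their rank set and hand back the same group on repeated requests, so multiple FSDP instances do not allocate duplicate communicators. The module-level dict and the exact signature here are assumptions for illustration; the real helper lives in fairscale/utils/parallel.py.

```python
from typing import Dict, FrozenSet, Optional, Sequence

import torch.distributed as dist

# Illustrative cache; fairscale's actual implementation may differ.
_group_cache: Dict[FrozenSet[int], object] = {}


def get_process_group_cached(ranks: Optional[Sequence[int]] = None):
    """Return a process group for `ranks`, creating each unique group only once."""
    if ranks is None:
        # The default (world) group needs no extra communicator.
        return dist.group.WORLD
    key = frozenset(ranks)
    if key not in _group_cache:
        # dist.new_group must be called collectively on all ranks; creating it
        # once per unique rank set avoids the GPU memory and init-time cost of
        # duplicate communicators.
        _group_cache[key] = dist.new_group(ranks=sorted(ranks))
    return _group_cache[key]
```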
- 08 May, 2021 1 commit
Sam Shleifer authored
- 02 Apr, 2021 1 commit
msbaines authored
NCCL all_to_all is now supported in PyTorch (since v1.8.0).

Fixes: #548
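For illustration only: with PyTorch >= 1.8 the NCCL backend handles torch.distributed.all_to_all directly, so no Gloo fallback is needed. The function name and the surrounding setup (process group already initialized with the nccl backend, one tensor chunk per rank) are assumptions of this sketch.

```python
import torch
import torch.distributed as dist


def exchange_chunks(local_chunks):
    """Send chunk i to rank i and receive one chunk back from every rank.

    Assumes dist.init_process_group("nccl", ...) has already run and that
    len(local_chunks) equals the world size.
    """
    world_size = dist.get_world_size()
    assert len(local_chunks) == world_size
    received = [torch.empty_like(chunk) for chunk in local_chunks]
    dist.all_to_all(received, local_chunks)  # NCCL path since PyTorch 1.8
    return received
```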
- 11 Jan, 2021 1 commit
Benjamin Lefaudeux authored
* tentatively fixing the cpu version of the circleci jobs; the pipe tests are now the last ones standing
* fixing oss backcompat, and also trying to fix rpc on old pytorch
* fixing the file-based init in torch 1.5
- 28 Dec, 2020 1 commit
Benjamin Lefaudeux authored
* file-based dist init
* nicer handling of broken world sizes vs. the number of available GPUs: do not break, just warn (see the sketch below)
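A sketch of what file-based init with that "warn, don't break" check could look like. The function name, backend choice, and sync-file handling are illustrative assumptions, not fairscale's exact code.

```python
import logging

import torch
import torch.distributed as dist


def init_from_file(rank: int, world_size: int, sync_file: str) -> None:
    """Initialize the default process group through a shared file."""
    if torch.cuda.is_available() and world_size > torch.cuda.device_count():
        # Warn instead of raising when the requested world size exceeds the
        # number of visible GPUs.
        logging.warning(
            "world_size %d exceeds the %d visible GPUs; continuing anyway",
            world_size,
            torch.cuda.device_count(),
        )
    dist.init_process_group(
        backend="nccl" if torch.cuda.is_available() else "gloo",
        init_method=f"file://{sync_file}",
        rank=rank,
        world_size=world_size,
    )
```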
- 11 Nov, 2020 2 commits
- 10 Nov, 2020 1 commit
Tom Birch authored
Adds support for:
* Reused layers (e.g. for weight sharing; see the sketch below)
* Lazily-constructed layers
* Single-process control via PipeRPCWrapper
* PipelineStyle.AsyncSchedule, which lays the foundation for asynchronous pipeline work by introducing an event loop on each rank/worker to process activations or gradients as they arrive

Also added examples for multi-process use and for PipeRPCWrapper.
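To make the "reused layers" case concrete, here is a plain-PyTorch sketch of the weight-sharing pattern that Pipe now has to partition correctly (the same module object registered at two positions). The layer sizes are arbitrary, and this is not one of the examples added by the commit.

```python
import torch
import torch.nn as nn

shared = nn.Linear(32, 32)  # a single weight matrix...
model = nn.Sequential(
    nn.Linear(32, 32),
    shared,                 # ...used here
    nn.ReLU(),
    shared,                 # ...and reused here, with no parameter copy
    nn.Linear(32, 2),
)

out = model(torch.randn(4, 32))

# parameters() deduplicates the shared layer, so the count reflects only
# two distinct Linear(32, 32) layers plus the final Linear(32, 2).
total = sum(p.numel() for p in model.parameters())
assert total == 2 * (32 * 32 + 32) + (32 * 2 + 2)
```

When such a model is handed to Pipe with a balance that splits it across devices, the two occurrences of `shared` can land in different partitions, which is the case this commit adds support for.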
- 23 Oct, 2020 1 commit
msbaines authored
- 21 Oct, 2020 1 commit
msbaines authored
- 17 Oct, 2020 1 commit
msbaines authored
- 16 Oct, 2020 2 commits
- 14 Oct, 2020 1 commit
msbaines authored
- 08 Oct, 2020 1 commit
msbaines authored
Currently only implemented for a single process and expert.
- 05 Oct, 2020 1 commit
msbaines authored
- 02 Oct, 2020 1 commit
msbaines authored