tests/nn/moe/test_moe_layer.py · bde4bac5fe3f6b040ac6d75e3bd631be7f504c27 · OpenDAS / fairscale

[fix] add and use get_process_group_cached (#678) · bde4bac5

Min Xu authored May 12, 2021

* [fix] add and use get_process_group_cached

- This commit makes FSDP avoid making too many process groups by default
- Extra process group is bad for GPU memory and init time

* add changelog

* lint

* note on speed

* add better assert output

test seems to be flaky:
https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2957/workflows/383c9f9f-f1a5-461c-8c41-e2e28ece037b/jobs/26783/steps



* update test reference memory values

- With cached process groups, the memory is reduced as reported by
pytorch as well (due to bucket buffer memory for the reduction buffer)
- The effect on memory is actually more on the SMI memory, which is not
reported by pytorch and checked by this test.

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

* Update CHANGELOG.md

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* Update fairscale/utils/parallel.py

* improved changelog

* better handling of underscores in the md file
Co-authored-by: Min Xu <min.xu@acm.org>

bde4bac5

test_moe_layer.py 6.9 KB

Replace test_moe_layer.py