Commits · 14d1f78ce8dbb3c922f75d22e92646f909dc32b5 · OpenDAS / fairscale

28 Apr, 2021 1 commit

[feat] save memory by using bucket buffer only in backward (#633) · a5594032

Min Xu authored Apr 27, 2021



* [feat] save memory by using bucket buffer only in backward

- this fixes bug #627
- added documentation to clarify the buffer's cost and speed/memory
  tradeoff
- added setup/teardown calls so that the buffer is only allocated
  during the backward pass, leaving that memory free during the forward
  pass and the optimizer step for things like activations.
- added a unit test that asserts memory usage is within the expected range.
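
The setup/teardown idea above can be sketched as follows. This is a minimal illustration only, not fairscale's actual implementation: the class `BackwardOnlyBuffer` and its `setup`/`teardown`/`allocated_bytes` names are hypothetical, and a plain `bytearray` stands in for the real tensor buffer.

```python
from typing import Optional


class BackwardOnlyBuffer:
    """Hypothetical bucket buffer that exists only between setup() and teardown()."""

    def __init__(self, size_bytes: int) -> None:
        self.size_bytes = size_bytes
        self.data: Optional[bytearray] = None  # nothing is allocated up front

    def setup(self) -> None:
        # Assumed to be called when the backward pass starts: allocate lazily.
        if self.data is None:
            self.data = bytearray(self.size_bytes)

    def teardown(self) -> None:
        # Assumed to be called when the backward pass ends: drop the reference
        # so the memory is available to the next forward pass or optimizer step.
        self.data = None

    @property
    def allocated_bytes(self) -> int:
        return 0 if self.data is None else len(self.data)


buf = BackwardOnlyBuffer(size_bytes=1024)
during_forward = buf.allocated_bytes   # buffer not yet allocated
buf.setup()
during_backward = buf.allocated_bytes  # buffer live only here
buf.teardown()
during_step = buf.allocated_bytes      # buffer released again
```

A memory-range unit test in the spirit of the last bullet would then check the allocated size at each phase rather than only at the end of the iteration.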

Compared with DDP:

  1. buffer size scales with the number of FSDP instances, not with model size
  2. buffer is only allocated during the backward pass
  3. buffer is used for small tensors only, to reduce overhead
  4. compute and reduction overlap very differently
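
Points 1 and 3 above can be illustrated with a rough sketch. The helper names `split_by_bucket` and `total_buffer_size`, the size-threshold rule, and the numbers used are assumptions made for illustration; they are not taken from the fairscale code.

```python
from typing import List, Tuple


def split_by_bucket(sizes: List[int], bucket_capacity: int) -> Tuple[List[int], List[int]]:
    """Split tensor sizes into those small enough for the bucket and those reduced directly.

    Assumption: a tensor goes through the bucket only if it fits in it.
    """
    bucketed = [s for s in sizes if s <= bucket_capacity]
    direct = [s for s in sizes if s > bucket_capacity]
    return bucketed, direct


def total_buffer_size(num_fsdp_instances: int, bucket_capacity: int) -> int:
    """Assumption: one fixed-size bucket per FSDP instance, independent of model size."""
    return num_fsdp_instances * bucket_capacity


# Hypothetical tensor sizes (in elements) and bucket capacity.
bucketed, direct = split_by_bucket([10, 500, 20_000, 64], bucket_capacity=1_000)
total = total_buffer_size(num_fsdp_instances=4, bucket_capacity=1_000)
```

Under these assumptions, adding parameters to a model changes the total buffer size only if it also adds FSDP instances.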

* add PR number to changelog

* filled in with memory number on 1.9

* addressed comments

* update comments

* fix for 1.6

* add a todo

Co-authored-by: Min Xu <min.xu@acm.org>
