[feat] save memory by using bucket buffer only in backward (#633)
* [feat] save memory by using bucket buffer only in backward
- this fixes bug #627
- added documentation to clarify the buffer's cost and speed/memory
tradeoff
- added setup/teardown calls so that the buffer is only allocated
during the backward pass, freeing memory during the forward pass and
the optimizer step for other uses such as activations.
- added a unit test that asserts memory usage is within the expected range.
Compared with DDP:
1. buffer size scales with the number of FSDP instances, not with model size
2. buffer is only allocated during backward
3. buffer is used only for small tensors, to reduce overhead
4. the overlap of computation and reduction works very differently
* add PR number to changelog
* filled in the memory numbers for PyTorch 1.9
* addressed comments
* update comments
* fix for 1.6
* add a todo
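The bucket lifecycle described in the first item (buffer allocated only for backward through setup/teardown calls, and only tensors smaller than the bucket batched into it) can be sketched in pure Python as below. This is an illustrative sketch under stated assumptions, not fairscale's actual API: the names `LazyBucketer`, `setup`, `add`, `flush`, `teardown` and `reduce_fn` are hypothetical, Python lists stand in for tensors, and `reduce_fn` stands in for a collective such as reduce-scatter.

```python
from typing import Callable, List, Optional

# Hypothetical sketch: LazyBucketer and its methods are NOT fairscale's API.
# Lists of floats stand in for gradient tensors; reduce_fn stands in for a
# collective such as reduce-scatter.


class LazyBucketer:
    """Fixed-capacity bucket that exists only between setup() and teardown()."""

    def __init__(self, capacity: int, reduce_fn: Callable[[List[float]], None]):
        self.capacity = capacity    # max number of elements the bucket holds
        self.reduce_fn = reduce_fn  # called with each batch to be reduced
        self.buffer: Optional[List[float]] = None  # no memory held until setup()
        self.fill = 0               # number of valid elements in the buffer

    def setup(self) -> None:
        # Called at the start of backward: allocate the fixed-size buffer.
        if self.buffer is None:
            self.buffer = [0.0] * self.capacity
            self.fill = 0

    def add(self, grad: List[float]) -> None:
        if self.buffer is None:
            raise RuntimeError("setup() must be called before backward")
        if len(grad) >= self.capacity:
            # Large tensors bypass the bucket: batching them saves little overhead.
            self.reduce_fn(list(grad))
            return
        if self.fill + len(grad) > self.capacity:
            self.flush()  # make room by reducing what is already buffered
        self.buffer[self.fill:self.fill + len(grad)] = grad
        self.fill += len(grad)

    def flush(self) -> None:
        # Reduce the buffered elements (as a copy) and mark the buffer empty.
        if self.buffer is not None and self.fill > 0:
            self.reduce_fn(self.buffer[:self.fill])
            self.fill = 0

    def teardown(self) -> None:
        # Called at the end of backward: flush pending data, then free the buffer
        # so the forward pass and optimizer step can use that memory.
        self.flush()
        self.buffer = None


# Usage: one simulated backward pass with a 4-element bucket.
reduced: List[List[float]] = []
bucketer = LazyBucketer(capacity=4, reduce_fn=reduced.append)
bucketer.setup()                    # start of backward
bucketer.add([1.0, 2.0])            # small: buffered
bucketer.add([3.0])                 # small: buffered (3 of 4 slots used)
bucketer.add([4.0, 5.0])            # would overflow: flush [1, 2, 3] first
bucketer.add([6.0, 7.0, 8.0, 9.0])  # as large as the bucket: reduced directly
bucketer.teardown()                 # flush [4, 5] and release the buffer
print(reduced)
```

In this sketch the buffer's footprint is bounded by `capacity` no matter how many gradients pass through it, and it is released outside backward. That mirrors points 1 to 3 of the DDP comparison, in which DDP keeps gradient buckets covering all parameters for the whole run.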
Co-authored-by: Min Xu <min.xu@acm.org>