    [feat] save memory by using bucket buffer only in backward (#633) · a5594032
    Min Xu authored
    
    
    * [feat] save memory by using bucket buffer only in backward
    
    - this fixes bug #627
    - added documentation to clarify the buffer's cost and speed/memory
      tradeoff
    - added setup/teardown calls so that the buffer is only allocated
      during the backward pass, which frees that memory during the forward
      pass and the optimizer step for things like activations.
    - added a unit test that asserts the memory usage is in range.
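
The setup/teardown idea above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the code from this commit: `BucketBuffer`, `backward_pass`, and `try_add` are hypothetical names, and a plain list stands in for the real tensor buffer.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional


class BucketBuffer:
    """Hypothetical flat buffer that only exists between setup() and teardown()."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: Optional[List[float]] = None  # nothing allocated yet
        self.offset = 0

    def setup(self) -> None:
        # Allocate lazily, right before the backward pass needs the buffer.
        if self.data is None:
            self.data = [0.0] * self.capacity
            self.offset = 0

    def teardown(self) -> None:
        # Drop the storage so forward and the optimizer step can reuse the memory.
        self.data = None
        self.offset = 0

    def try_add(self, grad: List[float]) -> bool:
        """Copy a small gradient into the buffer; return False if it does not fit."""
        assert self.data is not None, "buffer used outside the backward pass"
        if self.offset + len(grad) > self.capacity:
            return False  # caller would reduce this gradient without bucketing
        self.data[self.offset:self.offset + len(grad)] = grad
        self.offset += len(grad)
        return True


@contextmanager
def backward_pass(buffer: BucketBuffer) -> Iterator[BucketBuffer]:
    """Keep the buffer allocated only for the duration of the backward pass."""
    buffer.setup()
    try:
        yield buffer
    finally:
        buffer.teardown()


buf = BucketBuffer(capacity=8)
with backward_pass(buf) as b:
    b.try_add([1.0, 2.0, 3.0])   # small gradient: fits in the bucket
    b.try_add([0.0] * 6)         # too large for the remaining space: rejected
```

Outside the `with` block `buf.data` is `None`, so in this sketch the buffer costs no memory during forward or stepping.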
    
    Comparing with DDP:
    
      1. buffer size scales with the number of FSDP instances, not model size
      2. buffer is only allocated during backward
      3. buffer is used for small tensors only to reduce overhead
      4. overlap between compute and reduction is very different
    
    * add PR number to changelog
    
    * filled in with memory number on 1.9
    
    * addressed comments
    
    * update comments
    
    * fix for 1.6
    
    * add a todo
    Co-authored-by: Min Xu <min.xu@acm.org>
ci_test_list_1.txt 295 Bytes