    [fix] FSDP forward pass overlap between compute and all-gather (#671) · 8a42a8e3
    Min Xu authored
    
    
    * [fix] FSDP forward pass overlap between compute and all-gather
    
    - many thanks to @cyanguwa for the report and @QuentinDuval for debugging it
    - a new unit test is added to check for this and to ensure we detect
      issues with overlapping and CPU/GPU blocking wait calls
    
    * fix
    
    * fix
    
    * fix
    
    * better assertion outputs
    
    * fix format and tune all_gather mb for CI
    
    * more tuning with non_flatten
    
    * undo an accidental change
    
    * tuning all gather mb and del model
    
    * Update + fix overlapping test to use patched all_gather w/ delay (#672)
    
    * fixing get_cycles_per_ms
    
    * add get_smi_memory
    
    * update the docstring
    Co-authored-by: Min Xu <min.xu@acm.org>
    Co-authored-by: Myle Ott <myleott@fb.com>
ci_test_list_2.txt