    FullyShardedDataParallel: only return full state dict on rank 0 (#885) · d3417ceb
    four4fish authored
    * FullyShardedDataParallel: only return full state dict on rank 0
    
    * Add flag and make rank 0 only optional
    
    * Add tests
    
    * Add docs
    
    * Address comments
    
    * Update comments
    
    * Update torch nightly version
    
    * Update torchvision version for torch nightly dependency
    
    * Add changelog
    
    * Update CHANGELOG.md
    
    * Update CHANGELOG.md
test_fsdp.py 34.9 KB