FSDP use _allgather_base and _reduce_scatter_base (#729)

* Update fully_sharded_data_parallel.py

  Update fully_sharded_data_parallel to use _allgather_base

* Update reduce_scatter_bucketer.py

  Use reduce_scatter_base

* Update fully_sharded_data_parallel.py

  Nonblocking gradient CPU copy, and nonblocking param rebuilds

* Update reduce_scatter_bucketer.py

  lints

* Update fully_sharded_data_parallel.py

* Update reduce_scatter_bucketer.py

* Update reduce_scatter_bucketer.py

* lints

* linter, test fix

* linter

* LINTER: git add fairscale/utils/reduce_scatter_bucketer.py

* LINTER: git add tests/nn/data_parallel/test_fsdp_overlap.py

* Update test_fsdp_overlap.py

* Update fairscale/utils/reduce_scatter_bucketer.py

  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

* Update fairscale/nn/data_parallel/fully_sharded_data_parallel.py

  Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>

* Update reduce_scatter_bucketer.py

* isort

Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-185.ec2.internal>
Co-authored-by: Min Xu <24926999+min-xu-ai@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-77-164.ec2.internal>