change default FSDP strategy to grad_optim (ZeRO-2)
Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/522

Change d2go's default FSDP sharding strategy to grad_optim, which corresponds to ShardingStrategy.SHARD_GRAD_OP in the FSDP API (ZeRO-2 in the literature). grad_optim has been shown to offer the best tradeoff between memory utilization and training speed for mid-sized models.

`FSDP.ALGORITHM = ""` came from the previous design, where it indicated that FSDP is not used. That value is no longer supported.

Reviewed By: tglik

Differential Revision: D44657184

fbshipit-source-id: 3888eea5f2b5042269e69453f3cdd8db7cf1581c
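For reference, a minimal sketch of what the grad_optim config value maps to in the raw PyTorch FSDP API (this is not d2go's actual wrapping code; the module and its dimensions are placeholders, and an initialized distributed process group is assumed):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Placeholder module; d2go would wrap the actual model being trained.
model = nn.Linear(1024, 1024)

# grad_optim in the d2go config corresponds to SHARD_GRAD_OP here:
# gradients and optimizer state are sharded across ranks, while
# parameters remain replicated after forward (ZeRO-2 behavior).
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
)
```

Compared with FULL_SHARD (ZeRO-3), SHARD_GRAD_OP avoids re-gathering parameters in the backward pass, which is why it tends to trade a modest memory increase for faster training on mid-sized models.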