"src/git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "8e53cd959e535f82d49c9719d71269b589fcef7b"
change default FSDP strategy to grad_optim (ZERO2)
Summary: Pull Request resolved: https://github.com/facebookresearch/d2go/pull/522 Change d2go's default FSDP sharding strategy to grad_optim, which corresponds to ShardingStrategy.SHARD_GRAD_OP in FSDP API, or ZERO2 in literature. grad_optim is shown to have the best tradeoff between memory utilization and training speed for mid-sized models. `FSDP.ALGORITHM = ""` was from the previous design to indicate that no FSDP is used. It will not work now Reviewed By: tglik Differential Revision: D44657184 fbshipit-source-id: 3888eea5f2b5042269e69453f3cdd8db7cf1581c
Showing
Please register or sign in to comment