change default FSDP strategy to grad_optim (ZeRO-2)
Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/522

Change d2go's default FSDP sharding strategy to grad_optim, which corresponds to ShardingStrategy.SHARD_GRAD_OP in the FSDP API (ZeRO-2 in the literature). grad_optim has been shown to offer the best tradeoff between memory utilization and training speed for mid-sized models.

`FSDP.ALGORITHM = ""` came from the previous design, where it indicated that FSDP is not used. That value is no longer supported.

Reviewed By: tglik

Differential Revision: D44657184

fbshipit-source-id: 3888eea5f2b5042269e69453f3cdd8db7cf1581c
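For reference, a minimal sketch of what the grad_optim config value maps to in the raw PyTorch FSDP API (this is not d2go's actual wrapping code; the module and its dimensions are placeholders, and an initialized distributed process group is assumed):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Placeholder module; d2go would wrap the actual model being trained.
model = nn.Linear(1024, 1024)

# grad_optim in the d2go config corresponds to SHARD_GRAD_OP here:
# gradients and optimizer state are sharded across ranks, while
# parameters remain replicated after forward (ZeRO-2 behavior).
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
)
```

Compared with FULL_SHARD (ZeRO-3), SHARD_GRAD_OP avoids re-gathering parameters in the backward pass, which is why it tends to trade a modest memory increase for faster training on mid-sized models.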