Commit 35affd74 authored by Anthony Chen, committed by Facebook GitHub Bot

change default FSDP strategy to grad_optim (ZERO2)

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/522

Change d2go's default FSDP sharding strategy to grad_optim, which corresponds to ShardingStrategy.SHARD_GRAD_OP in the FSDP API (ZeRO-2 in the literature). grad_optim has been shown to offer the best tradeoff between memory utilization and training speed for mid-sized models.

`FSDP.ALGORITHM = ""` was from the previous design to indicate that no FSDP is used. It will not work now

Reviewed By: tglik

Differential Revision: D44657184

fbshipit-source-id: 3888eea5f2b5042269e69453f3cdd8db7cf1581c
parent ed671e34
```diff
@@ -41,7 +41,7 @@ D2GO_FSDP_WRAP_POLICY_REGISTRY = Registry("D2GO_FSDP_WRAP_POLICY_REGISTRY")
 def add_fsdp_configs(_C: CN):
     _C.FSDP = CN()
-    _C.FSDP.ALGORITHM = ""  # 'grad_optim' or 'full'
+    _C.FSDP.ALGORITHM = "grad_optim"  # 'grad_optim', 'full', 'hybrid', 'hybrid_zero2'
     # Configs for fully sharded data parallel (fsdp)
     # Check out https://pytorch.org/docs/stable/fsdp.html
```
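
For illustration, a hedged sketch of overriding the new default in user code (the import path for `add_fsdp_configs` is assumed; the override value is one of the options listed in the diff):

```python
from detectron2.config import CfgNode as CN
from d2go.trainer.fsdp import add_fsdp_configs  # import path assumed

# Build a config carrying the FSDP defaults from this commit, then override
# the sharding strategy, e.g. to fall back to full sharding (ZeRO-3) for a
# model too large for ZeRO-2.
cfg = CN()
add_fsdp_configs(cfg)
assert cfg.FSDP.ALGORITHM == "grad_optim"  # new default from this commit
cfg.FSDP.ALGORITHM = "full"
```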