Commit 35affd74 authored by Anthony Chen, committed by Facebook GitHub Bot

change default FSDP strategy to grad_optim (ZERO2)

Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/522

Change d2go's default FSDP sharding strategy to grad_optim, which corresponds to ShardingStrategy.SHARD_GRAD_OP in the FSDP API (ZeRO-2 in the literature). grad_optim has been shown to offer the best tradeoff between memory utilization and training speed for mid-sized models.

`FSDP.ALGORITHM = ""` was from the previous design to indicate that no FSDP is used. It will not work now

Reviewed By: tglik

Differential Revision: D44657184

fbshipit-source-id: 3888eea5f2b5042269e69453f3cdd8db7cf1581c
parent ed671e34
@@ -41,7 +41,7 @@ D2GO_FSDP_WRAP_POLICY_REGISTRY = Registry("D2GO_FSDP_WRAP_POLICY_REGISTRY")
 def add_fsdp_configs(_C: CN):
     _C.FSDP = CN()
-    _C.FSDP.ALGORITHM = ""  # 'grad_optim' or 'full'
+    _C.FSDP.ALGORITHM = "grad_optim"  # 'grad_optim', 'full', 'hybrid', 'hybrid_zero2'
     # Configs for fully sharded data parallel (fsdp)
     # Check out https://pytorch.org/docs/stable/fsdp.html
...