fix distributed initialization for FSDP
Summary:
Pull Request resolved: https://github.com/facebookresearch/d2go/pull/657

If `requires_grad` is not set properly for params and buffers, FSDP training hangs. This becomes an issue, e.g., when training with LoRA.

Reviewed By: wat3rBro

Differential Revision: D55220828

fbshipit-source-id: 1e33aa540c84c4de62a3a37c48a322aa26c98292
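The following is a minimal diagnostic sketch, not the change made in this PR. It assumes the symptom can be investigated by comparing `requires_grad` flags for params and buffers between ranks. The helper names `requires_grad_snapshot` and `diff_snapshots` are hypothetical. The code relies only on the `named_parameters()` and `named_buffers()` methods that `torch.nn.Module` provides, so it accepts any object exposing them.

```python
from typing import Dict, List


def requires_grad_snapshot(module) -> Dict[str, bool]:
    """Map each parameter and buffer name to its requires_grad flag.

    `module` is assumed to behave like a torch.nn.Module, i.e. to expose
    named_parameters() and named_buffers(). Hypothetical helper.
    """
    flags: Dict[str, bool] = {}
    for name, param in module.named_parameters():
        flags["param:" + name] = bool(param.requires_grad)
    for name, buf in module.named_buffers():
        flags["buffer:" + name] = bool(buf.requires_grad)
    return flags


def diff_snapshots(reference: Dict[str, bool], other: Dict[str, bool]) -> List[str]:
    """Return sorted names whose flag differs, or that appear in only one snapshot."""
    names = set(reference) | set(other)
    return sorted(n for n in names if reference.get(n) != other.get(n))
```

In a distributed job, each rank could build its snapshot before wrapping the model and share it, for example with `torch.distributed.all_gather_object`, then compare every snapshot against rank 0's. A non-empty diff would point to the params or buffers (such as frozen base weights in a LoRA setup) whose flags are inconsistent.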