Improvements in distributed Adam optimizer for Megatron (#1432)
* Improvements in distributed Adam optimizer for Megatron

Add option to allocate gradient buckets out of one large buffer.
Add option to initialize params in user-provided order.
Perform communication when saving optimizer state.
Support param sync with any dtype.

* Style fixes in distributed Adam helper classes

Review suggestions from @crcrpar
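A minimal sketch of the "gradient buckets out of one large buffer" idea, to illustrate the first option. It is not the optimizer's actual code: `allocate_buckets`, `bucket_size`, and `num_buckets` are hypothetical names, and the standard-library `array` and `memoryview` types stand in for the tensors the real implementation would use.

```python
from array import array


def allocate_buckets(bucket_size, num_buckets):
    """Carve equal-size gradient buckets out of one contiguous buffer.

    Hypothetical helper for illustration only; not part of the optimizer.
    """
    # A single large allocation backs every bucket.
    buffer = array("f", [0.0]) * (bucket_size * num_buckets)
    view = memoryview(buffer)
    # Each bucket is a zero-copy slice of the shared buffer.
    buckets = [
        view[i * bucket_size:(i + 1) * bucket_size]
        for i in range(num_buckets)
    ]
    return buffer, buckets


buffer, buckets = allocate_buckets(bucket_size=4, num_buckets=3)
# Writing through a bucket updates the shared buffer in place.
buckets[1][0] = 2.5
```

Because every bucket is a view into the same allocation, the whole set of gradients can be handled as one contiguous region rather than as many separately allocated pieces.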