Add CPUAdam optimizer for zero-offload in deepspeed engine (#484)
* Add AdamW to the CPU-Adam implementation
* Support the CPU-Adam optimizer for ZeRO-Offload on the DeepSpeed side
* Bump DSE to match the CPU-Adam updates
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>