[feat] optimizer state scaling (#44)
Implement scaling of optimizer state when using pure-fp16 training to avoid underflow. Update benchmark to use pure-fp16. Modify state_dict methods to store and load the optimizer state scale.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
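
A minimal sketch of the idea described in the commit message, not the code from this commit: a statistic held in fp16 is multiplied by a scale before it is stored and divided by it when read, and the scale travels with the state dict. The class name `ScaledFp16State`, the default scale of `2 ** 16`, and the Adam-style second-moment update used in the demo are all assumptions chosen for illustration.

```python
import numpy as np


class ScaledFp16State:
    """Hypothetical helper: holds one optimizer statistic in fp16, pre-multiplied by a scale."""

    def __init__(self, scale=2.0 ** 16):
        self.scale = float(scale)          # assumed default, not taken from the commit
        self.stored = np.float16(0.0)      # fp16 value actually kept: true_value * scale

    def set(self, true_value):
        # Scale in fp32 first, then cast down, so small values stay representable.
        self.stored = np.float16(np.float32(true_value) * np.float32(self.scale))

    def get(self):
        # Undo the scaling in fp32 to recover the true statistic.
        return float(np.float32(self.stored) / np.float32(self.scale))

    def state_dict(self):
        # The scale is saved next to the value so a reload can interpret it.
        return {"stored": float(self.stored), "scale": self.scale}

    def load_state_dict(self, state):
        self.scale = float(state["scale"])
        self.stored = np.float16(state["stored"])


# Demo: an Adam-style second-moment contribution from a small gradient.
beta2, grad = 0.999, 1e-3
update = (1.0 - beta2) * grad ** 2         # roughly 1e-9

naive = np.float16(update)                 # rounds to 0.0 in fp16 (underflow)

state = ScaledFp16State()
state.set(update)                          # stored as roughly 6.6e-5, a normal fp16 value

restored = ScaledFp16State(scale=1.0)
restored.load_state_dict(state.state_dict())
```

Without the scale, the update is lost entirely once cast to fp16; with it, `state.get()` returns the value to within fp16 rounding error, and `restored` reads the same value back because the scale was saved alongside it.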