Add grad scaler (#48)
Add GradScaler to Fairscale as a subclass of PyTorch's GradScaler. Use it in the pipe benchmark. The benchmark does not need gradient scaling, but it serves as an example of how to apply it to larger models that need gradient scaling to converge.
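For readers unfamiliar with gradient scaling, here is a minimal pure-Python sketch of the idea. It is not the Fairscale or PyTorch implementation: the class name ToyGradScaler and its methods are hypothetical, and the single step() call folds the update of the scale factor into the optimizer step for brevity. The constructor parameter names and default values are chosen to mirror those of PyTorch's GradScaler.

```python
import math


class ToyGradScaler:
    """Simplified dynamic loss scaler (illustration only, hypothetical class)."""

    def __init__(self, init_scale=65536.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale_value = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps without overflow

    def scale(self, loss):
        # Multiply the loss so small gradients stay representable in fp16.
        return loss * self.scale_value

    def unscale(self, scaled_grads):
        # Divide the gradients back to their true magnitude.
        return [g / self.scale_value for g in scaled_grads]

    def step(self, params, scaled_grads, lr):
        """Apply a plain SGD update, or skip it if any gradient overflowed."""
        grads = self.unscale(scaled_grads)
        if any(not math.isfinite(g) for g in grads):
            # Overflow (inf or NaN): skip the update and shrink the scale.
            self.scale_value *= self.backoff_factor
            self._good_steps = 0
            return params, False
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # Enough clean steps in a row: try a larger scale again.
            self.scale_value *= self.growth_factor
            self._good_steps = 0
        return [p - lr * g for p, g in zip(params, grads)], True


# Usage: one clean step, then one step whose gradient overflowed.
scaler = ToyGradScaler(init_scale=4.0, growth_interval=2)
params, applied = scaler.step([1.0], [8.0], lr=0.1)
params, applied = scaler.step(params, [float("inf")], lr=0.1)
```

The design point is that an overflowing step is skipped rather than applied, so a scale that is too large costs one iteration instead of corrupting the weights.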
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>