    Add grad scaler (#48) · b6a5e634
    Jun Ru Anderson authored
    
    
    Add GradScaler to Fairscale as a subclass of PyTorch's GradScaler. Use GradScaler in the pipe benchmark: the benchmark does not need gradient scaling, but it serves as an example of how to apply it to larger models that do require gradient scaling to converge.
Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
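The commit message names gradient scaling but does not show the mechanism. The following is a minimal sketch, assuming plain Python floats for parameters and gradients and plain SGD as the optimizer, of the dynamic loss scaling that PyTorch's GradScaler (and therefore a subclass of it) performs. The loss is multiplied by a scale factor, gradients are unscaled before the update, the step is skipped and the scale shrunk when any gradient is inf or NaN, and the scale is grown after a run of clean steps. `DynamicLossScaler` and its methods are hypothetical names for illustration, not Fairscale's or PyTorch's API. The default values mirror PyTorch's documented GradScaler defaults.

```python
import math
from typing import List


class DynamicLossScaler:
    """Hypothetical sketch of dynamic loss scaling, the idea behind GradScaler.

    Defaults mirror PyTorch's GradScaler: init_scale=2**16, growth_factor=2.0,
    backoff_factor=0.5, growth_interval=2000.
    """

    def __init__(self, init_scale: float = 2.0 ** 16, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive steps with finite gradients

    def scale_loss(self, loss: float) -> float:
        # Multiply the loss so small fp16 gradients do not underflow to zero.
        return loss * self.scale

    def step(self, params: List[float], scaled_grads: List[float], lr: float) -> bool:
        """Unscale gradients and apply SGD in place; return False if skipped."""
        grads = [g / self.scale for g in scaled_grads]
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            # Overflow detected: skip the update and shrink the scale.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False
        for i, g in enumerate(grads):
            params[i] -= lr * g
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # Enough clean steps in a row: try a larger scale.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True


# Usage: a small scale and growth interval make the behaviour easy to follow.
scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
params = [1.0]
scaler.step(params, [float("inf")], lr=0.1)  # overflow: skipped, scale halves to 4.0
scaler.step(params, [4.0], lr=0.1)           # unscaled grad 1.0, param moves to ~0.9
scaler.step(params, [4.0], lr=0.1)           # second clean step: scale grows to 8.0
```

In PyTorch itself the same bookkeeping is driven by `scaler.scale(loss).backward()`, `scaler.step(optimizer)` and `scaler.update()` inside the training loop; the sketch folds the update into `step` for brevity.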
test_adam.py 13.6 KB