Distributed LAMB: Clip grads before reduce_scatter/all_reduce (#1099)
* clip before reduce scatter
* provide clip before/after RS option
* change to clip after ar (avoid confusion)
* fix comments
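
To make the before/after distinction concrete, here is a minimal pure-Python sketch of why the ordering matters. It is not the code from this commit: the helper names (`clip_by_norm`, `all_reduce_mean`, `clip_before_reduce`, `clip_after_reduce`) are hypothetical, ranks are simulated as plain lists, and a mean-reduction with per-rank L2 clipping is assumed purely for illustration.

```python
import math


def l2_norm(vec):
    """L2 norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in vec))


def clip_by_norm(vec, max_norm):
    """Scale vec down so its L2 norm does not exceed max_norm."""
    norm = l2_norm(vec)
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [x * scale for x in vec]


def all_reduce_mean(per_rank):
    """Stand-in for a collective: element-wise mean over simulated ranks."""
    world_size = len(per_rank)
    return [sum(column) / world_size for column in zip(*per_rank)]


def clip_before_reduce(per_rank, max_norm):
    """Clip each rank's local gradient, then reduce (illustrative assumption)."""
    return all_reduce_mean([clip_by_norm(g, max_norm) for g in per_rank])


def clip_after_reduce(per_rank, max_norm):
    """Reduce first, then clip the combined gradient."""
    return clip_by_norm(all_reduce_mean(per_rank), max_norm)


# Two simulated ranks: one with a large gradient, one with a small gradient.
grads = [[3.0, 4.0], [0.3, 0.4]]
before = clip_before_reduce(grads, max_norm=1.0)
after = clip_after_reduce(grads, max_norm=1.0)
print(l2_norm(before), l2_norm(after))
```

In this toy setup the two orderings produce gradients with different norms, so an optimizer that exposes both behaviours needs an explicit option to select one, as the second bullet describes.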