`wgrad` should be zeroed out if a weight parameter is shared among multiple layers (#545)
`wgrad` should be zeroed out if a weight parameter is shared among multiple layers.
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
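
The reasoning behind the commit message can be illustrated with a small, generic sketch. It is not the code changed by this commit: `SharedWeight`, `linear_wgrad`, and `backward_step` are hypothetical names, and the scalar weight is an assumption made to keep the arithmetic readable. The point it shows is that a weight used by several layers receives one gradient contribution per use, all summed into the same buffer, so a buffer that is not zeroed beforehand carries stale values into the sum.

```python
# Hypothetical illustration only; none of these names come from the commit.

def linear_wgrad(x, grad_out):
    """Weight gradient of y_i = w * x_i for a scalar w: sum_i x_i * g_i."""
    return sum(xi * gi for xi, gi in zip(x, grad_out))


class SharedWeight:
    """A scalar weight with a single gradient buffer shared by all of its uses."""

    def __init__(self, value):
        self.value = value
        self.wgrad = 0.0

    def zero_wgrad(self):
        self.wgrad = 0.0

    def accumulate(self, x, grad_out):
        # Each layer that uses this weight adds its own contribution.
        self.wgrad += linear_wgrad(x, grad_out)


def backward_step(weight, uses, zero_first):
    """Accumulate wgrad over every (input, upstream gradient) pair in `uses`."""
    if zero_first:
        weight.zero_wgrad()
    for x, grad_out in uses:
        weight.accumulate(x, grad_out)
    return weight.wgrad


# Two layers share the same weight: contributions are 1.5 and 6.0.
uses = [([1.0, 2.0], [0.5, 0.5]), ([3.0], [2.0])]
w = SharedWeight(0.1)

first = backward_step(w, uses, zero_first=True)   # correct sum over both uses
stale = backward_step(w, uses, zero_first=False)  # previous step leaks in
clean = backward_step(w, uses, zero_first=True)   # zeroing restores the sum
print(first, stale, clean)
```

In this sketch the second call skips the zeroing and returns twice the correct value, because the first step's result is still in the buffer when the shared contributions are added.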