Deepak Narayanan authored
wgrad should be zeroed out if a weight parameter is shared among multiple layers
387397a2
wgrad should be zeroed out if a weight parameter is shared among multiple layers
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
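
Illustrative note (not part of the commit): the message does not say why zeroing matters, so the following is only one plausible reading. It assumes a setup where each layer's backward pass accumulates the real weight gradient directly into a separate buffer and hands a placeholder wgrad back to the autograd engine, which then sums the placeholders from every layer that uses the shared weight. Under that assumption, an uninitialized placeholder would be summed into garbage, while a zeroed one stays harmless. All names below (`layer_backward`, `shared_weight_backward`, `main_grad`, `zero_out_wgrad`, `garbage`) are hypothetical and framework-free; this is a sketch, not the code touched by the commit.

```python
# Hypothetical, framework-free sketch. None of these names come from the commit.

def layer_backward(x, grad_out, main_grad, zero_out_wgrad, garbage=123.0):
    """Backward for y = w * x, with the real wgrad fused into main_grad."""
    # Real weight gradient, accumulated in place into the shared buffer.
    main_grad[0] += x * grad_out
    # Placeholder returned to the (imaginary) autograd engine. 'garbage'
    # stands in for the contents of uninitialized memory.
    return 0.0 if zero_out_wgrad else garbage


def shared_weight_backward(uses, zero_out_wgrad):
    """Run backward for every layer that shares a single weight."""
    main_grad = [0.0]
    summed_placeholder = 0.0  # what the engine would end up storing as w.grad
    for x, grad_out in uses:
        summed_placeholder += layer_backward(
            x, grad_out, main_grad, zero_out_wgrad
        )
    return main_grad[0], summed_placeholder


# Two layers share the weight: (input, upstream gradient) per layer.
uses = [(2.0, 1.0), (3.0, 0.5)]
print(shared_weight_backward(uses, zero_out_wgrad=True))
print(shared_weight_backward(uses, zero_out_wgrad=False))
```

In both calls the accumulated buffer holds the same real gradient; only the summed placeholder differs, which is the quantity the zeroing is meant to keep clean in this reading.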