-
Igor Ganichev authored
While the original code works fine in practice, it technically allows gradient application and moving average update to happen in any order. This causes the behavior to deviate from pure mathematical specifications.
f6e4dfe9
While the original code works fine in practice, it technically allows gradient application and moving average update to happen in any order. This causes the behavior to deviate from pure mathematical specifications.