Fix layerwise GaLore optimizer hard to converge with warmup scheduler (#30372)

Update optimization.py

Unverified commit f3b3533e, authored Apr 23, 2024 by hoshi-hiyouga; committed by GitHub Apr 22, 2024

Showing with 2 additions and 3 deletions

src/transformers/optimization.py +2 -3

--- a/src/transformers/optimization.py
+++ b/src/transformers/optimization.py
@@ -444,9 +444,8 @@ def get_scheduler(

        def scheduler_hook(param):
            # Since the optimizer hook has been already attached we only need to
-            # attach the scheduler hook
-            if param.grad is not None:
-                scheduler_dict[param].step()
+            # attach the scheduler hook, the gradients have been zeroed here
+            scheduler_dict[param].step()

        for param in optimizer_dict.keys():
            if param.requires_grad:
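
The effect of the removed `param.grad is not None` guard can be illustrated with a toy model that has no PyTorch dependency. This is a sketch only: `ToyParam`, `ToyWarmup`, and `run` are hypothetical stand-ins, not transformers or PyTorch APIs, and it assumes (in line with the new comment, "the gradients have been zeroed here") that the optimizer hook runs first and leaves `param.grad` as `None` by the time the scheduler hook fires.

```python
class ToyParam:
    """Hypothetical stand-in for a parameter; tracks only whether a gradient exists."""

    def __init__(self):
        self.grad = None
        self.hooks = []

    def backward(self):
        # Simulate gradient accumulation, then run hooks in registration order.
        self.grad = 1.0
        for hook in self.hooks:
            hook(self)


class ToyWarmup:
    """Hypothetical linear warmup: lr ramps from 0 to base_lr over warmup_steps."""

    def __init__(self, base_lr, warmup_steps):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.steps = 0

    def step(self):
        self.steps += 1

    @property
    def lr(self):
        return self.base_lr * min(1.0, self.steps / self.warmup_steps)


def run(guard_on_grad, n_steps=4):
    """Return the learning rate after n_steps simulated backward passes."""
    param = ToyParam()
    sched = ToyWarmup(base_lr=1.0, warmup_steps=4)

    def optimizer_hook(p):
        # Assumption: the per-parameter optimizer step happens here, after
        # which the gradient is cleared to None.
        p.grad = None

    def scheduler_hook(p):
        # Old behavior: skip the scheduler when no gradient is present.
        if guard_on_grad and p.grad is None:
            return
        sched.step()

    # The optimizer hook is attached before the scheduler hook.
    param.hooks = [optimizer_hook, scheduler_hook]
    for _ in range(n_steps):
        param.backward()
    return sched.lr


lr_before_fix = run(guard_on_grad=True)   # scheduler never steps
lr_after_fix = run(guard_on_grad=False)   # scheduler steps on every backward pass
print(lr_before_fix, lr_after_fix)  # prints "0.0 1.0"
```

Under these assumptions, the guarded version never advances the schedule, so the learning rate stays at its initial warmup value, which would match the convergence problem named in the commit title.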