Commit f3b3533e authored by hoshi-hiyouga, committed by GitHub

Fix layerwise GaLore optimizer hard to converge with warmup scheduler (#30372)

Update optimization.py
parent 0d84901c
@@ -444,9 +444,8 @@ def get_scheduler(
         def scheduler_hook(param):
             # Since the optimizer hook has been already attached we only need to
-            # attach the scheduler hook
-            if param.grad is not None:
-                scheduler_dict[param].step()
+            # attach the scheduler hook, the gradients have been zeroed here
+            scheduler_dict[param].step()
         for param in optimizer_dict.keys():
             if param.requires_grad:
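
To illustrate why the removed `param.grad is not None` guard could stall warmup, here is a minimal, framework-free sketch. It assumes (as the new comment in the diff suggests) that the per-parameter optimizer hook runs first and clears the gradient to `None` before the scheduler hook fires. `ToyParam`, `ToyWarmupScheduler`, `optimizer_hook`, and `run` are hypothetical stand-ins written for this example, not part of `optimization.py`.

```python
class ToyParam:
    """Hypothetical stand-in for a parameter; only tracks a gradient slot."""

    def __init__(self):
        self.grad = None


class ToyWarmupScheduler:
    """Hypothetical linear warmup: lr ramps from 0 to base_lr over warmup_steps."""

    def __init__(self, base_lr, warmup_steps):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.num_steps = 0

    def step(self):
        self.num_steps += 1

    @property
    def lr(self):
        return self.base_lr * min(1.0, self.num_steps / self.warmup_steps)


def optimizer_hook(param):
    # Assumption: the layerwise optimizer hook applies its update and then
    # clears the gradient to None before the scheduler hook is called.
    if param.grad is not None:
        param.grad = None


def old_scheduler_hook(param, scheduler):
    # Pre-fix behaviour: guarded on a gradient that has already been cleared.
    if param.grad is not None:
        scheduler.step()


def new_scheduler_hook(param, scheduler):
    # Post-fix behaviour: step unconditionally.
    scheduler.step()


def run(scheduler_hook, n_iters=10):
    param = ToyParam()
    scheduler = ToyWarmupScheduler(base_lr=1e-3, warmup_steps=5)
    for _ in range(n_iters):
        param.grad = 1.0                   # backward pass produces a gradient
        optimizer_hook(param)              # optimizer hook fires first
        scheduler_hook(param, scheduler)   # scheduler hook fires second
    return scheduler.lr


lr_old = run(old_scheduler_hook)
lr_new = run(new_scheduler_hook)
print(lr_old, lr_new)
```

Under these assumptions the guarded hook never advances the scheduler, so the learning rate stays at its warmup starting value, while the unguarded hook completes the warmup as intended.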