Unverified commit d2f9cb83, authored by Sylvain Gugger, committed by GitHub

Fix in Adafactor docstrings (#6845)

parent 2de7ee03
@@ -346,7 +346,7 @@ class Adafactor(Optimizer):
     If True, learning rate is scaled by root mean square
 relative_step (:obj:`bool`, `optional`, defaults to :obj:`True`):
     If True, time-dependent learning rate is computed instead of external learning rate
-warmup_init (:obj:`bool`, `optional`, defaults to False):
+warmup_init (:obj:`bool`, `optional`, defaults to :obj:`False`):
     Time-dependent learning rate computation depends on whether warm-up initialization is being used
 This implementation handles low-precision (FP16, bfloat) values, but we have not thoroughly tested.
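
For readers unfamiliar with the options this docstring describes, here is a minimal sketch of how `relative_step` and `warmup_init` could interact. It is not taken from this commit. The helper names, the constants `1e-6` and `1e-2`, and the `eps2` floor are assumptions based on the commonly cited Adafactor formulation and should be checked against the actual `Adafactor` source.

```python
import math


def relative_step_size(step: int, warmup_init: bool) -> float:
    """Hypothetical helper: time-dependent step size when relative_step=True."""
    # Assumed constants: with warm-up the cap grows linearly with the step
    # count; without warm-up it is a fixed 1e-2.
    min_step = 1e-6 * step if warmup_init else 1e-2
    # The step size then decays as 1/sqrt(step) once that term is smaller.
    return min(min_step, 1.0 / math.sqrt(step))


def effective_lr(step: int, warmup_init: bool, scale_parameter: bool,
                 param_rms: float, eps2: float = 1e-3) -> float:
    """Hypothetical helper: step size after optional root-mean-square scaling."""
    rel = relative_step_size(step, warmup_init)
    # When scale_parameter is True, the step size is multiplied by the
    # parameter's root mean square, floored at eps2 (assumed default).
    scale = max(eps2, param_rms) if scale_parameter else 1.0
    return scale * rel


if __name__ == "__main__":
    for step in (1, 100, 10_000, 1_000_000):
        print(step,
              relative_step_size(step, warmup_init=False),
              relative_step_size(step, warmup_init=True))
```

Under these assumptions, `warmup_init=True` starts from a very small step size and ramps up linearly, while `warmup_init=False` starts at the fixed cap. Both follow the same inverse-square-root decay later in training.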