- Alternatively, relative_step with warmup_init can be used.
- Training without LR warmup or clip threshold is not recommended. Additional optimizer operations like
gradient clipping should not be used alongside Adafactor.
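
As a rough illustration of why warmup_init matters, the sketch below computes the relative step size the way common Adafactor implementations are understood to do it when relative_step is enabled. The function name `relative_step_size` is hypothetical, and the constants (1e-6 per step during warmup, a 1e-2 ceiling otherwise, inverse square root decay) are assumptions based on widely used implementations rather than something stated in the notes above.

```python
import math


def relative_step_size(step: int, warmup_init: bool) -> float:
    """Hypothetical helper: time-dependent step size used when relative_step is on.

    Assumed rule: min(min_step, 1 / sqrt(step)), where min_step grows
    linearly from 1e-6 when warmup_init is set and is a fixed 1e-2 otherwise.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    # With warmup_init the ceiling starts tiny and rises linearly with step;
    # without it the ceiling is constant from the very first update.
    min_step = 1e-6 * step if warmup_init else 1e-2
    return min(min_step, 1.0 / math.sqrt(step))


if __name__ == "__main__":
    for step in (1, 100, 10_000, 1_000_000):
        with_warmup = relative_step_size(step, warmup_init=True)
        without_warmup = relative_step_size(step, warmup_init=False)
        print(f"step={step:>9}  warmup={with_warmup:.2e}  no_warmup={without_warmup:.2e}")
```

Under these assumptions, the first updates with warmup_init are several orders of magnitude smaller than without it, and both variants follow the same inverse square root decay once the warmup ceiling stops being the binding term.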