Nikolai Yakovenko authored
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to Adam.
* update PR fixes, add basic test
* bug -- incorrect params in test
* bugfix -- import Adafactor into test
* bugfix -- removed accidental T5 include
* resetting T5 to master
* bugfix -- include Adafactor in __init__
* longer loop for adafactor test
* remove double error class declare
* lint
* black
* isort
* Update src/transformers/optimization.py
  Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* single docstring
* Cleanup docstring

Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
971d1802
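
For context, a minimal sketch of how the Adafactor optimizer added in this commit might be used for T5 finetuning. The keyword arguments shown (`scale_parameter`, `relative_step`, `warmup_init`) follow the fairseq-derived defaults in `src/transformers/optimization.py`; the model choice and training loop details are illustrative, not part of the commit.

```python
# Hedged usage sketch for the Adafactor optimizer ported in this commit.
# Assumes the exported names from transformers __init__ (per the
# "include Adafactor in __init__" bullet above); "t5-small" is an
# illustrative checkpoint, not one referenced by the commit.
from transformers import Adafactor, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")

# With relative_step=True (the default), Adafactor derives its own
# learning-rate schedule internally, so lr is left as None.
optimizer = Adafactor(
    model.parameters(),
    lr=None,
    scale_parameter=True,
    relative_step=True,
    warmup_init=False,
)
```

Compared to Adam, Adafactor stores factored second-moment statistics (row and column averages) instead of a full per-parameter second-moment tensor, which is the source of the reduced memory consumption noted in the commit title.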