Commits · a01b033cb45bb5ad77a2aa676367ef764b92e038 · chenpangpang / transformers

05 Jul, 2024 1 commit

Fix galore lr display with schedulers (#31710) · a01b033c

Anton Vlasjuk authored Jul 05, 2024

* fix galore lr display with lr schedulers

* style

* add some tests to check for displayed lrs

* copy-paste err for warmup steps

* standardize the default lr to be only in the optimizer

* trying out my luck with the reads

a01b033c

30 May, 2024 1 commit
- fix get_scheduler when name is warmup_stable_decay (#31128) · cda9c82a
  zspo authored May 30, 2024
```
fix get_scheduler args
```
  cda9c82a
25 Apr, 2024 1 commit

Add WSD scheduler (#30231) · 7b1170b0

Alexander Visheratin authored Apr 25, 2024

* Added WSD scheduler.

* Added tests.

* Fixed errors.

* Fix formatting.

* CI fixes.

7b1170b0

22 Apr, 2024 1 commit
- Fix layerwise GaLore optimizer hard to converge with warmup scheduler (#30372) · f3b3533e
  hoshi-hiyouga authored Apr 23, 2024
```
Update optimization.py
```
  f3b3533e
11 Apr, 2024 1 commit
- chore: remove repetitive words (#30174) · 58b170cd
  hugehope authored Apr 11, 2024
```
Signed-off-by: hugehope <cmm7@sina.cn>
```
  58b170cd
26 Mar, 2024 1 commit
- Add `cosine_with_min_lr` scheduler in Trainer (#29341) · ef609958
  Yanyi Liu authored Mar 26, 2024
```
* Add cosine_with_min_lr scheduler

* Update error message for missing min_lr or min_lr_rate
```
  ef609958
20 Mar, 2024 1 commit
- fix galore layerwise with frozen params (#29743) · a1a74541
  peterjc123 authored Mar 20, 2024
  
  a1a74541
19 Mar, 2024 1 commit

FEAT / Optim: Add GaLore optimizer (#29588) · f6261d7d

Younes Belkada authored Mar 19, 2024



* add galore v1

* add import

* add tests and doc

* fix doctest

* forward contrib credits from discussions

* forward contrib credits from discussions

* Apply suggestions from code review
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* fix failing tests

* switch to `optim_target_modules` and clarify docs

* more clarification

* enhance lookup logic

* update a test to add peak memory

* add regex, all-linear and single string support

* add layer-wise optimization through DummyOptimizers and LRSchedulers

* forward contrib credits from discussions and original idea

* add a section about DDP not supported in layerwise

* Update src/transformers/trainer.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* fix self

* check only if layer_wise

* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* oops

* make use of intervals

* clarify comment

* add matching tests

* GaLoRe -> GaLore

* move to `get_scheduler`

* add note on docs

* add a warning

* adapt a bit the docs

* update docstring

* support original API

* Update docs/source/en/trainer.md

* slightly refactor

* Update docs/source/en/trainer.md
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix args parsing and add tests

* remove warning for regex

* fix type hint

* add note about extra args

* make `is_regex` return optional

---------

Co-authored-by: Maxime <maximegmd@users.noreply.github.com>
Co-authored-by: Wing Lian <winglian@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: hiyouga <hiyouga@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

f6261d7d

01 Mar, 2024 1 commit
- Correct zero division error in inverse sqrt scheduler (#28982) · 831bc25d
  David Valente authored Mar 01, 2024
```
* Correct zero division error in inverse sqrt scheduler

* default timescale to 10_000
```
  831bc25d
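
The zero division named in this commit can be illustrated with a minimal sketch. It assumes the multiplier is a linear warmup followed by `1 / sqrt((step + shift) / timescale)`, and that `timescale` used to fall back to `num_warmup_steps`, which is 0 when no warmup is requested. The function name and signature below are illustrative, not the library's exact code.

```python
import math
from typing import Optional


def inverse_sqrt_multiplier(
    current_step: int,
    num_warmup_steps: int,
    timescale: Optional[int] = None,
) -> float:
    """Learning-rate multiplier: linear warmup, then inverse square root decay."""
    if timescale is None:
        # Assumed fix: fall back to 10_000 when there is no warmup, so the
        # division below never sees timescale == 0.
        timescale = num_warmup_steps or 10_000
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    shift = timescale - num_warmup_steps
    return 1.0 / math.sqrt((current_step + shift) / timescale)


# Without the fallback, num_warmup_steps=0 would make timescale 0 and the
# last line of the function would raise ZeroDivisionError.
print(inverse_sqrt_multiplier(0, num_warmup_steps=0))       # 1.0
print(inverse_sqrt_multiplier(30_000, num_warmup_steps=0))  # 0.5
```

With a non-zero warmup the fallback leaves behavior unchanged, because `num_warmup_steps or 10_000` evaluates to `num_warmup_steps`.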
08 Dec, 2023 1 commit
- Added passing parameters to "reduce_lr_on_plateau" scheduler (#27860) · fe8d1302
  Charbel Abi Daher authored Dec 08, 2023
  
  fe8d1302
07 Nov, 2023 1 commit

Allow scheduler parameters (#26480) · 7e1eff76

Plemeur authored Nov 08, 2023



* Allow for scheduler kwargs

* Formatting

* Arguments checks, passing the tests

* Black failed somehow

---------
Co-authored-by: Pierre <pierre@avatarin.com>

7e1eff76

04 Oct, 2023 1 commit

Docstring check (#26052) · 03af4c42

Sylvain Gugger authored Oct 04, 2023



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

03af4c42

30 May, 2023 1 commit

Editing issue with pickle def with lambda function (#23869) · 6451ad04

George authored May 30, 2023



* Editing issue with pickle def with lambda function

* fix type

* Made helper function private

* delete tab

---------
Co-authored-by: georgebredis <9454-georgebredis@users.noreply.gitlab.aicrowd.com>

6451ad04

19 May, 2023 1 commit
- Remove .data usages in optimizations.py (#23417) · 8aa8513f
  Jiewen Tan authored May 19, 2023
```
Patched the optimizers
```
  8aa8513f
28 Apr, 2023 1 commit

Add Trainer support for ReduceLROnPlateau (#23010) · 9b435204

Maxime Méloux authored Apr 28, 2023



* Add Trainer support for ReduceLROnPlateau

Fixes #16503

* Remove training argument and add default instance

---------
Co-authored-by: mmeloux <maxime.meloux@loria.fr>

9b435204

02 Mar, 2023 1 commit

Make schedulers picklable by making lr_lambda fns global (#21768) · 8e5a1b2a

Connor Henderson authored Mar 02, 2023

* Make schedulers picklable by making lr_lambda fns global

* add unused _get_constant_schedule_lr_lambda arg

* remove unneeded _get_constant_schedule_lr_lambda

* add test

* make style

* rebase, remove torch dep, put lambda back

* repo-consistency and style

8e5a1b2a
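
Why a global `lr_lambda` matters for pickling can be sketched as follows. This is an illustration under assumptions, not the code from the commit: the helper names (`_warmup_lr_lambda`, `make_closure_schedule`, `make_partial_schedule`) are hypothetical, and binding the arguments with `functools.partial` is one way to keep the function at module level.

```python
import pickle
from functools import partial


def _warmup_lr_lambda(current_step: int, *, num_warmup_steps: int) -> float:
    """Module-level function: pickle can store it by its qualified name."""
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    return 1.0


def make_closure_schedule(num_warmup_steps: int):
    """Nested function that captures its argument; pickle cannot look it up by name."""
    def lr_lambda(current_step: int) -> float:
        return _warmup_lr_lambda(current_step, num_warmup_steps=num_warmup_steps)
    return lr_lambda


def make_partial_schedule(num_warmup_steps: int):
    """Binds the argument to the global function, which keeps the result picklable."""
    return partial(_warmup_lr_lambda, num_warmup_steps=num_warmup_steps)


if __name__ == "__main__":
    try:
        pickle.dumps(make_closure_schedule(10))
    except (pickle.PicklingError, AttributeError) as err:
        print("closure is not picklable:", type(err).__name__)

    # The partial survives a pickle round trip and still computes the multiplier.
    restored = pickle.loads(pickle.dumps(make_partial_schedule(10)))
    print(restored(5))
```

Both factories return the same multipliers; only the partial-based one can be serialized, which matters when a scheduler is saved with a checkpoint or sent to a worker process.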

22 Feb, 2023 1 commit
- Apply ruff flake8-comprehensions (#21694) · 5e8c8eb5
  Aaron Gokaslan authored Feb 22, 2023
  
  5e8c8eb5
07 Feb, 2023 1 commit

Add inverse sqrt learning rate scheduler (#21495) · a3034c70

Adrian Sager La Ganga authored Feb 07, 2023

* added inverse sqrt lr scheduler

* Updated get_scheduler in src/transformers/optimization.py

* Updated src/transformers/__init__.py

* Added inverse sqrt lr scheduler test

* Updated docs/source/en/main_classes/optimizer_schedules.mdx

* Ran style and quality scripts

* Fix get_inverse_sqrt_schedule docstring

* Comment implementation URL

a3034c70

12 May, 2022 1 commit

Black preview (#17217) · afe5d42d

Sylvain Gugger authored May 12, 2022

* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black

afe5d42d

16 Feb, 2022 1 commit
- Add a missing space in a deprecation message (#15651) · e3d1a8da
  Santiago Castro authored Feb 15, 2022
  
  e3d1a8da
09 Feb, 2022 1 commit
- Upgrade black to version ~=22.0 (#15565) · 7732d0fe
  Lysandre Debut authored Feb 09, 2022
```
* Upgrade black to version ~=22.0

* Check copies

* Fix code
```
  7732d0fe
13 Jan, 2022 1 commit

Deprecates AdamW and adds `--optim` (#14744) · 7b83feb5

Manuel R. Ciosici authored Jan 13, 2022



* Add AdamW deprecation warning

* Add --optim to Trainer

* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

* fix style

* fix

* Regroup adamws together
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Change --adafactor to --optim adafactor

* Use Enum for optimizer values

* fixup! Change --adafactor to --optim adafactor

* fixup! Change --adafactor to --optim adafactor

* fixup! Change --adafactor to --optim adafactor

* fixup! Use Enum for optimizer values

* Improved documentation for --adafactor
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Add mention of no_deprecation_warning
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename OptimizerOptions to OptimizerNames

* Use choices for --optim

* Move optimizer selection code to a function and add a unit test

* Change optimizer names

* Rename method
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename method
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Remove TODO comment
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename variable
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename variable
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename function

* Rename variable

* Parameterize the tests for supported optimizers

* Refactor

* Attempt to make tests pass on CircleCI

* Add a test with apex

* rework to add apex to parameterized; add actual train test

* fix import when torch is not available

* fix optim_test_params when torch is not available

* fix optim_test_params when torch is not available

* re-org

* small re-org

* fix test_fused_adam_no_apex

* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove .value from OptimizerNames

* Rename optimizer strings s|--adam_|--adamw_|

* Also rename Enum options

* small fix

* Fix instantiation of OptimizerNames. Remove redundant test

* Use ExplicitEnum instead of Enum

* Add unit test with string optimizer

* Change optimizer default to string value
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>

7b83feb5

28 Dec, 2021 1 commit

Doc styler examples (#14953) · b5e2b183

Sylvain Gugger authored Dec 27, 2021

* Fix bad examples

* Add black formatting to style_doc

* Use first nonempty line

* Put it at the right place

* Don't add spaces to empty lines

* Better templates

* Deal with triple quotes in docstrings

* Result of style_doc

* Enable mdx treatment and fix code examples in MDXs

* Result of doc styler on doc source files

* Last fixes

* Break copy from

b5e2b183

27 Dec, 2021 2 commits

[doc] consistent True/False/None default format (#14951) · 133c5e40

Stas Bekman authored Dec 27, 2021



* [doc] consistent True/False/None default format

* Update src/transformers/models/xlnet/modeling_xlnet.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

133c5e40

Doc styler v2 (#14950) · 87e6e4fe

Sylvain Gugger authored Dec 27, 2021

* New doc styler

* Fix issue with args at the start

* Code sample fixes

* Style code examples in MDX

* Fix more patterns

* Typo

* Typo

* More patterns

* Do without black for now

* Get more info in error

* Docstring style

* Re-enable check

* Quality

* Fix add_end_docstring decorator

* Fix docstring

87e6e4fe

21 Dec, 2021 2 commits

Mass conversion of documentation from rst to Markdown (#14866) · 27b3031d

Sylvain Gugger authored Dec 21, 2021

* Convert docstrings of all configurations and tokenizers

* Processors and fixes

* Last modeling files and fixes to models

* Pipeline modules

* Utils files

* Data submodule

* All the other files

* Style

* Missing examples

* Style again

* Fix copies

* Say bye bye to rst docstrings forever

27b3031d

Fix the value error typo of AdamW's betas' valid values checking (#14780) · 00620583
Zed authored Dec 21, 2021
```
* Fix the value error typo of AdamW's betas value check

* error fixed
```
00620583

12 Dec, 2021 1 commit
- [Adafactor] Fix adafactor (#14713) · 91f3dfbf
  Patrick von Platen authored Dec 12, 2021
```
* correct changes

* add comment
```
  91f3dfbf
25 Aug, 2021 1 commit
- Replace assert statement with if condition and ValueError (#13263) · 225de5cc
  Nishant Prabhu authored Aug 25, 2021
  
  225de5cc
17 Jun, 2021 1 commit

fix pt-1.9.0 `add_` deprecation (#12217) · d6ea91c9

Stas Bekman authored Jun 17, 2021

* fix pt-1.9.0 add_ deprecation

* add () for clarity

* Trigger CI

* require_version(torch

d6ea91c9

14 Jun, 2021 2 commits

[style] consistent nn. and nn.functional (#12124) · 1ed2ebf6
Stas Bekman authored Jun 14, 2021
```
* consistent nn. and nn.functional

* fix glitch

* fix glitch #2
```
1ed2ebf6

[optim] implement AdafactorSchedule (#12123) · ff7c8168

Stas Bekman authored Jun 14, 2021



* implement AdafactorSchedule

* typo

* fix

* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

ff7c8168

06 May, 2021 1 commit
- Fix docstring typo (#11611) · cf409e55
  Eldar Kurtic authored May 06, 2021
  
  cf409e55
01 Apr, 2021 1 commit

Fix Adafactor documentation (recommend correct settings) (#10526) · c301c263

Josh authored Mar 31, 2021



* Update optimization.py

Fix documentation to reflect optimal settings for Adafactor

* update and expand on the recommendations

* style

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* flip scale_parameter to True for the 2nd recommendation
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

c301c263

31 Mar, 2021 1 commit

Enforce string-formatting with f-strings (#10980) · acc3bd9d

Sylvain Gugger authored Mar 31, 2021



* First third

* Styling and fix mistake

* Quality

* All the rest

* Treat %s and %d

* typo

* Missing )

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

acc3bd9d

01 Feb, 2021 1 commit

Adafactor: avoid updating group["lr"] attributes (#9751) · 8672bcda

CeShine Lee authored Feb 01, 2021

This affects Adafactor with relative_step=False and scale_parameter=True.
Updating group["lr"] makes the result of ._get_lr() depend on the previous call,
i.e., on the scale of other parameters. This isn't supposed to happen.

8672bcda
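
The order dependence described in this commit message can be reproduced with a small sketch. It is a simplification under assumptions: the scaled rate is taken to be `group["lr"] * max(eps, RMS(param))`, `eps` is a single float, and the function names and RMS values are hypothetical rather than taken from the optimizer.

```python
def scaled_lr_buggy(group: dict, param_rms: float) -> float:
    """Writes the scaled value back into the shared group, so later calls see it."""
    group["lr"] = group["lr"] * max(group["eps"], param_rms)
    return group["lr"]


def scaled_lr_fixed(group: dict, param_rms: float) -> float:
    """Returns the scaled value and leaves the shared group untouched."""
    return group["lr"] * max(group["eps"], param_rms)


rms_values = [2.0, 4.0]  # hypothetical per-parameter RMS scales

group = {"lr": 0.5, "eps": 1e-3}
buggy = [scaled_lr_buggy(group, rms) for rms in rms_values]

group = {"lr": 0.5, "eps": 1e-3}
fixed = [scaled_lr_fixed(group, rms) for rms in rms_values]

print(buggy)  # [1.0, 4.0]: the second rate is inflated by the first parameter's scale
print(fixed)  # [1.0, 2.0]: each rate depends only on its own parameter
```

In the buggy variant the second parameter's rate is multiplied by the first parameter's scale as well, which is the dependence on "the scale of other parameters" that the message calls out.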

22 Dec, 2020 1 commit

Seq2seq trainer (#9241) · 490b39e6

Sylvain Gugger authored Dec 22, 2020



* Add label smoothing in Trainer

* Add options for scheduler and Adafactor in Trainer

* Put Seq2SeqTrainer in the main lib

* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments and adapt scripts

* Documentation

* Move test not using script to tests folder
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

490b39e6

26 Oct, 2020 1 commit

Doc styling (#8067) · 08f534d2

Sylvain Gugger authored Oct 26, 2020

* Important files

* Styling them all

* Revert "Styling them all"

This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.

* Styling them for realsies

* Fix syntax error

* Fix benchmark_utils

* More fixes

* Fix modeling auto and script

* Remove new line

* Fixes

* More fixes

* Fix more files

* Style

* Add FSMT

* More fixes

* More fixes

* More fixes

* More fixes

* Fixes

* More fixes

* More fixes

* Last fixes

* Make sphinx happy

08f534d2

31 Aug, 2020 1 commit
- Fix in Adafactor docstrings (#6845) · d2f9cb83
  Sylvain Gugger authored Aug 31, 2020
  
  d2f9cb83
27 Aug, 2020 1 commit
- Adafactor docs (#6765) · 41aa2b4e
  Lysandre Debut authored Aug 27, 2020
  
  41aa2b4e