Commits · a761d6e9a02598690e9eceeec7a8b662149d7bdb · chenpangpang / transformers

24 Nov, 2023 1 commit

Refactoring Trainer, adds `save_only_model` arg and simplifying FSDP integration (#27652) · a761d6e9

Sourab Mangrulkar authored Nov 24, 2023



* add code changes

1. Refactor FSDP
2. Add `--save_only_model` option: When checkpointing, whether to only save the model, or also the optimizer, scheduler & rng state.
3. Bump up the minimum `accelerate` version to `0.21.0`

* quality

* fix quality?

* Revert "fix quality?"

This reverts commit 149330a6abc078827be274db84c8a2d26a76eba1.

* fix fsdp doc strings

* fix quality

* Update src/transformers/training_args.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* please fix the quality issue 😅



* Apply suggestions from code review
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comment

* simplify conditional check as per the comment

* update documentation

---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

a761d6e9
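
To make the `--save_only_model` option from the commit above more concrete, here is a minimal sketch of a checkpoint writer that either stores only the model or also the optimizer, scheduler and RNG state. The helper `write_checkpoint`, the component names and the file layout are hypothetical and chosen for illustration. They are not what `Trainer` actually writes to disk.

```python
import json
import tempfile
from pathlib import Path


def write_checkpoint(ckpt_dir, state, save_only_model=False):
    """Write a toy checkpoint; `state` maps component name -> JSON-serializable data.

    Hypothetical helper for illustration only: the component names and
    file layout are assumptions, not the real Trainer checkpoint format.
    """
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    # The model is always saved; the remaining components are written only
    # when a full checkpoint is requested.
    if save_only_model:
        components = ["model"]
    else:
        components = ["model", "optimizer", "scheduler", "rng_state"]
    written = []
    for name in components:
        path = ckpt_dir / f"{name}.json"
        path.write_text(json.dumps(state[name]))
        written.append(path.name)
    return sorted(written)


# Toy training state (all values are made up for the example).
state = {
    "model": {"w": [0.1, 0.2]},
    "optimizer": {"lr": 1e-3},
    "scheduler": {"step": 10},
    "rng_state": {"seed": 42},
}

with tempfile.TemporaryDirectory() as tmp:
    small = write_checkpoint(Path(tmp) / "small", state, save_only_model=True)
    full = write_checkpoint(Path(tmp) / "full", state)
```

The trade-off in this sketch is that the model-only checkpoint is smaller, but it leaves out the optimizer, scheduler and RNG state that the full checkpoint keeps.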

22 Nov, 2023 1 commit

Fix `max_steps` documentation regarding the end-of-training condition (#27624) · b2c63c79

Quentin Gallouédec authored Nov 22, 2023



* fix max_steps doc

* Update src/transformers/training_args.py [ci skip]
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* propagate suggested change

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

b2c63c79

14 Nov, 2023 2 commits

Track the number of tokens seen to metrics (#27274) · 2fc33ebe

Zach Mueller authored Nov 14, 2023



* Add tokens seen

* Address comments, add to TrainingArgs

* Update log

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use self.args

* Fix docstring
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

2fc33ebe

Minor type annotation fix (#27276) · 250032e9
Costa Huang authored Nov 14, 2023
```
* Minor type annotation fix

* Trigger Build
```
250032e9

13 Nov, 2023 1 commit
- Fix docstring for `gradient_checkpointing_kwargs` (#27470) · 2dc29cfc
  Tomasz Cichy authored Nov 13, 2023
```
Docstring entry for `gradient_checkpointing_kwargs` was
`gradient_checkpointing_args`. This is incorrect.
```
  2dc29cfc
09 Nov, 2023 1 commit
- Adds dvclive callback (#27352) · 791ec370
  Dave Berenbaum authored Nov 09, 2023
```
* dvclive trainer callback

* style fixes

* dvclive link fixes
```
  791ec370
07 Nov, 2023 1 commit

Allow scheduler parameters (#26480) · 7e1eff76

Plemeur authored Nov 08, 2023



* Allow for scheduler kwargs

* Formatting

* Arguments checks, passing the tests

* Black failed somehow

---------
Co-authored-by: Pierre <pierre@avatarin.com>

7e1eff76

01 Nov, 2023 1 commit

Enable split_batches through TrainingArguments (#26798) · 3520e37e

Zach Mueller authored Nov 01, 2023

* Enable split_batches through TrainingArguments

* Extra dispatch_batches

* Keep as default false

* Add to docstring

* Add to docstring

* Remove the capturewarnings change

* Comma

3520e37e

31 Oct, 2023 2 commits

Safetensors serialization by default (#27064) · 113ebf80

Lysandre Debut authored Oct 31, 2023



* Safetensors serialization by default

* First pass on the tests

* Second pass on the tests

* Third pass on the tests

* Fix TF weight loading from TF-format safetensors

* Specific encoder-decoder fixes for weight crossloading

* Add VisionEncoderDecoder fixes for TF too

* Change filename test for pt-to-tf

* One missing fix for TFVisionEncoderDecoder

* Fix the other crossload test

* Support for flax + updated tests

* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Sanchit's comments

* Sanchit's comments 2

* Nico's comments

* Fix tests

* cleanup

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

113ebf80

[FEAT] Add Neftune into transformers Trainer (#27141) · 309a9066

Younes Belkada authored Oct 31, 2023



* add v1 neftune

* use `unwrap_model` instead

* add test + docs

* Apply suggestions from code review
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* more details

* fixup

* Update docs/source/en/main_classes/trainer.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* refactor a bit

* more elaborated test

* fix unwrap issue

---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

309a9066

30 Oct, 2023 2 commits
- remove the obsolete code related to fairscale FSDP (#26651) · d751dbec
  Hz, Ji authored Oct 30, 2023
```
* remove the obsolete code related to fairscale FSDP

* apply review suggestion
```
  d751dbec
- [`Trainer` / `GC`] Add `gradient_checkpointing_kwargs` in trainer and training arguments (#27068) · 5fbed2d7
  Younes Belkada authored Oct 30, 2023
```
* add `gradient_checkpointing_kwargs` in trainer and training arguments

* add comment

* add test - currently failing

* now tests pass
```
  5fbed2d7
26 Oct, 2023 1 commit

Correct docstrings and a typo in comments (#27047) · 18925925

L. Yeung authored Oct 26, 2023



* docs(training_args): correct docstrings

Correct docstrings of these methods in `TrainingArguments`:

- `set_save`
- `set_logging`

* docs(training_args): adjust words in docstrings
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(trainer): correct a typo in comments

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

18925925

12 Oct, 2023 1 commit
- Add many missing spaces in adjacent strings (#26751) · 40ea9ab2
  Tom Aarsen authored Oct 12, 2023
```
Add missing spaces in adjacent strings
```
  40ea9ab2
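
The kind of bug this commit addresses can be shown in a few lines: Python joins adjacent string literals with nothing in between, so a message split across lines silently loses a space unless one is written explicitly. The message text below is made up for the example and does not come from the library.

```python
# Adjacent string literals are concatenated by Python with no separator.
broken = (
    "The model was saved to disk."
    "You can now reload it."
)

# Adding a trailing space to the first literal restores the intended text.
fixed = (
    "The model was saved to disk. "
    "You can now reload it."
)
```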
11 Oct, 2023 1 commit
- Update docs to explain disabling callbacks using report_to (#26155) · 9f406392
  Ben Gubler authored Oct 11, 2023
```
* feat: update callback doc to explain disabling callbacks using report_to

* docs: update report_to docstring
```
  9f406392
06 Oct, 2023 1 commit

remove SharedDDP as it is deprecated (#25702) · 27597fea

statelesshz authored Oct 06, 2023



* remove SharedDDP as it was deprecated

* apply review suggestion

* make style

* Oops, forgot to remove the compute_loss context manager in Seq2SeqTrainer.

* remove the unnecessary conditional statement

* keep the logic of IPEX

* clean code

* mix precision setup & make fixup

---------
Co-authored-by: statelesshz <jihuazhong1@huawei.com>

27597fea

04 Oct, 2023 1 commit
- Extend Trainer to enable Ascend NPU to use the fused Adamw optimizer when training (#26194) · 4fdf47cd
  statelesshz authored Oct 04, 2023
  
  4fdf47cd
27 Sep, 2023 1 commit
- add bf16 mixed precision support for NPU (#26163) · 946bac79
  statelesshz authored Sep 27, 2023
```
Co-authored-by: statelesshz <jihuazhong1@huawei.com>
```
  946bac79
26 Sep, 2023 1 commit
- Add torch `RMSProp` optimizer (#26425) · 408b2b3c
  Nathan Lambert authored Sep 26, 2023
```
add rmsprop
```
  408b2b3c
13 Sep, 2023 2 commits

Flex xpu bug fix (#26135) · 05de038f
Abhilash Majumder authored Sep 14, 2023
```
flex gpu bug fix
```
05de038f

Update training_args.py - addition of self.distributed_state when using XPU (#25999) · e52f1cb6

Serizao authored Sep 13, 2023



* Update training_args.py

Missing distributed state, so lines 1813-1814 failed because the value is undefined

* Update training_args.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

e52f1cb6

07 Sep, 2023 1 commit

Add `tgs` speed metrics (#25858) · 3744126c

CokeDong authored Sep 08, 2023



* Add tgs metrics

* bugfix and black formatting

* workaround for tokens counting

* formatting and bugfix

* Fix

* Add opt-in for tgs metrics

* make style and fix error

* Fix doc

* fix docbuild

* hf-doc-build

* fix

* test

* Update src/transformers/training_args.py

renaming
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* Update src/transformers/training_args.py

renaming
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* Fix some symbol

* test

* Update src/transformers/trainer_utils.py

match naming patterns
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/training_args.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/trainer.py

nice
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix reviews

* Fix

* Fix black

---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

3744126c

05 Sep, 2023 2 commits

Patch with accelerate xpu (#25714) · 70a98024

Abhilash Majumder authored Sep 05, 2023

* patch with accelerate xpu

* patch with accelerate xpu

* formatting

* fix tests

* revert ruff unrelated fixes

* revert ruff unrelated fixes

* revert ruff unrelated fixes

* fix test

* review fixes

* review fixes

* black fixed

* review commits

* review commits

* style fix

* use pytorch_utils

* revert markuplm test

70a98024

Update training_args.py to remove the runtime error (#25920) · aea76149

Sahel Sharify authored Sep 05, 2023

This CL iterates through a list of keys rather than dict items while updating the dict elements. Fixes the following error:

    File "..../transformers/training_args.py", line 1544, in post_init
      for k, v in self.fsdp_config.items():
    RuntimeError: dictionary keys changed during iteration

aea76149
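
The fix described in the commit above follows a general Python pattern: take a snapshot of the keys before the loop, then mutate the dict freely inside it. The sketch below assumes the loop renames keys by stripping a prefix. The function name `strip_prefix` and the sample config values are illustrative and are not taken from `training_args.py`.

```python
def strip_prefix(config, prefix="fsdp_"):
    """Rename keys in place by dropping `prefix`, without iterating the live dict."""
    # list(...) copies the keys up front, so popping and re-inserting
    # entries inside the loop cannot invalidate the iteration.
    for key in list(config.keys()):
        if key.startswith(prefix):
            config[key[len(prefix):]] = config.pop(key)
    return config


# Illustrative config; the keys and values are assumptions for the example.
fsdp_config = {"fsdp_min_num_params": 0, "xla": False}
strip_prefix(fsdp_config)
```

Iterating over `config.items()` directly while popping and re-inserting keys is what can trigger the `RuntimeError` quoted in the commit message.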

01 Sep, 2023 1 commit
- Revert frozen training arguments (#25903) · be0e189b
  Zach Mueller authored Sep 01, 2023
```
* Revert frozen training arguments

* TODO
```
  be0e189b
29 Aug, 2023 1 commit

Arde/fsdp activation checkpointing (#25771) · 738ecd17

Arup De authored Aug 29, 2023

* add FSDP config option to enable activation-checkpointing

* update docs

* add checks and remove redundant code

* fix formatting error

738ecd17

25 Aug, 2023 1 commit

🚨 [`Refactor`] Move third-party related utility files into `integrations/` folder 🚨 (#25599) · 4b796978

Younes Belkada authored Aug 25, 2023



* move deepspeed to `lib_integrations.deepspeed`

* more refactor

* oops

* fix slow tests

* Fix docs

* fix docs

* address feedback

* address feedback

* final modifs for PEFT

* fixup

* ok now

* trigger CI

* trigger CI again

* Update docs/source/en/main_classes/deepspeed.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* import from `integrations`

* address feedback

* revert removal of `deepspeed` module

* revert removal of `deepspeed` module

* fix conflicts

* ooops

* oops

* add deprecation warning

* place it on the top

* put `FutureWarning`

* fix conflicts with not_doctested.txt

* add back `bitsandbytes` module with a depr warning

* fix

* fix

* fixup

* oops

* fix doctests

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

4b796978

18 Aug, 2023 1 commit
- fix z3 init when using accelerate launcher (#25589) · 636acc75
  Sourab Mangrulkar authored Aug 18, 2023
  
  636acc75
17 Aug, 2023 1 commit

add util for ram efficient loading of model when using fsdp (#25107) · c4c0ceff

Sourab Mangrulkar authored Aug 17, 2023

* add util for ram efficient loading of model when using fsdp

* make fix-copies

* fixes 😅

* docs

* making it further easier to use

* rename the function

* refactor to handle fsdp ram efficiency in `from_pretrained`

* fixes

* fixes

* fixes

* update

* fixes

* revert `load_pretrained_model_only_on_rank0`

* resolve `load_from_checkpoint`

c4c0ceff

15 Aug, 2023 1 commit

Make training args fully immutable (#25435) · ca514992

Zach Mueller authored Aug 15, 2023

* Make training args fully immutable

* Working tests, PyTorch

* In test_trainer

* during testing

* Use proper dataclass way

* Fix test

* Another one

* Fix tf

* Lingering slow

* Exception

* Clean

ca514992
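
As a rough illustration of what immutable arguments mean in practice, the sketch below uses a frozen standard-library dataclass. It is a generic example and not necessarily how the commit above implements immutability. The class `ToyTrainingArgs` and its fields are hypothetical. This listing also shows the change being reverted later (#25903).

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ToyTrainingArgs:
    # Hypothetical fields standing in for real training arguments.
    learning_rate: float = 5e-5
    num_train_epochs: int = 3


args = ToyTrainingArgs()

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    args.learning_rate = 1e-4
    mutated = True
except FrozenInstanceError:
    mutated = False

# To change a value, build a new instance instead of mutating in place.
new_args = replace(args, learning_rate=1e-4)
```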

09 Aug, 2023 1 commit
- Improve training args (#25401) · 00b93cda
  Alan Ji authored Aug 09, 2023
```
* enhanced tips for some training args

* make style
```
  00b93cda
07 Aug, 2023 1 commit

Migrate Trainer from `Repository` to `upload_folder` (#25095) · baf1daa5

Sylvain Gugger authored Aug 07, 2023



* First draft

* Deal with progress bars

* Update src/transformers/utils/hub.py
Co-authored-by: Lucain <lucainp@gmail.com>

* Address review comments

* Forgot one

* Pin hf_hub

* Add argument for push all and fix tests

* Fix tests

* Address review comments

---------
Co-authored-by: Lucain <lucainp@gmail.com>

baf1daa5

03 Aug, 2023 1 commit

Docs: Update list of `report_to` logging integrations in docstring (#25281) · 15082a9d

Tom Aarsen authored Aug 03, 2023

* Update list of logging integrations in docstring

Also update type hint

* Also add 'flyte' to report_to callback list

* Revert 'report_to' type hint update

Due to CLI breaking

15082a9d

02 Aug, 2023 1 commit
- resolving zero3 init when using accelerate config with Trainer (#25227) · 904e7e0f
  Sourab Mangrulkar authored Aug 02, 2023
```
* resolving zero3 init when using accelerate config with Trainer

* refactor

* fix

* fix import
```
  904e7e0f
28 Jul, 2023 1 commit

Fix `.push_to_hub` and cleanup `get_full_repo_name` usage (#25120) · 6232c380

Lucain authored Jul 28, 2023

* Fix .push_to_hub and cleanup get_full_repo_name usage

* Do not rely on Python bool conversion magic

* request changes

6232c380

27 Jul, 2023 1 commit
- 🚨🚨🚨Change default from `adamw_hf` to `adamw_torch` 🚨🚨🚨 (#25109) · a1c4954d
  Zach Mueller authored Jul 27, 2023
```
* Change defaults

* Sylvain's comments
```
  a1c4954d
25 Jul, 2023 1 commit
- Set `TF32` flag for PyTorch cuDNN backend (#25075) · 6bc61aa7
  Xuehai Pan authored Jul 25, 2023
  
  6bc61aa7
24 Jul, 2023 1 commit
- Add dispatch_batches to training arguments (#25038) · 3b734f50
  Zach Mueller authored Jul 24, 2023
```
* Dispatch batches

* Copy items
```
  3b734f50
21 Jul, 2023 2 commits
- Fix type annotation for deepspeed training arg (#24988) · a6484c89
  Sylvain Gugger authored Jul 21, 2023
  
  a6484c89
- fsdp fixes and enhancements (#24980) · f4eb459e
  Sourab Mangrulkar authored Jul 21, 2023
```
* fix fsdp prepare to remove the warnings and fix excess memory usage

* Update training_args.py

* parity for FSDP+XLA

* Update trainer.py
```
  f4eb459e