Commits · 6bc517ccd4a3bcda4d0621d54a37c3e047df223a · chenpangpang / transformers


05 Sep, 2023 1 commit

deepspeed resume from ckpt fixes and adding support for deepspeed optimizer and HF scheduler (#25863) · 6bc517cc

Sourab Mangrulkar authored Sep 05, 2023

* Add support for deepspeed optimizer and HF scheduler

* fix bug

* fix the import

* fix issue with deepspeed scheduler saving for hf optim + hf scheduler scenario

* fix loading of hf scheduler when loading deepspeed checkpoint

* fix import of `DeepSpeedSchedulerWrapper`

* add tests

* add the comment and skip the failing tests

* address comment


25 Aug, 2023 1 commit

🚨 [`Refactor`] Move third-party related utility files into `integrations/` folder 🚨 (#25599) · 4b796978

Younes Belkada authored Aug 25, 2023



* move deepspeed to `lib_integrations.deepspeed`

* more refactor

* oops

* fix slow tests

* Fix docs

* fix docs

* address feedback

* address feedback

* final modifs for PEFT

* fixup

* ok now

* trigger CI

* trigger CI again

* Update docs/source/en/main_classes/deepspeed.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* import from `integrations`

* address feedback

* revert removal of `deepspeed` module

* revert removal of `deepspeed` module

* fix conflicts

* ooops

* oops

* add deprecation warning

* place it on the top

* put `FutureWarning`

* fix conflicts with not_doctested.txt

* add back `bitsandbytes` module with a depr warning

* fix

* fix

* fixup

* oops

* fix doctests

---------
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


31 May, 2023 1 commit

accelerate deepspeed and gradient accumulation integrate (#23236) · a73b1d59

Sourab Mangrulkar authored May 31, 2023

* mixed precision support via accelerate

* fix issues

* fix for the sharded ddp case

* fix flax and tf failing tests

* refactor the place to create `Accelerator` object

* move ddp prep to accelerate

* fix 😅

* resolving comments

* move fsdp handling to accelerate

* fixes

* fix saving

* shift torch dynamo handling to accelerate

* shift deepspeed integration and save & load utils to accelerate

* fix accelerate launcher support

* oops

* fix 🐛

* save ckpt fix

* Trigger CI

* nasty 🐛 😅

* as deepspeed needs grad_acc fixes, transfer grad_acc to accelerate

* make tests happy

* quality ✨

* loss tracked needs to account for grad_acc

* fixing the deepspeed tests

* quality ✨

* 😅😅😅

* tests 😡

* quality ✨



* Trigger CI

* resolve comments and fix the issue with the previous merge from branch

* Trigger CI

* accelerate took over deepspeed integration

---------
Co-authored-by: Stas Bekman <stas@stason.org>


11 Apr, 2023 1 commit
- Fix decorator order (#22708) · fe1f5a63
  Yih-Dar authored Apr 11, 2023
```
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
09 Mar, 2023 1 commit
- [deepspeed] offload + non-cpuadam optimizer exception (#22043) · ec24132b
  Stas Bekman authored Mar 09, 2023
```
* [deepspeed] offload + non-cpuadam optimizer exception

* flip

* revert min version
```
23 Feb, 2023 1 commit
- [deepspeed tests] fix issues introduced by #21700 (#21769) · 63306263
  Stas Bekman authored Feb 23, 2023
```
* [deepspeed tests] fix issues introduced by #21700

* fix

* fix
```
22 Feb, 2023 1 commit
- Apply ruff flake8-comprehensions (#21694) · 5e8c8eb5
  Aaron Gokaslan authored Feb 22, 2023
  
08 Feb, 2023 1 commit
- [tests] add missing `report_to none` (#21505) · 8ea994d3
  Stas Bekman authored Feb 08, 2023
```
[tests] report_to none
```
06 Feb, 2023 1 commit

Update quality tooling for formatting (#21480) · 6f79d264

Sylvain Gugger authored Feb 06, 2023

* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies


16 Jun, 2022 1 commit
- Refine Bf16 test for deepspeed (#17734) · 36d46479
  Sylvain Gugger authored Jun 16, 2022
```
* Refine BF16 check in CPU/GPU

* Fixes

* Renames
```
06 Jun, 2022 1 commit
- [deepspeed / testing] reset global state (#17553) · d28b7aa8
  Stas Bekman authored Jun 06, 2022
```
* [deepspeed] fix load_best_model test

* [deepspeed] add state reset on unittest tearDown
```
03 Jun, 2022 1 commit
- [deepspeed] fix load_best_model test (#17550) · 26e5e129
  Stas Bekman authored Jun 03, 2022
  
02 Jun, 2022 1 commit

[trainer/deepspeed] load_best_model (reimplement re-init) (#17151) · 2f59ad16

Stas Bekman authored Jun 02, 2022



* [trainer/deepspeed] load_best_model

* to sync with DS PR #1947

* simplify

* rework load_best_model test

* cleanup

* bump deepspeed>=0.6.5
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>


10 May, 2022 1 commit

[Deepspeed] add many more models to the model zoo test (#12695) · f8615044

Stas Bekman authored May 10, 2022

* model zoo take 2

* add deberta

* new param for zero2

* doc update

* doc update

* add layoutlm

* bump deepspeed

* add deberta-v2, funnel, longformer

* new models

* style

* add t5_v1

* update TAPAS status

* reorg problematic models

* move doc to another PR

* style

* fix checkpoint check test

* making progress on more models running

* cleanup

* new version

* cleanup


15 Apr, 2022 1 commit

[trainer / deepspeed] fix hyperparameter_search (#16740) · ce2fef2a

Stas Bekman authored Apr 14, 2022

* [trainer / deepspeed] fix hyperparameter_search

* require optuna

* style

* oops

* add dep in the right place

* create deepspeed-testing dep group

* Trigger CI


23 Mar, 2022 1 commit

Reorganize file utils (#16264) · 4975002d

Sylvain Gugger authored Mar 23, 2022

* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wrong move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit


12 Mar, 2022 1 commit

[Deepspeed] add support for bf16 mode (#14569) · 580dd87c

Stas Bekman authored Mar 11, 2022



* [WIP] add support for bf16 mode

* prep for bf16

* prep for bf16

* fix; zero2/bf16 is ok

* check bf16 is available

* test fixes

* enable zero3_bf16

* config files

* docs

* split stage_dtype; merge back to non-dtype-specific config file

* fix doc

* cleanup

* cleanup

* bfloat16 => bf16 to match the PR changes

* s/zero_gather_fp16_weights_on_model_save/zero_gather_16bit_weights_on_model_save/; s/save_fp16_model/save_16bit_model/

* test fixes/skipping

* move

* fix

* Update docs/source/main_classes/deepspeed.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* backticks

* cleanup

* cleanup

* cleanup

* new version

* add note about grad accum in bf16
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


02 Mar, 2022 1 commit
- fix deepspeed tests (#15881) · b842d727
  Stas Bekman authored Mar 01, 2022
```
* fix deepspeed tests

* style

* more fixes
```
23 Feb, 2022 1 commit

[Test refactor 1/5] Per-folder tests reorganization (#15725) · 29c10a41

Lysandre Debut authored Feb 23, 2022



* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>


03 Feb, 2022 1 commit
- [deepspeed] fix a bug in a test (#15493) · 4f5faaf0
  Stas Bekman authored Feb 03, 2022
```
* [deepspeed] fix a bug in a test

* consistency
```
07 Dec, 2021 1 commit

[deepspeed] fix --load_best_model_at_end (#14652) · b66c5ab2

Stas Bekman authored Dec 06, 2021

* [deepspeed] fix load_best_model_at_end

* try with pull_request_target

* revert: try with pull_request_target

* style

* add test

* cleanup


23 Nov, 2021 1 commit

[deepspeed] zero inference (#14253) · 956a4831

Stas Bekman authored Nov 23, 2021



* [deepspeed] zero inference

* only z3 makes sense for inference

* fix and style

* docs

* rework

* fix test

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* responding to suggestions
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


11 Nov, 2021 1 commit
- solve the port conflict (#14362) · 1c76a516
  Stas Bekman authored Nov 10, 2021
  
08 Nov, 2021 1 commit
- [deepspeed] Enable multiple test runs on single box, defer to DS_TEST_PORT if set (#14331) · d0e96c6d
  Jeff Rasley authored Nov 08, 2021
```
* defer to DS_TEST_PORT if set

* style
Co-authored-by: Stas Bekman <stas@stason.org>
```
30 Aug, 2021 1 commit

Use DS callable API to allow hf_scheduler + ds_optimizer (#13216) · 42f359d0

Olatunji Ruwase authored Aug 30, 2021



* Use DS callable API to allow hf_scheduler + ds_optimizer

* Preserve backward-compatibility

* Restore backward compatibility

* Tweak arg positioning

* Tweak arg positioning

* bump the required version

* Undo indent

* Update src/transformers/trainer.py

* style
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>


23 Jul, 2021 1 commit
- [tests] fix logging_steps requirements (#12860) · 98364ea7
  Stas Bekman authored Jul 23, 2021
  
14 Jul, 2021 1 commit
- non-native optimizers are mostly ok with zero-offload (#12690) · 5dd0c956
  Stas Bekman authored Jul 13, 2021
  
13 Jul, 2021 1 commit

[Deepspeed] adapt multiple models, add zero_to_fp32 tests (#12477) · 78f5fe14

Stas Bekman authored Jul 13, 2021



* zero_to_fp32 tests

* args change

* remove unnecessary work

* use transformers.trainer_utils.get_last_checkpoint

* document the new features

* cleanup

* wip

* fix fsmt

* add bert

* cleanup

* add xlm-roberta

* electra works

* cleanup

* sync

* split off the model zoo tests

* cleanup

* cleanup

* cleanup

* cleanup

* reformat

* cleanup

* casing

* deepspeed>=0.4.3

* adjust distilbert

* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


22 Jun, 2021 1 commit
- [trainer] 2 bug fixes and a rename (#12309) · ebe54135
  Stas Bekman authored Jun 22, 2021
```
* bug fixes and a rename

* add extended DDP test
```
08 Jun, 2021 2 commits

[Deepspeed Wav2vec2] integration (#11638) · 11d86d3d

Stas Bekman authored Jun 08, 2021

* wip

* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044

* cleanup

* workaround

* working 5/8 modes

* solve fp32 distributed zero3

* style

* sync

* sync

* rework

* deprecation

* cleanup

* https://github.com/microsoft/DeepSpeed/pull/1044 PR was merged

* clean up

* add a guide

* more prose

* more prose

* fix

* more prose

* sub_group_size was too big

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor

* bug fix

* make the true check explicit

* new deepspeed release
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


[Deepspeed] various fixes (#12058) · 32290d87

Stas Bekman authored Jun 08, 2021

* replace deprecated config

* sub_group_size was too big

* complete deprecation removal


04 Jun, 2021 1 commit

[Deepspeed] Assert on mismatches between ds and hf args (#12021) · 2c73b930

Stas Bekman authored Jun 04, 2021



* wip

* add mismatch validation + test

* renames

* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* renames
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


02 Jun, 2021 2 commits
- [deepspeed] add nvme test skip rule (#11997) · 61c50634
  Stas Bekman authored Jun 02, 2021
```
* add nvme skip rule

* fix
```
- [deepspeed] Move code and doc into standalone files (#11984) · 640318be
  Stas Bekman authored Jun 02, 2021
```
* move code and docs

* style

* moved

* restore
```
01 Jun, 2021 1 commit

[DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966) · 7ec596ec

Stas Bekman authored Jun 01, 2021



* decouple DeepSpeedConfigHF from Trainer

* add LoggingLevel ctx manager; add new test

* cleanup

* add docs

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implemented suggested renames

* formatter workaround
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


21 May, 2021 1 commit
- [Deepspeed] support `zero.Init` in `from_config` (#11805) · a26f4d62
  Stas Bekman authored May 21, 2021
```
* support zero.Init in from_config

* no need for eval test
```
06 May, 2021 1 commit
- [cuda ext tests] fixing tests (#11619) · 619200cc
  Stas Bekman authored May 06, 2021
```
* fixing tests

* cleanup
```
30 Apr, 2021 1 commit

[DeepSpeed] fp32 support (#11499) · 4e7bf94e

Stas Bekman authored Apr 30, 2021

* prep for deepspeed==0.3.16

* new version

* too soon

* support and test fp32 mode

* troubleshooting doc start

* workaround no longer needed

* add fp32 doc

* style

* cleanup, add tf32 note

* clarify

* release was made


26 Apr, 2021 2 commits

[Deepspeed] ZeRO-Infinity integration plus config revamp (#11418) · bc2571e6

Stas Bekman authored Apr 26, 2021



* adding Z-inf

* revamp config process

* up version requirement

* wip

* massive rewrite

* cleanup

* cleanup

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* consistent json commas

* act on suggestions

* leave this feature for 0.3.16

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


[Examples] Fixes inconsistency around eval vs val and predict vs test (#11380) · 1d30ec95

Bhadresh Savani authored Apr 26, 2021

* added changes for uniformity

* modified files

* corrected typo

* fixed qa scripts

* fix typos

* fixed predict typo in qa no trainer

* fixed test file

* reverted trainer changes

* reverted trainer changes in custom examples

* updated readme

* added changes in deepspeed test

* added changes for predict and eval
