"examples/vscode:/vscode.git/clone" did not exist on "0ef61d392cee69b4269ac97af7e8582052d9a0cd"
- 09 Jun, 2022 1 commit

Younes Belkada authored
* adding template
* update model
* model update
* update conf for debug model
* update conversion
* update conversion script
* update conversion script
* fix missing keys check
* add tests to test the tokenizer in the local machine
* Change variable name
* add tests on xnli dataset
* add more description
* add descriptions + clearer code
* clearer code
* adding new tests + skipping few tests because of env problems
* change comment
* add dtype on the configuration
* add test embeddings
* add hardcoded test
* fix dtype issue
* adding torch.float16 to config
* adding more metrics (min, max, mean)
* add sum
* now the test passes with almost equal
* add files for conversion - test passes on cpu gpu
* add final changes
* cleaning code
* add new args in the docstring
* fix one liner function
* remove macros
* remove forward attention
* clean up init function
* add comments on the issue
* rm scale mask softmax
* do make style
* fix dtype in init
* fixing for loop on att probs
* fix style with black
* fix style + doc error
* fix and debug CI errors (docs + style)
* some updates
  - change new operations
  - finally add scaled softmax
  - added new args in the config
* make use cache working
* add changes
  - save sharded models
  - final changes on the modeling script
* add changes
  - comment on alibi
  - add TODO on seq length
* test commit
  - added a text to test the commit
  Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* final changes
  - attention mask change
  - generation works on BS176b
  Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* changes - model + conversion
* move to correct dir
* put ,
* fex fixes
* fix tokenizer autodoc
* fix minor CI issues
* fix minor CI issues
* fix minor CI issues
* fix style issue
* fix minor import issues
* fix few issues
* remove def main on the test
* add require torch
* replace decorator with 'with'
* fix style
* change to bloom
* add quick fix tokenizer
* fix tokenizer file
* fix tokenizer
  - merge tests
  - small fixes
* fix import issue
* add bloom to readme
* fix consistency
* Update docs/source/en/model_doc/bloom.mdx
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
  - fix comment issues on file headers
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix doc issue
* small fix - modeling test
* some changes
  - refactor some code
  - taking into account reviews
  - more tests should pass
  - removed pruning tests
* remove useless division
* more tests should pass
* more tests should pass
* more tests should pass
* let's try this one
  - add alibi offset
  - remove all permutes to make the grad operations work
  - fingers crossed
* refactor
  - refactor code
  - style changes
  - add new threshold for test
* major changes
  - change BLOOM to Bloom
  - add quick doc on bloom.mdx
  - move embeddings test on modeling test
* modify readme
* small fixes
* small fix - better threshold for a test
* remove old test file from fetcher
* fix small typo
* major change - change BloomLMHead to BloomForCausalLM
* remove onnx config
* major changes
  - refactor the code
  - remove asserts
  - change tol for test
* make style
* small change
* adding a slow test + commenting old ones for now
* make style
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* fix duplicates
* cleaning comments on config
* clean a bit conversion file
* refactor a bit modeling file
* refactor tokenizer file
* fix tokenization test issue
* fix tokenization issue #2
* fix tokenization issue second try
* fix test issue
* make style + add suggestions
* change test fetcher
* try this one
  - slow tests should pass
  - fingers crossed
* possible final changes
* make style
* try fix padding side issue
* fix side
* fix padding issue
* fix ko-readme
* fix config auto
* cleaning modeling file
* keep bloom in caps in ko
* update config docs
* remove pretraining_pp
* remove model parallel
* update config - add correct config files
* fix duplicates
* fix fetcher
* fix refactor issue - remove divide function
* try to remove alibi
* small fixes
  - fix alibi
  - remove seq length
  - refactor a bit the code
* put correct values - fix bos and eos token ids
* fix attention mask loop
  Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* small fixes:
  - remove skip bias add
* small fixes
  - fix typo in readme
  - fix typos in config
* small changes
  - remove a test
  - add reconstruction test
  - change config
* small changes - change Scaled Softmax to BloomScaledSoftmax
* small fixes - fix alibi dtype
* major changes
  - removing explicit dtype when loading modules
  - fixing test args (torch_dtype=auto)
  - add docstring
* fix readmes
* major changes
  - now bloom supports alibi shifting
  - refactor a bit the code
  - better test tolerance now
* refactor a bit
* refactor a bit
* put correct name on test
* change docstring
* small changes
  - fix docstring modeling
  - fix test tolerance
* fix small nit - take dtype from tensors in the conversion script
* minor fix - fix mdx issue
* minor fix - change config docstring
* forward contrib credits from PR14084
* Apply suggestions from code review
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply modifications
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* resolve softmax upcast
* Apply suggestions from code review
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
  Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
* final changes modeling
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Merge commit 'd156898f'
* merge commit
* Apply suggestions from code review
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply suggestions
  - Apply suggestions from Stas comments
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix gradient checkpointing
  Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add slow but exact
* add accelerate compatibility
  Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
* forward contrib credits
  Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
  Co-authored-by: sgugger <sgugger@users.noreply.github.com>
  Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
  Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
  Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix torch device on tests
* make style
* Apply suggestions from code review
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix nits
  Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
* remove final nits
* fix doc
  - add more details on the doc
  - add links to checkpoints
* Update src/transformers/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
  Co-authored-by: sgugger <sgugger@users.noreply.github.com>
* put test torchscript to false
* Update src/transformers/models/bloom/modeling_bloom.py
  Co-authored-by: justheuristic <justheuristic@gmail.com>
* fix alibi - create alibi only once
* add small doc
* make quality
* replace torch.nn
* remove token type emb
* fix fused op + output bias
* add fused op - now can control fused operation from config
* remove fused op
* make quality
* small changes
  - remove unused args on config
  - removed bias gelu file
  - make the model torchscriptable
  - add torchscript slow tests
* Update src/transformers/models/bloom/modeling_bloom.py
* fix slow
* make style
* add accelerate support
* add bloom to deepspeed tests
* minor changes
* Apply suggestions from code review
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* minor change
* slow tests pass
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/bloom.mdx
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* minor changes:
  - change docstring
  - add link to paper

Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
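This commit lands BloomForCausalLM and the Bloom tokenizer. A minimal usage sketch, assuming one of the published bigscience checkpoint ids (the 560m id below is only an example) and the torch_dtype="auto" behavior the bullets mention:

```python
# Minimal sketch: load the newly added Bloom classes and generate.
# "bigscience/bloom-560m" is an example checkpoint id, not prescribed by the commit.
from transformers import AutoTokenizer, BloomForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
# torch_dtype="auto" reads the dtype recorded in the checkpoint config,
# matching the "fixing test args (torch_dtype=auto)" bullet above.
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m", torch_dtype="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```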
- 06 Jun, 2022 1 commit

Stas Bekman authored
* [deepspeed] fix load_best_model test
* [deepspeed] add state reset on unittest tearDown
- 03 Jun, 2022 1 commit

Stas Bekman authored
- 02 Jun, 2022 1 commit

Stas Bekman authored
* [trainer/deepspeed] load_best_model
* to sync with DS PR #1947
* simplify
* rework load_best_model test
* cleanup
* bump deepspeed>=0.6.5

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
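A minimal sketch of the combination this commit fixes: load_best_model_at_end together with a DeepSpeed config, passed here as a dict (a JSON path works too). The config values are illustrative, not tuned; per the last bullet, deepspeed>=0.6.5 is required.

```python
from transformers import TrainingArguments

# Illustrative ZeRO-2 config; "auto" values are filled in by the Trainer.
ds_config = {
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out",
    deepspeed=ds_config,            # dict or path to a DeepSpeed JSON config
    evaluation_strategy="steps",
    eval_steps=50,
    save_steps=50,
    load_best_model_at_end=True,    # the code path exercised by these fixes
    metric_for_best_model="eval_loss",
)
```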
- 10 May, 2022 2 commits

Stas Bekman authored

Stas Bekman authored
* model zoo take 2
* add deberta
* new param for zero2
* doc update
* doc update
* add layoutlm
* bump deepspeed
* add deberta-v2, funnel, longformer
* new models
* style
* add t5_v1
* update TAPAS status
* reorg problematic models
* move doc to another PR
* style
* fix checkpoint check test
* making progress on more models running
* cleanup
* new version
* cleanup
- 03 May, 2022 1 commit

Yih-Dar authored
* move test model folders (TODO: fix imports and others)
* fix (potentially partially) imports (in model test modules)
* fix (potentially partially) imports (in tokenization test modules)
* fix (potentially partially) imports (in feature extraction test modules)
* fix import utils.test_modeling_tf_core
* fix path ../fixtures/
* fix imports about generation.test_generation_flax_utils
* fix more imports
* fix fixture path
* fix get_test_dir
* update module_to_test_file
* fix get_tests_dir from wrong transformers.utils
* update config.yml (CircleCI)
* fix style
* remove missing imports
* update new model script
* update check_repo
* update SPECIAL_MODULE_TO_TEST_MAP
* fix style
* add __init__
* update self-scheduled
* fix add_new_model scripts
* check one way to get location back
* python setup.py build install
* fix import in test auto
* update self-scheduled.yml
* update slack notification script
* Add comments about artifact names
* fix for yolos

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
- 15 Apr, 2022 1 commit

Stas Bekman authored
* [trainer / deepspeed] fix hyperparameter_search
* require optuna
* style
* oops
* add dep in the right place
* create deepspeed-testing dep group
* Trigger CI
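A sketch of the Trainer.hyperparameter_search call this commit fixes under DeepSpeed, using the optuna backend the new dep group pulls in. The trainer object (built with a model_init, as hp search requires) and the search space are assumptions for illustration.

```python
# Assumes `trainer` was constructed with model_init=... so each trial
# gets a freshly initialized model.
def hp_space(trial):
    return {"learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True)}

best_run = trainer.hyperparameter_search(
    hp_space=hp_space,
    backend="optuna",
    n_trials=10,
    direction="minimize",
)
print(best_run.hyperparameters)
```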
- 23 Mar, 2022 1 commit

Sylvain Gugger authored
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion to code with code review
* Revert wrong move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
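In practice the split changes where objects canonically live, while file_utils keeps re-exports for backward compatibility. A small before/after import sketch, using is_torch_available as an example:

```python
# Old catch-all location (kept working via backward-compatible re-exports):
from transformers.file_utils import is_torch_available as from_old_path
# New canonical location after the split:
from transformers.utils import is_torch_available

assert from_old_path is is_torch_available
```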
- 12 Mar, 2022 1 commit

Stas Bekman authored
* [WIP] add support for bf16 mode
* prep for bf16
* prep for bf16
* fix; zero2/bf16 is ok
* check bf16 is available
* test fixes
* enable zero3_bf16
* config files
* docs
* split stage_dtype; merge back to non-dtype-specific config file
* fix doc
* cleanup
* cleanup
* bfloat16 => bf16 to match the PR changes
* s/zero_gather_fp16_weights_on_model_save/zero_gather_16bit_weights_on_model_save/; s/save_fp16_model/save_16bit_model/
* test fixes/skipping
* move
* fix
* Update docs/source/main_classes/deepspeed.mdx
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* backticks
* cleanup
* cleanup
* cleanup
* new version
* add note about grad accum in bf16

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
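A minimal sketch of what the new bf16 mode looks like in a Trainer-side DeepSpeed config; the dict is illustrative, with "auto" values resolved from TrainingArguments (e.g. when --bf16 is passed):

```python
# Illustrative ZeRO-3 config with the new bf16 block in place of fp16.
ds_config_bf16 = {
    "bf16": {"enabled": "auto"},        # follows TrainingArguments(bf16=True)
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```

The closing bullet's docs note refers to a caveat of this mode: accumulating gradients in bf16 is lossier than accumulating in fp32, so results can differ slightly from fp16/fp32 runs.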
- 02 Mar, 2022 1 commit

Stas Bekman authored
* fix deepspeed tests
* style
* more fixes
- 23 Feb, 2022 1 commit

Lysandre Debut authored
* Per-folder tests reorganization

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
- 03 Feb, 2022 1 commit

Stas Bekman authored
* [deepspeed] fix a bug in a test
* consistency
- 13 Jan, 2022 1 commit

Stas Bekman authored
- 07 Dec, 2021 1 commit

Stas Bekman authored
* [deepspeed] fix load_best_model_at_end
* try with pull_request_target
* revert: try with pull_request_target
* style
* add test
* cleanup
- 23 Nov, 2021 1 commit

Stas Bekman authored
* [deepspeed] zero inference
* only z3 makes sense for inference
* fix and style
* docs
* rework
* fix test
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* responding to suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
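A minimal sketch of the ZeRO inference pattern this commit introduces (only ZeRO-3 makes sense here, as the second bullet says): instantiate the HfDeepSpeedConfig before from_pretrained so weights load partitioned, then run through the DeepSpeed engine. The tiny config and the t5-small checkpoint are illustrative, and the transformers.deepspeed module path is the one used in this era.

```python
import deepspeed
from transformers import AutoModelForSeq2SeqLM
from transformers.deepspeed import HfDeepSpeedConfig

ds_config = {
    "fp16": {"enabled": False},
    "zero_optimization": {"stage": 3},   # only ZeRO-3 is useful for inference
    "train_micro_batch_size_per_gpu": 1,
}

# Must exist *before* from_pretrained and stay alive, so that weights are
# partitioned across GPUs (zero.Init) while the checkpoint is loaded.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
engine.module.eval()
```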
- 11 Nov, 2021 1 commit

Stas Bekman authored
- 08 Nov, 2021 1 commit

Jeff Rasley authored
* defer to DS_TEST_PORT if set
* style

Co-authored-by: Stas Bekman <stas@stason.org>
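The behavior described is plain environment-variable deference when the test launcher picks its master port; a sketch of the pattern, with the fallback value invented for illustration:

```python
import os

DEFAULT_PORT = "10999"  # illustrative fallback, not the suite's actual default
master_port = os.environ.get("DS_TEST_PORT", DEFAULT_PORT)
print(f"launching distributed tests with --master_port {master_port}")
```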
- 30 Aug, 2021 1 commit

Olatunji Ruwase authored
* Use DS callable API to allow hf_scheduler + ds_optimizer
* Preserve backward-compatibility
* Restore backward compatibility
* Tweak arg positioning
* Tweak arg positioning
* bump the required version
* Undo indent
* Update src/transformers/trainer.py
* style

Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
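A sketch of the combination the callable API unlocks: the DeepSpeed config supplies the optimizer, and the scheduler block is omitted so the Trainer-built HF scheduler is handed to DeepSpeed instead. The values are illustrative.

```python
ds_config = {
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "weight_decay": "auto"},
    },
    # No "scheduler" block: the Trainer's own HF lr scheduler is used and
    # passed to DeepSpeed via the callable API this commit adopts.
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
}
```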
- 23 Jul, 2021 1 commit

Stas Bekman authored
- 14 Jul, 2021 1 commit

Stas Bekman authored
- 13 Jul, 2021 1 commit

Stas Bekman authored
* zero_to_fp32 tests
* args change
* remove unnecessary work
* use transformers.trainer_utils.get_last_checkpoint
* document the new features
* cleanup
* wip
* fix fsmt
* add bert
* cleanup
* add xlm-roberta
* electra works
* cleanup
* sync
* split off the model zoo tests
* cleanup
* cleanup
* cleanup
* cleanup
* reformat
* cleanup
* casing
* deepspeed>=0.4.3
* adjust distilbert
* Update docs/source/main_classes/deepspeed.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
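The zero_to_fp32 machinery under test consolidates ZeRO-sharded checkpoints into a plain fp32 state_dict. A sketch against the helper DeepSpeed ships for this in current releases; the checkpoint path is a placeholder.

```python
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# "output/checkpoint-500" stands in for a directory of ZeRO shards saved by
# the Trainer during training.
state_dict = get_fp32_state_dict_from_zero_checkpoint("output/checkpoint-500")
# The consolidated fp32 weights live on CPU and can be loaded back with
# model.load_state_dict(state_dict) or saved with torch.save().
```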
- 22 Jun, 2021 1 commit

Stas Bekman authored
* bug fixes and a rename
* add extended DDP test
- 08 Jun, 2021 2 commits

Stas Bekman authored
* wip
* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044
* cleanup
* workaround
* working 5/8 modes
* solve fp32 distributed zero3
* style
* sync
* sync
* rework
* deprecation
* cleanup
* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged
* clean up
* add a guide
* more prose
* more prose
* fix
* more prose
* sub_group_size was too big
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor
* bug fix
* make the true check explicit
* new deepspeed release

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
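The recurring "sub_group_size was too big" bullet refers to a ZeRO-3 tuning knob; a config fragment showing where it sits, with an assumed example value rather than the one the tests settled on:

```python
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Tiles parameter updates into groups; smaller values bound the
        # memory spike at optimizer-step time. 1e9 is an example value.
        "sub_group_size": 1e9,
    },
    "train_micro_batch_size_per_gpu": "auto",
}
```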

Stas Bekman authored
* replace deprecated config
* sub_group_size was too big
* complete deprecation removal
- 04 Jun, 2021 1 commit

Stas Bekman authored
* wip
* add mismatch validation + test
* renames
* Update docs/source/main_classes/deepspeed.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* renames

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 02 Jun, 2021 2 commits

Stas Bekman authored
* add nvme skip rule
* fix

Stas Bekman authored
* move code and docs
* style
* moved
* restore
- 01 Jun, 2021 1 commit

Stas Bekman authored
* decouple DeepSpeedConfigHF from Trainer
* add LoggingLevel ctx manager; add new test
* cleanup
* add docs
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* implemented suggested renames
* formatter workaround

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
- 21 May, 2021 1 commit

Stas Bekman authored
* support zero.Init in from_config
* no need for eval test
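A sketch of the newly supported path: with a live DeepSpeed config object and a ZeRO-3 config, from_config creates the model under deepspeed.zero.Init, so parameters are partitioned at construction time. The class is shown under the HfDeepSpeedConfig name the integration later settled on, and the config dict is a minimal assumption.

```python
from transformers import AutoConfig, AutoModel
from transformers.deepspeed import HfDeepSpeedConfig

ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,
}
dschf = HfDeepSpeedConfig(ds_config)  # must stay alive during model creation

config = AutoConfig.from_pretrained("gpt2")
model = AutoModel.from_config(config)  # weights are created already partitioned
```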
- 06 May, 2021 1 commit

Stas Bekman authored
* fixing tests
* cleanup
- 30 Apr, 2021 1 commit

Stas Bekman authored
* prep for deepspeed==0.3.16
* new version
* too soon
* support and test fp32 mode
* troubleshooting doc start
* workaround no longer needed
* add fp32 doc
* style
* cleanup, add tf32 note
* clarify
* release was made
- 26 Apr, 2021 3 commits

Stas Bekman authored
* adding Z-inf
* revamp config process
* up version requirement
* wip
* massive rewrite
* cleanup
* cleanup
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* consistent json commas
* act on suggestions
* leave this feature for 0.3.16
* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Bhadresh Savani authored
* added changes for uniformity
* modified files
* corrected typo
* fixed qa scripts
* fix typos
* fixed predict typo in qa no trainer
* fixed test file
* reverted trainer changes
* reverted trainer changes in custom examples
* updated readme
* added changes in deepspeed test
* added changes for predict and eval

Patrick von Platen authored
- 21 Apr, 2021 1 commit

Sylvain Gugger authored
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments and clean

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
- 14 Apr, 2021 1 commit

Stas Bekman authored
* test on one node 2 gpus max
* fix the other place
* refactor
* fix
* cleanup
* more exact version
- 13 Apr, 2021 1 commit

Stas Bekman authored
* temp band-aid
* style
- 08 Apr, 2021 1 commit

Stas Bekman authored
* relocate core integration tests
* add sys.path context manager
* cleanup
* try
* try2
* fix path
* doc
* style
* add dep
* add 2 more deps