Commits · 51ee20fc26381ca8aba4d4da9b410379302ee1d1 · chenpangpang / transformers

14 Oct, 2021 1 commit
- Remove wrong model_args supplied (#13937) · 51ee20fc
  Li-Huai (Allan) Lin authored Oct 14, 2021
```
* Remove wrong model_args of config.from_pretrained

* Fix tf & flax
```
  51ee20fc
11 Oct, 2021 1 commit

[Gradient checkpointing] Correct disabling `find_unused_parameters` in Trainer... · dca67968

Patrick von Platen authored Oct 11, 2021

[Gradient checkpointing] Correct disabling `find_unused_parameters` in Trainer when gradient checkpointing is enabled (#13961)

* up

* correct test

dca67968

08 Oct, 2021 1 commit

Adds `PreTrainedModel.framework` attribute (#13817) · de344815

Stella Biderman authored Oct 08, 2021



* Added `framework` attribute

* Update modeling_utils.py

* Update modeling_flax_utils.py

* Update modeling_tf_utils.py

* Update modeling_utils.py

* Update modeling_tf_utils.py

* Update modeling_tf_utils.py

* Update modeling_flax_utils.py

* Update modeling_tf_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_tf_utils.py

* Update modeling_flax_utils.py

* string -> str

* Update modeling_tf_utils.py

* string -> str

* fixup

* make flake happy
Co-authored-by: patil-suraj <surajp815@gmail.com>

de344815
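
As a rough illustration of what a per-class `framework` attribute can look like, here is a minimal sketch. The three classes are hypothetical stand-ins, not the library's base classes, and the return values `"pt"`, `"tf"` and `"flax"` are assumed identifiers for the three backends.

```python
class PyTorchModelSketch:
    """Hypothetical stand-in for a PyTorch model base class."""

    @property
    def framework(self) -> str:
        # Assumed identifier for the PyTorch backend.
        return "pt"


class TFModelSketch:
    """Hypothetical stand-in for a TensorFlow model base class."""

    @property
    def framework(self) -> str:
        # Assumed identifier for the TensorFlow backend.
        return "tf"


class FlaxModelSketch:
    """Hypothetical stand-in for a Flax model base class."""

    @property
    def framework(self) -> str:
        # Assumed identifier for the Flax backend.
        return "flax"


def describe(model) -> str:
    """Branch on the attribute instead of using isinstance checks."""
    return f"{type(model).__name__} uses {model.framework}"


summary = [describe(m) for m in (PyTorchModelSketch(), TFModelSketch(), FlaxModelSketch())]
```

Exposing the value as a read-only property keeps it a `str` and lets calling code pick a backend-specific path without importing every framework.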

07 Oct, 2021 2 commits
- Add missing character (#13922) · 5f34163b
  Mishig Davaadorj authored Oct 07, 2021
  
  5f34163b
- Add missing whitespace to multiline strings (#13916) · 57420b10
  Alex Hedges authored Oct 07, 2021
  
  57420b10
05 Oct, 2021 1 commit
- Improve error message when loading models from Hub (#13836) · 46efc580
  Alex Hedges authored Oct 05, 2021
```
* Improve error message when loading models from Hub

* Adjust error message wording
```
  46efc580
24 Sep, 2021 1 commit

Make assertions only if actually chunking forward (#13598) · 678bb248

Josh Devins authored Sep 24, 2021

This moves the assertion that checks input dimensions into a block that is only reached if the function is actually going to chunk the forward pass. That is often not the case at inference time, and tracing a model with PyTorch while this assertion is in place leads to the following tracing warning:

TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
input_tensor.shape[chunk_dim] == tensor_shape for input_tensor in input_tensors

678bb248
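
The idea described above can be sketched without PyTorch. The following is a minimal illustration, not the library's implementation: `apply_chunking` and `add_pairs` are hypothetical names, and plain Python lists stand in for tensors. The shape checks sit inside the `chunk_size > 0` branch, so the non-chunking path never evaluates them.

```python
from typing import Callable, List, Sequence


def apply_chunking(
    forward_fn: Callable[..., List[float]],
    chunk_size: int,
    *inputs: Sequence[float],
) -> List[float]:
    """Run forward_fn over equal-sized chunks of the inputs, or over the whole inputs."""
    if chunk_size > 0:
        # The assertions are only reached when chunking actually happens.
        length = len(inputs[0])
        assert all(len(x) == length for x in inputs), "all inputs must have the same length"
        assert length % chunk_size == 0, "length must be a multiple of chunk_size"

        outputs: List[float] = []
        for start in range(0, length, chunk_size):
            chunks = [x[start:start + chunk_size] for x in inputs]
            outputs.extend(forward_fn(*chunks))
        return outputs

    # No chunking requested: call the function directly, with no checks.
    return forward_fn(*inputs)


def add_pairs(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Toy forward function: element-wise sum of two sequences."""
    return [x + y for x, y in zip(a, b)]


chunked = apply_chunking(add_pairs, 2, [1, 2, 3, 4], [10, 20, 30, 40])
unchunked = apply_chunking(add_pairs, 0, [1, 2, 3, 4], [10, 20, 30, 40])
```

With tensors, a comparison such as the one quoted in the warning yields a Python boolean during tracing, so keeping it out of the `chunk_size == 0` path is what avoids the warning there.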

23 Sep, 2021 1 commit

1x model size CPU memory usage for `from_pretrained` (#13466) · 62832c96

Stas Bekman authored Sep 22, 2021

* one possible solution

* low mem from_pretrained

* edge cases

* solve the persistent buffers

* style

* parametrize

* for later

* proper solution

* cleanup

* refactor; rework based on suggestions

* revert splitting into 2 parts, move checks into main func

62832c96

22 Sep, 2021 1 commit

Make gradient_checkpointing a training argument (#13657) · 27d46397

Sylvain Gugger authored Sep 22, 2021



* Make gradient_checkpointing a training argument

* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/configuration_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix tests

* Style

* document Gradient Checkpointing as a performance feature

* Small rename

* PoC for not using the config

* Adapt BC to new PoC

* Forgot to save

* Rollout changes to all other models

* Fix typo
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>

27d46397

17 Sep, 2021 1 commit
- Use `config_dict_or_path` for deepspeed.zero.Init (#13614) · ce32c69c
  Alex Hedges authored Sep 17, 2021
  
  ce32c69c
16 Sep, 2021 1 commit
- [deepspeed] replaced deprecated init arg (#13587) · bec2e3f5
  Stas Bekman authored Sep 16, 2021
```
* [deepspeed] replaced deprecated init arg

* Trigger CI
```
  bec2e3f5
15 Sep, 2021 1 commit
- [Pretrained Model] Add resize_position_embeddings (#13559) · 95f933ea
  Patrick von Platen authored Sep 15, 2021
```
* finish

* delete bogus file

* correct some stuff

* finish

* finish
```
  95f933ea
08 Sep, 2021 1 commit
- Better error raised when cloned without lfs (#13401) · 99029ab6
  Lysandre Debut authored Sep 08, 2021
```
* Better error raised when cloned without lfs

* add from e
```
  99029ab6
30 Aug, 2021 1 commit
- fix: typo spelling grammar (#13212) · 01977466
  arfy slowy authored Aug 30, 2021
```
* fix: typo spelling grammar

* fix: make fixup
```
  01977466
26 Aug, 2021 1 commit

Add error message concerning revision (#13266) · 401377e6

Bram Vanroy authored Aug 26, 2021



* add error message concerning revision

* Update src/transformers/configuration_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* re-add double line endings

* is not None instead of implicit bool casting
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

401377e6

06 Aug, 2021 1 commit

Tpu tie weights (#13030) · 7fcee113

Sylvain Gugger authored Aug 06, 2021

* Fix tied weights on TPU

* Manually tie weights in no trainer examples

* Fix for test

* One last missing

* Getting owned by my scripts

* Address review comments

* Fix test

* Fix tests

* Fix reformer tests

7fcee113

04 Aug, 2021 1 commit

Fix from_pretrained with corrupted state_dict (#12939) · d4c834d2

Sylvain Gugger authored Aug 04, 2021

* Fix from_pretrained with corrupted state_dict

* Adapt test

* Use better checkpoint

* Style

* Clean up

d4c834d2

17 Jul, 2021 1 commit
- Fix push_to_hub docstring and make it appear in doc (#12770) · da72ac6e
  Sylvain Gugger authored Jul 17, 2021
  
  da72ac6e
13 Jul, 2021 3 commits

[Deepspeed] adapt multiple models, add zero_to_fp32 tests (#12477) · 78f5fe14

Stas Bekman authored Jul 13, 2021



* zero_to_fp32 tests

* args change

* remove unnecessary work

* use transformers.trainer_utils.get_last_checkpoint

* document the new features

* cleanup

* wip

* fix fsmt

* add bert

* cleanup

* add xlm-roberta

* electra works

* cleanup

* sync

* split off the model zoo tests

* cleanup

* cleanup

* cleanup

* cleanup

* reformat

* cleanup

* casing

* deepspeed>=0.4.3

* adjust distilbert

* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

78f5fe14

Fix minor docstring typos. (#12682) · 711d901c
qqaatw authored Jul 14, 2021

711d901c

Add option to load a pretrained model with mismatched shapes (#12664) · 90178b0c

Sylvain Gugger authored Jul 13, 2021



* Add option to load a pretrained model with mismatched shapes

* Fail at loading when mismatched shapes in Flax

* Fix tests

* Update src/transformers/modeling_flax_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

90178b0c
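
A rough sketch of how such an option might behave, under stated assumptions: `split_mismatched` is a hypothetical helper, shapes are plain tuples rather than tensors, and the flag name `ignore_mismatched_sizes` is assumed to match the option this PR adds. Weights whose checkpoint shape differs from the model's are either reported as an error or skipped and returned for a warning.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def split_mismatched(
    model_shapes: Dict[str, Shape],
    checkpoint_shapes: Dict[str, Shape],
    ignore_mismatched_sizes: bool = False,
) -> Tuple[List[str], List[Tuple[str, Shape, Shape]]]:
    """Return the keys that can be loaded and the (key, checkpoint, model) mismatches."""
    mismatched = [
        (key, checkpoint_shapes[key], model_shapes[key])
        for key in checkpoint_shapes
        if key in model_shapes and checkpoint_shapes[key] != model_shapes[key]
    ]
    if mismatched and not ignore_mismatched_sizes:
        details = ", ".join(f"{k}: {c} vs {m}" for k, c, m in mismatched)
        raise RuntimeError(f"size mismatch for {details}")

    # Mismatched keys are skipped, so the model keeps its freshly initialized weights there.
    skipped = {key for key, _, _ in mismatched}
    keys_to_load = [k for k in checkpoint_shapes if k in model_shapes and k not in skipped]
    return keys_to_load, mismatched


model = {"encoder.weight": (4, 4), "classifier.weight": (5, 4)}
checkpoint = {"encoder.weight": (4, 4), "classifier.weight": (2, 4)}
keys_to_load, mismatched = split_mismatched(model, checkpoint, ignore_mismatched_sizes=True)
```

A typical case is reusing a checkpoint whose classification head was trained with a different number of labels: the encoder weights load, and the head is left to be trained from scratch.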

08 Jul, 2021 1 commit

[model.from_pretrained] raise exception early on failed load (#12574) · f0dde601

Stas Bekman authored Jul 08, 2021




* [model.from_pretrained] raise exception early on failed load

Currently, if the `load` of pretrained weights fails in `from_pretrained`, we first print a whole bunch of success messages and only then fail. This PR raises the exception first to avoid all the misleading messages.

* style
Co-authored-by: Suraj Patil <surajp815@gmail.com>

f0dde601
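
The reordering can be pictured with a small sketch. This is an illustration only: `report_load` and the logger name are hypothetical, and the message wording is not taken from the library. The point is that the error check comes before any informational logging.

```python
import logging
from typing import List

logger = logging.getLogger("loading_sketch")


def report_load(model_name: str, error_msgs: List[str], loaded_keys: List[str]) -> None:
    """Raise on load errors before emitting any success-style log message."""
    if error_msgs:
        # Fail first, so no "weights were initialized" message precedes the error.
        details = "\n\t".join(error_msgs)
        raise RuntimeError(f"Error(s) in loading state_dict for {model_name}:\n\t{details}")

    # Only reached when loading succeeded.
    logger.info(
        "All %d checkpoint weights were used to initialize %s.",
        len(loaded_keys),
        model_name,
    )
```

Calling `report_load("MyModel", ["size mismatch for classifier.weight"], [])` raises immediately, while a call with an empty `error_msgs` list only logs.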

01 Jul, 2021 2 commits

[roberta] fix lm_head.decoder.weight ignore_key handling (#12446) · 2d1d9218

Stas Bekman authored Jul 01, 2021



* fix lm_head.decoder.weight ignore_key handling

* fix the mutable class variable

* Update src/transformers/models/roberta/modeling_roberta.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* replicate the comment

* make deterministic
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

2d1d9218

Fixing bug with param count without embeddings (#12461) · 7f0027db

Teven authored Jul 01, 2021



* fixing bug with param count without embeddings

* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

7f0027db

29 Jun, 2021 1 commit

[models] respect dtype of the model when instantiating it (#12316) · 7682e977

Stas Bekman authored Jun 28, 2021



* [models] respect dtype of the model when instantiating it

* cleanup

* cleanup

* rework to handle non-float dtype

* fix

* switch to fp32 tiny model

* improve

* use dtype.is_floating_point

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the doc

* recode to use explicit torch_dtype_auto_detect, torch_dtype args

* docs and tweaks

* docs and tweaks

* docs and tweaks

* merge 2 args, add docs

* fix

* fix

* better doc

* better doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

7682e977

23 Jun, 2021 1 commit

Clean push to hub API (#12187) · 53c60bab

Sylvain Gugger authored Jun 23, 2021



* Clean push to hub API

* Create working dir if it does not exist

* Different tweak

* New API + all models + test Flax

* Adds the Trainer clean up

* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* (nit) output types

* No need to set clone_from when folder exists

* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add generated_from_trainer tag

* Update to new version

* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

53c60bab

14 Jun, 2021 1 commit
- [style] consistent nn. and nn.functional (#12124) · 1ed2ebf6
  Stas Bekman authored Jun 14, 2021
```
* consistent nn. and nn.functional

* fix glitch

* fix glitch #2
```
  1ed2ebf6
02 Jun, 2021 1 commit
- [deepspeed] Move code and doc into standalone files (#11984) · 640318be
  Stas Bekman authored Jun 02, 2021
```
* move code and docs

* style

* moved

* restore
```
  640318be
12 May, 2021 2 commits

[Lazy init] Force fall back to slow init for composite models (#11705) · fd6204b2

Patrick von Platen authored May 12, 2021



* fix encoder-decoder & RAG

* finalize

* Update src/transformers/models/encoder_decoder/modeling_encoder_decoder.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/rag/modeling_rag.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

fd6204b2

remove defaults to None if optional (#11703) · 77f4c46b
Philip May authored May 12, 2021

77f4c46b

06 May, 2021 1 commit
- fix tests (#11615) · 44c5621d
  Patrick von Platen authored May 06, 2021
  
  44c5621d
05 May, 2021 1 commit

Pytorch - Lazy initialization of models (#11471) · 3e3e41ae

Patrick von Platen authored May 05, 2021



* lazy_init_weights

* remove ipdb

* save int

* add necessary code

* remove unnecessary utils

* Update src/transformers/models/t5/modeling_t5.py

* clean

* add tests

* correct

* finish tests

* finish tests

* fix some more tests

* fix xlnet & transfo-xl

* fix more tests

* make sure tests are independent

* fix tests more

* finish tests

* final touches

* Update src/transformers/modeling_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* clean tests

* give arg positive name

* add more mock weights to xlnet
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

3e3e41ae

03 May, 2021 1 commit
- fix resize_token_embeddings (#11572) · 7c622482
  Stas Bekman authored May 03, 2021
  
  7c622482
30 Apr, 2021 1 commit

[DeepSpeed] fp32 support (#11499) · 4e7bf94e

Stas Bekman authored Apr 30, 2021

* prep for deepspeed==0.3.16

* new version

* too soon

* support and test fp32 mode

* troubleshooting doc start

* workaround no longer needed

* add fp32 doc

* style

* cleanup, add tf32 note

* clarify

* release was made

4e7bf94e

26 Apr, 2021 2 commits

[Deepspeed] ZeRO-Infinity integration plus config revamp (#11418) · bc2571e6

Stas Bekman authored Apr 26, 2021



* adding Z-inf

* revamp config process

* up version requirement

* wip

* massive rewrite

* cleanup

* cleanup

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* consistent json commas

* act on suggestions

* leave this feature for 0.3.16

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

bc2571e6

fix some typos in docs, comments, logging/errors (#11432) · b24ead87
LSinev authored Apr 26, 2021

b24ead87

23 Apr, 2021 2 commits

Trainer push to hub (#11328) · bf2e0cf7

Sylvain Gugger authored Apr 23, 2021



* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

bf2e0cf7

[Flax] Big FlaxBert Refactor (#11364) · 8c9b5fcb

Patrick von Platen authored Apr 23, 2021

* improve flax

* refactor

* typos

* Update src/transformers/modeling_flax_utils.py

* Apply suggestions from code review

* Update src/transformers/modeling_flax_utils.py

* fix typo

* improve error tolerance

* typo

* correct nasty saving bug

* fix from pretrained

* correct tree map

* add note

* correct weight tying

8c9b5fcb

14 Apr, 2021 1 commit

[troubleshooting] add 2 points of reference to the offline mode (#11236) · 63ca4023

Stas Bekman authored Apr 14, 2021



* add 2 points of reference to the offline mode

* link the new doc

* add error message

* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* rename

* Trigger CI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

63ca4023

08 Apr, 2021 1 commit

[DeepSpeed] ZeRO Stage 3 (#10753) · c6d66484

Stas Bekman authored Apr 08, 2021



* synced gpus

* fix

* fix

* need to use t5-small for quality tests

* notes

* complete merge

* fix a disappearing std stream problem

* start zero3 tests

* wip

* tune params

* sorting out the pre-trained model loading

* reworking generate loop wip

* wip

* style

* fix tests

* split the tests

* refactor tests

* wip

* parameterized

* fix

* work out the resume from non-ds checkpoint pass + test

* cleanup

* remove no longer needed code

* split getter/setter functions

* complete the docs

* suggestions

* gpus and their compute capabilities link

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* style

* remove invalid paramgd

* automatically configure zero3 params that rely on hidden size

* make _get_resized_embeddings zero3-aware

* add test exercising resize_token_embeddings()

* add docstring
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

c6d66484