Commits · df8e6804c004903753d3e635d85f32694e3d2c39 · chenpangpang / transformers

22 Jun, 2022 2 commits
- Offload fixes (#17810) · df8e6804
  Sylvain Gugger authored Jun 22, 2022
```
* Offload fixes

* Add a test
```
  df8e6804
- initial commit (#17818) · 56b83cf0
  Arthur authored Jun 22, 2022
  
  56b83cf0
20 Jun, 2022 2 commits

Not use -1e4 as attn mask (#17306) · d3cb2888

Yih-Dar authored Jun 20, 2022



* Use torch.finfo(self.dtype).min

* for GPTNeoX

* for Albert

* For Splinter

* Update src/transformers/models/data2vec/modeling_data2vec_audio.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix -inf used in Bart-like models

* Fix a few remaining -inf

* more fix

* clean up

* For CLIP

* For FSMT

* clean up

* fix test

* Add dtype argument and use it for LayoutLMv3

* update FlaxLongT5Attention
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

d3cb2888
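
The change listed above replaces hard-coded mask values such as `-1e4` and `-inf` with the smallest finite value of the working dtype (`torch.finfo(self.dtype).min`). A minimal pure-Python sketch of why a finite minimum behaves better than `-inf` is given below. It is an illustration, not the code from the commit: it uses Python floats instead of torch tensors, `FINITE_MIN` stands in for `torch.finfo(dtype).min`, and `add_mask` and `softmax` are hypothetical helper names.

```python
import math
import sys

# Stand-in for torch.finfo(dtype).min: the most negative finite Python float.
FINITE_MIN = -sys.float_info.max


def add_mask(scores, keep, fill):
    """Add `fill` to every score whose position is masked out (keep is False)."""
    return [s + (0.0 if k else fill) for s, k in zip(scores, keep)]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Partially masked row: the masked position receives zero weight.
weights = softmax(add_mask([1.0, 2.0, 3.0], [True, True, False], FINITE_MIN))

# Fully masked row with -inf: (-inf) - (-inf) is NaN, so every weight is NaN.
nan_weights = softmax(add_mask([1.0, 2.0], [False, False], float("-inf")))

# Fully masked row with a finite minimum: the result stays finite (uniform).
finite_weights = softmax(add_mask([1.0, 2.0], [False, False], FINITE_MIN))
```

In this sketch the fully masked row is the case that separates the two fill values: `-inf` propagates NaN through the softmax, while the finite minimum degrades to a uniform distribution.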

Fix cache for GPT-Neo-X (#17764) · fdb12080
Sylvain Gugger authored Jun 20, 2022
```
* Fix cache for GPT-Neo-X

* Add more tests
```
fdb12080

13 Jun, 2022 1 commit

Fix dtype getter (#17668) · a1344dbf

Sylvain Gugger authored Jun 13, 2022

* Fix dtype getters

* Proper fix for dtype getter

* Style and comment

* Always use last for consistency

* Quality

a1344dbf

10 Jun, 2022 1 commit
- Fix dtype getters (#17656) · b8809091
  Sylvain Gugger authored Jun 10, 2022
  
  b8809091
09 Jun, 2022 1 commit

[modeling_utils] torch_dtype/auto floating dtype fixes (#17614) · 75343de9

Stas Bekman authored Jun 09, 2022



* [modeling_utils] torch_dtype/auto fixes

* add test

* apply suggestions

* add missing fallback

* Renaming things

* Use for else
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

75343de9

03 Jun, 2022 1 commit
- Fix all offload and MP tests (#17533) · 83439012
  Sylvain Gugger authored Jun 03, 2022
  
  83439012
02 Jun, 2022 1 commit
- Fix when Accelerate is not installed (#17518) · 588d8f1f
  Sylvain Gugger authored Jun 02, 2022
  
  588d8f1f
31 May, 2022 1 commit
- Disk offload fix (#17428) · 567d9c06
  Sylvain Gugger authored May 31, 2022
```
* Fix offload to disk for big models

* Add test

* Fix test for other models
```
  567d9c06
25 May, 2022 1 commit
- Add test for new model parallelism features (#17401) · 31484afb
  Sylvain Gugger authored May 25, 2022
  
  31484afb
23 May, 2022 1 commit

Use Accelerate in `from_pretrained` for big model inference (#17341) · 56f50590

Sylvain Gugger authored May 23, 2022



* Initial work

* More or less finished with first draft

* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix randomly initialized weights

* Update src/transformers/modeling_utils.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* Rename DeepSpeed folder to temporarily fix the test issue?

* Revert to try if Accelerate fix works

* Use latest Accelerate release

* Quality and fixes

* Style

* Quality

* Add doc

* Test + fix

* More blocks
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

56f50590

19 May, 2022 1 commit
- fix for 17292 (#17293) · 5d6feecf
  Nathan Dahlberg authored May 19, 2022
  
  5d6feecf
17 May, 2022 1 commit

Improve mismatched sizes management when loading a pretrained model (#17257) · 28a08116

regisss authored May 17, 2022

- Add --ignore_mismatched_sizes argument to classification examples

- Expand the error message when loading a model whose head dimensions are different from expected dimensions

28a08116

12 May, 2022 1 commit

Black preview (#17217) · afe5d42d

Sylvain Gugger authored May 12, 2022

* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black

afe5d42d

03 May, 2022 3 commits
- Remove device parameter from create_extended_attention_mask_for_decoder (#16894) · 39f8eafc
  Pavel Belevich authored May 03, 2022
  
  39f8eafc
- Fix RNG reload in resume training from epoch checkpoint (#17055) · 1c9fcd0e
  Sylvain Gugger authored May 03, 2022
```
* Fix RNG reload in resume training from epoch checkpoint

* Fix test
```
  1c9fcd0e
- Make Trainer compatible with sharded checkpoints (#17053) · a8fa2f91
  Sylvain Gugger authored May 03, 2022
```
* Make Trainer compatible with sharded checkpoints

* Add doc
```
  a8fa2f91
29 Apr, 2022 1 commit
- Make create_extended_attention_mask_for_decoder static method (#16893) · 63fbed5c
  Pavel Belevich authored Apr 29, 2022
  
  63fbed5c
27 Apr, 2022 1 commit
- Fix multiple deletions of the same files in save_pretrained (#16947) · c79bbc3b
  Sylvain Gugger authored Apr 27, 2022
```
* Fix multiple deletions of the same files in save_pretrained

* Add is_main_process argument
```
  c79bbc3b
26 Apr, 2022 2 commits
- use original loaded keys to find mismatched keys (#16920) · 2d91e3c3
  Yongliang Shen authored Apr 27, 2022
  
  2d91e3c3
- Limit the use of PreTrainedModel.device (#16935) · 344b9fb0
  Sylvain Gugger authored Apr 25, 2022
```
* Limit the use of PreTrainedModel.device

* Fix
```
  344b9fb0
22 Apr, 2022 1 commit
- Minor fixes/improvements in `convert_file_size_to_int` (#16891) · 9fa88172
  Mario Šaško authored Apr 22, 2022
```
* Minor improvements to `convert_file_size_to_int`

* Add <unit>bit version to kilos and megas

* Minor fix
```
  9fa88172
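
The commit above touches `convert_file_size_to_int` and mentions bit variants of the kilo and mega units. As a rough illustration of what such a size parser can look like, here is a self-contained sketch. It is not the library's implementation: `parse_file_size` is a hypothetical name, and the conventions used (binary `KiB`/`MiB`/`GiB`, decimal `KB`/`MB`/`GB`, and a trailing lowercase `b` meaning bits) are assumptions made for this example.

```python
# Hypothetical helper for illustration only; not transformers' implementation.
_UNIT_FACTORS = {
    "KIB": 2**10,
    "MIB": 2**20,
    "GIB": 2**30,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
}


def parse_file_size(size):
    """Convert a size such as "5GB", "1MiB" or "8Kb" to a number of bytes."""
    if isinstance(size, int):
        return size
    text = size.strip()
    upper = text.upper()
    for suffix, factor in _UNIT_FACTORS.items():
        if upper.endswith(suffix):
            n_bytes = int(text[: -len(suffix)]) * factor
            # Assumed convention: a lowercase final "b" denotes bits, not bytes.
            return n_bytes // 8 if text.endswith("b") else n_bytes
    raise ValueError(f"size must end in a known unit, got {size!r}")
```

Under these assumptions, `parse_file_size("1MiB")` gives 1048576 and `parse_file_size("8Kb")` gives 1000 (8000 bits divided by 8).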
20 Apr, 2022 1 commit
- [modeling_utils] use less cpu memory with sharded checkpoint loading (#16844) · afa1ef09
  Stas Bekman authored Apr 20, 2022
```
* less cpu memory with sharded checkpoint loading

* Trigger CI

* Trigger CI
```
  afa1ef09
19 Apr, 2022 1 commit
- [Typo] Fix typo in modeling utils (#16840) · e1c153cb
  Patrick von Platen authored Apr 19, 2022
  
  e1c153cb
15 Apr, 2022 1 commit

[modeling utils] revamp `from_pretrained(..., low_cpu_mem_usage=True)` + tests (#16657) · 5da33f87

Stas Bekman authored Apr 14, 2022

* add low_cpu_mem_usage tests

* wip: revamping

* wip

* install /usr/bin/time

* wip

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* fix assert

* put the wrapper back

* cleanup; switch to bert-base-cased

* Trigger CI

* Trigger CI

5da33f87

13 Apr, 2022 2 commits
- [modeling_utils] better explanation of ignore keys (#16741) · ac43a40e
  Stas Bekman authored Apr 13, 2022
  
  ac43a40e
- [from_pretrained] refactor find_mismatched_keys (#16706) · 12bfa97a
  Stas Bekman authored Apr 13, 2022
  
  12bfa97a
12 Apr, 2022 2 commits

Moved functions to pytorch_utils.py (#16625) · a315988b

Anmol Joshi authored Apr 12, 2022

* Moved functions to pytorch_utils.py

* isort formatting

* Reverted tf changes

* isort, make fix-copies

* documentation fix

* Fixed Conv1D import

* Reverted research examples file

* backward compatibility for pytorch_utils

* missing import

* isort fix

a315988b

Only call get_output_embeddings when tie_word_embeddings is set (#16667) · b9f12bed

smelm authored Apr 12, 2022



This avoids an unnecessary call and avoids problems during
initialization of class hierarchies.
Co-authored-by: Samuel Melm <samuel.melm@stud.uni-heidelberg.de>

b9f12bed

08 Apr, 2022 1 commit
- only load state dict when the checkpoint is not None (#16673) · f4d4f0a1
  Laura Hanu authored Apr 08, 2022
  
  f4d4f0a1
07 Apr, 2022 1 commit
- Updated _load_pretrained_model_low_mem to check if keys are in the state_dict (#16643) · 4099817b
  Francesco Saverio Zuppichini authored Apr 07, 2022
```
* Updated _load_pretrained_model_low_mem to check if keys are in the stored state_dict

* update after conversions
```
  4099817b
06 Apr, 2022 3 commits
- [modeling_utils] rearrange text (#16632) · 4d100835
  Stas Bekman authored Apr 06, 2022
  
  4d100835
- typo (#16621) · fb3d0df4
  Stas Bekman authored Apr 06, 2022
  
  fb3d0df4
- don't load state_dict twice when using low_cpu_mem_usage in from_pretrained (#16602) · 47c5c059
  Suraj Patil authored Apr 06, 2022
  
  47c5c059
05 Apr, 2022 2 commits
- handle torch_dtype in low cpu mem usage (#16580) · 21decb77
  Suraj Patil authored Apr 05, 2022
  
  21decb77
- made _load_pretrained_model_low_mem static + bug fix (#16548) · 8bf6d28c
  Francesco Saverio Zuppichini authored Apr 05, 2022
  
  8bf6d28c
04 Apr, 2022 1 commit
- Making the impossible to connect error actually report the right URL. (#16446) · 013a7dbe
  Nicolas Patry authored Apr 04, 2022
  
  013a7dbe
25 Mar, 2022 2 commits

Checkpoint sharding (#16343) · b473617d

Sylvain Gugger authored Mar 25, 2022



* Sharded checkpoint support

* Handle distant sharded checkpoints

* Add tests

* TODO is done

* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix docstring

* Add example and format

* Address review comments

* More review comments

* End of merge

* Revert unintentional change

* VsCode what did you do?

* Style

* Changes

* Address final comments

* Quality

* Moar tests

* Move import beneath is_pt_available
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

b473617d
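
The commit above adds sharded checkpoint support. The general idea of sharding can be sketched as a greedy split of weights into files that each stay under a size limit, plus an index that maps every weight name to its file. The code below is a simplified sketch under stated assumptions, not the code from the commit: `shard_by_size`, `build_index`, the file-name pattern, and the use of a plain `name -> size in bytes` dict in place of a real state dict are all hypothetical.

```python
def shard_by_size(weight_sizes, max_shard_size):
    """Greedily group weight names into shards of at most max_shard_size bytes.

    A single weight larger than the limit gets a shard of its own.
    """
    shards = [[]]
    current_size = 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if shards[-1] and current_size + size > max_shard_size:
            shards.append([])
            current_size = 0
        shards[-1].append(name)
        current_size += size
    return shards


def build_index(shards, pattern="model-{:05d}-of-{:05d}.bin"):
    """Map each weight name to the (hypothetical) file name of its shard."""
    total = len(shards)
    return {
        name: pattern.format(i + 1, total)
        for i, names in enumerate(shards)
        for name in names
    }


sizes = {"a": 4, "b": 4, "c": 6, "d": 20}
shards = shard_by_size(sizes, max_shard_size=10)
index = build_index(shards)
```

A loader can then read the index first and open only one shard at a time, which is the property that makes sharding useful for large checkpoints.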

Big file_utils cleanup (#16396) · 088c1880
Sylvain Gugger authored Mar 25, 2022
```
* Big file_utils cleanup

* This one still needs to be treated separately
```
088c1880