Commits · 7682e97702e4317231b3afe92359de384dba1e20 · chenpangpang / transformers

29 Jun, 2021 1 commit

[models] respect dtype of the model when instantiating it (#12316) · 7682e977

Stas Bekman authored Jun 28, 2021



* [models] respect dtype of the model when instantiating it

* cleanup

* cleanup

* rework to handle non-float dtype

* fix

* switch to fp32 tiny model

* improve

* use dtype.is_floating_point

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the doc

* recode to use explicit torch_dtype_auto_detect, torch_dtype args

* docs and tweaks

* docs and tweaks

* docs and tweaks

* merge 2 args, add docs

* fix

* fix

* better doc

* better doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


28 Jun, 2021 13 commits

[Flax] Add T5 pretraining script (#12355) · 31c3e7e7

Patrick von Platen authored Jun 28, 2021



* fix_torch_device_generate_test

* remove @

* add length computation

* finish masking

* finish

* upload

* fix some bugs

* finish

* fix dependency table

* correct tensorboard

* Apply suggestions from code review

* correct processing

* slight change init

* correct some more mistakes

* apply suggestions

* improve readme

* fix indent

* Apply suggestions from code review
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* correct tokenizer

* finish

* finish

* finish

* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>


pass the matching trainer log level to deepspeed (#12401) · e2770748
Stas Bekman authored Jun 28, 2021


Tensorflow LM examples (#12358) · 7e22609e

Matt authored Jun 28, 2021

* Tensorflow MLM example

* Add CLM example

* Style fixes, adding missing checkpoint code from the CLM example

* Fix TPU training, avoid massive dataset warnings

* Fix incorrect training length calculation for multi-GPU training

* Fix incorrect training length calculation for multi-GPU training

* Refactors and nitpicks from the review

* Style pass

* Adding README


[Flax] Adapt flax examples to include `push_to_hub` (#12391) · 2d70c912

Patrick von Platen authored Jun 28, 2021



* fix_torch_device_generate_test

* remove @

* finish

* correct summary writer

* correct push to hub

* fix indent

* finish

* finish

* finish

* finish

* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>


Remove the need for `einsum` in Albert's attention computation (#12394) · a7d0b288
Funtowicz Morgan authored Jun 28, 2021
```
* debug albert einsum

* Fix matmul computation

* Let's use torch linear layer.

* Style.
```
Fix copies · 276bc149
Sylvain Gugger authored Jun 28, 2021

Update README.md · 27b6ac46
Patrick von Platen authored Jun 28, 2021


[Flax community event] Add more description to readme (#12398) · 89b57a66

Patrick von Platen authored Jun 28, 2021



* fix_torch_device_generate_test

* remove @

* boom boom

* correct typos

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>


[Examples] Added context manager to datasets map (#12367) · 04dbea31

Bhadresh Savani authored Jun 28, 2021

* added context manager to datasets map

* fixed style and spaces

* fixed warning of deprecation

* changed desc


[CI] add dependency table sync verification (#12364) · d25ad34c

Stas Bekman authored Jun 28, 2021

* add dependency table sync verification

* improve the message

* improve the message

* revert

* ready to merge


Add possibility to maintain full copies of files (#12312) · 57461ac0
Sylvain Gugger authored Jun 28, 2021


Update run_mlm.py (#12344) · 9490d668

Taha ValizadehAslani authored Jun 28, 2021

Previously, the script could not be used for validation only, because this line:
extension = data_args.train_file.split(".")[-1]
assumed that the extension must be taken from the training file. It ran regardless of whether the user requested training or evaluation, so an evaluation-only run failed because no training file was given. The extension is now taken from the training file when the user wants to train and from the validation file when the user wants to evaluate, so the script can be used for training and validation separately.
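
The selection logic described in this message can be sketched as follows. This is a minimal illustration, not the code from `run_mlm.py`: the helper name `pick_data_extension` and its parameters are hypothetical, and the presence of a file path stands in for "the user wants to train" or "the user wants to evaluate".

```python
from typing import Optional


def pick_data_extension(train_file: Optional[str], validation_file: Optional[str]) -> str:
    """Return the file extension used to decide how the dataset is loaded.

    Hypothetical helper: the names here are illustrative and do not come
    from the original script.
    """
    # Use the training file when one was given, otherwise fall back to the
    # validation file, so that an evaluation-only run does not fail.
    source = train_file if train_file is not None else validation_file
    if source is None:
        raise ValueError("Provide a training file or a validation file.")
    # Same extraction as the line quoted in the commit message.
    return source.split(".")[-1]
```

With this sketch, `pick_data_extension(None, "dev.json")` yields `"json"` rather than raising an `AttributeError` on `None`, which is the failure the commit message describes.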


[Documentation] Warn that DataCollatorForWholeWordMask is limited to... · c7faf2cc

Kilian Kluge authored Jun 28, 2021

[Documentation] Warn that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers (#12371)

* Notify users that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers

* Fix code formatting


26 Jun, 2021 2 commits
- replace print with logger (#12368) · ff5cdc08
  Bhadresh Savani authored Jun 26, 2021
  
- updated example template (#12365) · 9a754594
  Bhadresh Savani authored Jun 26, 2021
  
25 Jun, 2021 10 commits

[Examples] Replicates the new --log_level feature to all trainer-based pytorch (#12359) · 539ee456
Bhadresh Savani authored Jun 25, 2021
```
* added log_level

* fix comment

* fixed log_level

* Trigger CI

* Unified logging

* simplified args for log_level
```
[trainer] add main_process_first context manager (#12351) · 64e60980
Stas Bekman authored Jun 25, 2021
```
* main_process_first context manager

* handle multi-node, add context description

* sync desc
```
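
The general "main process first" pattern named in this commit can be sketched as below. This is an assumption-laden illustration, not the Trainer implementation: the `is_main` flag and the injected `barrier` callable are hypothetical stand-ins for whatever rank check and synchronization primitive a distributed setup provides.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def main_process_first(is_main: bool, barrier: Callable[[], None]) -> Iterator[None]:
    """Let the main process run the body before the other processes do.

    Hypothetical sketch: `barrier` must block until every process has
    called it once.
    """
    try:
        if not is_main:
            # Non-main processes wait here until the main process is done.
            barrier()
        yield
    finally:
        if is_main:
            # The main process finished the body; release the waiting ones.
            barrier()
```

A typical use is wrapping a dataset preprocessing step so that the main process builds the cache and the remaining processes load it afterwards instead of repeating the work.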

fixed multiplechoice tokenization (#12362) · f8664258

cronoik authored Jun 25, 2021

* fixed multiplechoice tokenization

The model would have seen two sequences:
1. [CLS]prompt[SEP]prompt[SEP]
2. [CLS]choice0[SEP]choice1[SEP]
This is incorrect: we want a contextualized embedding of the prompt together with each choice.

* removed outer brackets for proper sequence generation
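
The intended pairing can be sketched without a real tokenizer. The snippet below is a toy illustration: `encode_pair` and `build_choice_inputs` are hypothetical helpers that only mimic the `[CLS]...[SEP]...[SEP]` layout quoted above and are not part of any library.

```python
from typing import List


def encode_pair(first: str, second: str) -> str:
    """Toy stand-in for a tokenizer encoding a (text, text_pair) input."""
    return f"[CLS]{first}[SEP]{second}[SEP]"


def build_choice_inputs(prompt: str, choices: List[str]) -> List[str]:
    """Pair the prompt with every choice, giving one sequence per choice."""
    return [encode_pair(prompt, choice) for choice in choices]


def build_choice_inputs_buggy(prompt: str, choices: List[str]) -> List[str]:
    """Reproduce the wrong grouping: prompts together, choices together."""
    firsts = [prompt for _ in choices]
    # Each list is treated as one pair, assuming exactly two choices.
    return [encode_pair(firsts[0], firsts[1]), encode_pair(choices[0], choices[1])]
```

In this toy version, the buggy variant returns the two sequences listed in the commit message, while the corrected variant returns `[CLS]prompt[SEP]choice0[SEP]` and `[CLS]prompt[SEP]choice1[SEP]`.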


remove extra white space from log format (#12360) · 4a872cae
Stas Bekman authored Jun 25, 2021

Style · a3daabfe
Sylvain Gugger authored Jun 25, 2021

Replace NotebookProgressReporter by ProgressReporter in Ray Tune run (#12357) · 238521b0
Kai Fricke authored Jun 25, 2021
```
* Replace NotebookProgressReporter by ProgressReporter in Ray Tune run

* Move to local import
```

Add FlaxBigBird QuestionAnswering script (#12233) · 332a2458

Vasudev Gupta authored Jun 25, 2021

* port bigbird script

* adapt script a bit

* change location

* adapt more

* save progress

* init commit

* style

* dataset script tested

* readme add


Fix exception in prediction loop occurring for certain batch sizes (#12350) · 55bb4c06

jglaser authored Jun 25, 2021



* fix distributed_concat for scalar outputs

* Update README.md

* fixed typo (#12356)

* simplify fix with terser syntax
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Trigger CI
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


fixed typo (#12356) · d4ce31e8
michal pitr authored Jun 25, 2021

Update README.md · aa550c4a
Patrick von Platen authored Jun 25, 2021


24 Jun, 2021 5 commits
- Add flax/jax quickstart (#12342) · f2c4ce7e
  Marc van Zee authored Jun 24, 2021
  
- Document patch release v4.8.1 · 5b1b5635
  Sylvain Gugger authored Jun 24, 2021
  
- Fix torchscript tests (#12336) · 8ef62ec9
  Lysandre Debut authored Jun 24, 2021
```
* Fix torchscript tests

* Better test

* Remove bogus print
```
- [examples/Flax] move the examples table up (#12341) · aef3823e
  Suraj Patil authored Jun 24, 2021
  
- try-this (#12338) · 7875b638
  Richard Liaw authored Jun 24, 2021
```
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
```
23 Jun, 2021 9 commits
- Fix default to logging_dir lost in merge conflict · cf3c9198
  Sylvain Gugger authored Jun 23, 2021
  
- [Deepspeed] new docs (#12077) · 07ae6103
  Stas Bekman authored Jun 23, 2021
```
* document sub_group_size

* style

* install + issues reporting

* style

* style

* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* indent 4

* restore

* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
```
- Update training_args.py (#12328) · 3694484d
  Sam Havens authored Jun 23, 2021
```
mention in `save_strategy` param description that `load_best_model_at_end` can override
```
- v4.9.0.dev0 · 2150dfed
  Sylvain Gugger authored Jun 23, 2021
  
- Release: v4.8.0 · 9252a512
  Sylvain Gugger authored Jun 23, 2021
  
- [Flax T5] Fix weight initialization and fix docs (#12327) · 468cda20
  Patrick von Platen authored Jun 23, 2021
```
* finish t5 flax fixes

* improve naming
```
- Pin good version of huggingface_hub · 12a4457c
  Sylvain Gugger authored Jun 23, 2021
  
- changed modeling_fx_utils.py to utils/fx.py for clarity (#12326) · 986ac03e
  Michael Benayoun authored Jun 23, 2021
```
Co-authored-by: Michael Benayoun <michael@huggingface.co>
```
- Temporarily revert the `fill-mask` improvements. · 941b4442
  Lysandre authored Jun 23, 2021
  