"vscode:/vscode.git/clone" did not exist on "4f403ea8994ee8785aa73957c827938e74cf0fe3"
- 02 Feb, 2022 4 commits
-
-
Patrick von Platen authored
-
NielsRogge authored
* Add torchvision's resize
* Rename torch_resize to default_to_square
* Apply suggestions from code review
* Add support for default_to_square and tuple of length 1
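As a rough illustration of the resize semantics this change describes (a plain int becoming a square unless default_to_square is off, and a tuple of length 1 treated like an int), here is a hedged sketch; the function name and signature are hypothetical, not the actual transformers feature-extractor code:

```python
# Hypothetical sketch of the described resize behavior, built on
# torchvision; NOT the actual transformers implementation.
from PIL import Image
import torchvision.transforms.functional as F

def resize(image: Image.Image, size, default_to_square: bool = True):
    if isinstance(size, (list, tuple)) and len(size) == 1:
        size = size[0]  # a tuple of length 1 is treated like a plain int
    if isinstance(size, int):
        if default_to_square:
            size = (size, size)  # int -> square output by default
        else:
            # torchvision matches the shorter edge when given a single int
            return F.resize(image, size)
    return F.resize(image, list(size))
```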
-
Steven Liu authored
* first draft of pipeline, autoclass, preprocess tutorials
* apply review feedback
* 🖍 apply feedback from patrick/niels
* 📝 add output image to preprocessed image
* 🖍 apply feedback from patrick
-
Steven Liu authored
* add fine-tune tutorial
* make edits, fix style
* 📝 make edits
* 🖍 fix code format links to external libraries
* 🔄 revert code formatting
* 🖍 use DefaultDataCollator instead of DataCollatorWithPadding
-
- 01 Feb, 2022 11 commits
-
-
Sylvain Gugger authored
* Harder check for IndexErrors in QA scripts
* Make test stronger
-
Sylvain Gugger authored
-
Suraj Patil authored
* refactor bart tokenizers
* doc
* replace assert with ValueError
-
Yih-Dar authored
* use mean instead of elementwise_mean
* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
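If "elementwise_mean" here is the deprecated PyTorch reduction string (it was renamed to "mean" with identical averaging behavior), the change amounts to the following sketch; the surrounding loss code is assumed, since the log does not show it:

```python
# Sketch only: PyTorch renamed reduction="elementwise_mean" to "mean";
# both average the per-element losses over all elements.
import torch
from torch import nn

logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))

# before (deprecated): nn.CrossEntropyLoss(reduction="elementwise_mean")
loss_fct = nn.CrossEntropyLoss(reduction="mean")
loss = loss_fct(logits, labels)
```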
-
SaulLu authored
fix the `tokenizer_config.json` file for the slow tokenizer when a fast version is available (#15319)
* add new test
* update test
* remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py`
* add `tokenizer_file` for the fast only tokenizer
* change global variables layoutxml
* remove `"tokenizer_file"` from DPR tokenizer's Global variables
* remove `tokenizer_file` from herbert slow tokenizer init
* `"tokenizer_file"` from LED tokenizer's Global variables
* remove `tokenizer_file` from mbart slow tokenizer init
* remove `tokenizer_file` from slow tokenizer template
* adapt to versioning
* adapt the `test_tokenizer_mismatch_warning` test
* clean test
* clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py
* Revert "remove `tokenizer_file` from mbart slow tokenizer init"
  This reverts commit 0dbb723fa9c7599d4640fe30b3647a74eb4a64e1.
* Revert "`"tokenizer_file"` from LED tokenizer's Global variables"
  This reverts commit 5a3f879bdd651233f3d74a3d1146c34cde82b0c2.
* Revert "remove `tokenizer_file` from herbert slow tokenizer init"
  This reverts commit f5e10007b7b0ec5345e015b9de7ffec72c5407fd.
* Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables"
  This reverts commit da0895330bedfafc81ae3073470a9348c669f032.
* set `tokenizer_file` in super `__init__` of mbart
-
SaulLu authored
* replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`
* add test
* fix kwargs
* reformat test
* format
* format
* fix typo to render the documentation
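A minimal sketch of the assert-to-exception swap described above, assuming the check lives in `__init__` and accepts only "right" or "left"; the class body is abbreviated and not the verbatim transformers source:

```python
# Illustrative only: raise ValueError instead of relying on assert.
class TokenizerBaseSketch:
    def __init__(self, padding_side: str = "right", **kwargs):
        # before: assert padding_side in ["right", "left"]
        if padding_side not in ("right", "left"):
            raise ValueError(
                f"padding_side should be 'right' or 'left', got {padding_side!r}"
            )
        self.padding_side = padding_side
```

The practical difference: assert statements are stripped when Python runs with `-O`, while a ValueError always fires and gives the caller an actionable message.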
-
Kamal Raj authored
fix typo
-
Suraj Patil authored
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Fix TF Causal LM models' returned logits
* Fix expected shape in the tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
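One plausible reading of this fix, offered only as a hedged sketch: causal LM loss is computed on shifted logits and labels, but the model should still return the full, unshifted logits. Names and shapes below are illustrative, not the actual modeling code:

```python
import tensorflow as tf

def causal_lm_loss_and_logits(logits: tf.Tensor, labels: tf.Tensor):
    # position t predicts token t+1, so shift both tensors for the loss...
    shifted_logits = logits[:, :-1, :]
    shifted_labels = labels[:, 1:]
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    loss = loss_fn(shifted_labels, shifted_logits)
    # ...but return the full [batch, seq_len, vocab] logits to the caller
    return loss, logits
```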
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
- 31 Jan, 2022 25 commits
-
-
Stas Bekman authored
-
Suraj Patil authored
-
Sylvain Gugger authored
-
peregilk authored
* Update modeling_wav2vec2.py
  With very tiny sound files (less than 0.1 seconds) the num_masked_span can be too long. The issue is described in issue #15366 and discussed with @patrickvonplaten.
* correct errors with mask time indices
* remove bogus file
* make fix-copies

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
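A hedged sketch of the guard this fix implies: when the input is very short, cap the number of masked spans so the spans never overrun the sequence. Variable names mirror the description above, not the exact modeling_wav2vec2.py code:

```python
import numpy as np

def compute_num_masked_span(sequence_length: int, mask_prob: float, mask_length: int) -> int:
    """Illustrative guard: keep num_masked_span * mask_length <= sequence_length."""
    num_masked_span = int(mask_prob * sequence_length / mask_length + np.random.rand())
    # tiny inputs (< 0.1 s of audio) can make this exceed what fits
    return min(num_masked_span, sequence_length // mask_length)
```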
-
Tavin Turner authored
* Add 'with torch.no_grad()' to BEiT integration test forward pass
* Fix inconsistent use of tabs and spaces in indentation
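For readers unfamiliar with the idiom, this is what wrapping a test's forward pass in torch.no_grad() looks like; the tiny linear layer stands in for the BEiT model under test:

```python
import torch
from torch import nn

model = nn.Linear(4, 2)      # stand-in for the model under test
inputs = torch.randn(1, 4)

model.eval()
with torch.no_grad():        # no autograd graph is built, saving memory
    outputs = model(inputs)

assert not outputs.requires_grad  # outputs carry no gradient history
```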
-
Matt authored
* Fix spurious warning in TF TokenClassification models
* Fixing one last spurious warning
* Removing outdated warning altogether
-
Suraj Patil authored
* refactor roberta tokenizer
* refactor fast tokenizer
* remove old comment
-
Suraj Patil authored
-
Yih-Dar authored
* fix tf led
* fix
* fix
* Add test_pt_tf_model_equivalence_extra for TFLED
* add a (temporary) test

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Suraj Patil authored
* add a section about GPUs
* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
-
Patrick von Platen authored
* [Trainer] suppress warning for length-related columns
* improve message
* Update src/transformers/trainer.py
-
Sylvain Gugger authored
* Change REALM checkpoint to new ones
* Last checkpoint missing
-
Matt authored
-
Yih-Dar authored
* Fix loss calculation in TFFunnelForTokenClassification
* revert the change in TFFunnelForTokenClassification
* fix FunnelForTokenClassification loss
* fix other TokenClassification loss
* fix more
* fix more
* add num_labels to ElectraForTokenClassification
* revert the change to research projects

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
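The usual token-classification loss pattern these fixes converge on flattens the logits with num_labels before cross-entropy; a sketch under that assumption, not the verbatim Funnel/Electra code:

```python
import torch
from torch import nn

batch_size, seq_len, num_labels = 2, 8, 5
logits = torch.randn(batch_size, seq_len, num_labels)
labels = torch.randint(0, num_labels, (batch_size, seq_len))

loss_fct = nn.CrossEntropyLoss()
# flatten so each token position is one classification row of num_labels scores
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))
```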
-
Stas Bekman authored
* [deepspeed doc] fix import, extra notes
* typo
-
NielsRogge authored
-
Sylvain Gugger authored
-
Ogundepo Odunayo authored
-
NielsRogge authored
* Fix Swin model outputs
* Rename pooler
-
Suraj Patil authored
-
Jonatas Grosman authored
-
Kamal Raj authored
fix typo
-
Julien Plu authored
* Add Luke training
* Fix true label tags
* Fix true label tags
* Fix true label tags
* Update the data collator for Luke
* Some training refactor for Luke
* Improve data collator for Luke
* Fix import
* Fix datasets concatenation
* Add the --max_entity_length argument for Luke models
* Remove unused code
* Fix style issues
* Fix style issues
* Move the Luke training into a separate folder
* Fix style
* Fix naming
* Fix filtering
* Fix filtering
* Fix filter
* Update some preprocessing
* Move luke to research_projects
* Checkstyle
* Address comments
* Fix style
-
François REMY authored
(This is an editorial change only)
-
NielsRogge authored
-