Commits · 4b919657313103f1ee903e32a9213b48e6433afe · chenpangpang / transformers

17 Feb, 2021 1 commit
- Factor out methods (#10215) · 4b919657
  Lysandre Debut authored Feb 17, 2021
  
  4b919657
16 Feb, 2021 5 commits
- [trainer] fix ignored columns logger (#10219) · e94d63f6
  Stas Bekman authored Feb 16, 2021
```
* [trainer] fix ignored columns logger

This PR fixes a confusing log entry that says:

    The following columns in the evaluation set don't have a corresponding argument in `T5ForConditionalGeneration.forward` and have been ignored: .

when everything is in order.

* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
```
  e94d63f6
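For illustration, the behaviour described in e94d63f6 can be sketched as follows: the notice is only logged when at least one column was actually dropped. This is a minimal sketch, not the code in `src/transformers/trainer.py`; the helper names `ignored_columns` and `log_ignored_columns`, the `ALWAYS_KEPT` set, and the stand-in `forward` function are all hypothetical.

```python
import inspect
import logging

logger = logging.getLogger(__name__)

# Hypothetical: columns kept even when absent from the forward signature.
ALWAYS_KEPT = {"label", "label_ids"}


def ignored_columns(forward_fn, column_names):
    """Return the dataset columns that forward_fn cannot accept as arguments."""
    accepted = set(inspect.signature(forward_fn).parameters) | ALWAYS_KEPT
    return [name for name in column_names if name not in accepted]


def log_ignored_columns(forward_fn, column_names, description="evaluation"):
    """Log the ignored columns, but only if there are any."""
    ignored = ignored_columns(forward_fn, column_names)
    if ignored:  # an empty list would produce the confusing "ignored: ." entry
        logger.info(
            "The following columns in the %s set don't have a corresponding "
            "argument in `%s` and have been ignored: %s.",
            description,
            forward_fn.__qualname__,
            ", ".join(ignored),
        )
    return ignored


def forward(input_ids, attention_mask=None, labels=None):
    """Stand-in for a model's forward method (hypothetical)."""
    return None
```

With this guard, a dataset whose columns all match the signature yields an empty list and no log entry at all.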
- fix add_token_positions fn (#10217) · 4210cd96
  Joe Davison authored Feb 16, 2021
  
  4210cd96
- Store FLOS as floats to avoid overflow. (#10213) · 7169d1ea
  Sylvain Gugger authored Feb 16, 2021
  
  7169d1ea
- set tgt_lang of MBart Tokenizer for summarization (#10205) · df1b0fb5
  Zhang Cheng authored Feb 16, 2021
  
  df1b0fb5
- Unlock XLA test for convbert (#10207) · 5c2d66a2
  Julien Plu authored Feb 16, 2021
  
  5c2d66a2
15 Feb, 2021 12 commits

[WIP][examples/seq2seq] move old s2s scripts to legacy (#10136) · 1c8c2d9a

Suraj Patil authored Feb 16, 2021

* move old s2s scripts to legacy

* add the tests back

* proper rename

* restore

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

1c8c2d9a

make the sub-group of tests run always (#10196) · 96897a35
Stas Bekman authored Feb 15, 2021

96897a35

Specify dataset dtype (#10195) · 8cbd0bd1

Lysandre Debut authored Feb 15, 2021

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

8cbd0bd1

fix run_seq2seq.py; porting trainer tests to it (#10162) · 0b1f552a

Stas Bekman authored Feb 15, 2021

* fix run_seq2seq.py; porting DeepSpeed tests to it

* unrefactor

* defensive programming

* defensive programming 2

* port the rest of the trainer tests

* style

* a cleaner scripts dir finder

* cleanup

0b1f552a

Add AMP for Albert (#10141) · 31b0560a
Julien Plu authored Feb 15, 2021

31b0560a

Add mBART-50 (#10154) · 6fc940ed

Suraj Patil authored Feb 15, 2021

* add tokenizer for mBART-50

* update tokenizers

* make src_lang and tgt_lang optional

* update tokenizer test

* add setter

* update docs

* update conversion script

* update docs

* update conversion script

* update tokenizer

* update test

* update docs

* doc

* address Sylvain's suggestions

* fix test

* fix formatting

* nits

6fc940ed

Fix TF template (#10189) · 57021887
Julien Plu authored Feb 15, 2021
```
* Fix template

* Update Seq2Seq tests
```
57021887
fix RagTokenizer (#10167) · 2a5c9900
Suraj Patil authored Feb 15, 2021

2a5c9900

Check TF ops for ONNX compliance (#10025) · c8d3fa0d

Julien Plu authored Feb 15, 2021

* Add check-ops script

* Finish implementing check_tf_ops and start the test

* Make the test mandatory only for BERT

* Update tf_ops folder

* Remove useless classes

* Add the ONNX test for GPT2 and BART

* Add an onnxruntime slow test + better opset flexibility

* Fix test + apply style

* fix tests

* Switch min opset from 12 to 10

* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Fix GPT2

* Remove extra shape_list usage

* Fix GPT2

* Address Morgan's comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

c8d3fa0d

Add new model to labels that should not stale (#10187) · 93bd2f70
Lysandre Debut authored Feb 15, 2021

93bd2f70
Fixing NER pipeline for list inputs. (#10184) · 900daec2
Nicolas Patry authored Feb 15, 2021
```
Fixes #10168
```
900daec2
Fix datasets set_format (#10178) · 587197dc
Sylvain Gugger authored Feb 15, 2021

587197dc

13 Feb, 2021 6 commits

[t5 tokenizer] add info logs (#9897) · 8fae93ca

Stas Bekman authored Feb 13, 2021

* save fast tokenizer + add info logs

* fix tests

* remove the saving of fast tokenizer

8fae93ca

[Doc] Fix version control in internal pages (#10124) · 80349831
Sylvain Gugger authored Feb 13, 2021

80349831
Fix typo in comment (#10156) · 698c9e2d
Manuel Romero authored Feb 13, 2021

698c9e2d
Fix typo in comments (#10157) · c9693668
Manuel Romero authored Feb 13, 2021

c9693668

Conversion from slow to fast for BPE spm vocabs contained an error. (#10120) · c9837a0d

Nicolas Patry authored Feb 13, 2021

* Conversion from slow to fast for BPE spm vocabs contained an error.

- There is only 1 test currently (tokenizers + slow) that used the modified path
and it's reformer, which does not contain any ids modification so the
bug was silent for now.
- The real issue is that vocab variable was overloaded by
SentencePieceExtractor, leading to Slow specific vocab oddities to be
completely ignored
- The bug was reported here https://github.com/huggingface/transformers/issues/9518
- Ran the complete tokenization test suite with slow without error
(`RUN_SLOW=1 pytest -sv tests/test_tokenization_*`)

* Remove rebase error.

* Adding the fixture.

c9837a0d
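The kind of bug described in c9837a0d, where a `vocab` variable is rebound by the extractor's return value, can be sketched generically in Python. This is an illustration under assumptions, not the converter's actual code: `extract_stub`, `convert_buggy`, `convert_fixed`, and the toy vocabularies are all hypothetical.

```python
def extract_stub():
    """Hypothetical stand-in for an extractor returning (vocab, merges)."""
    vocab = {"a": 0, "b": 1, "ab": 2}
    merges = [("a", "b")]
    return vocab, merges


def convert_buggy(slow_vocab):
    vocab = dict(slow_vocab)
    # Rebinding `vocab` here silently discards the slow tokenizer's vocab.
    vocab, merges = extract_stub()
    return vocab, merges


def convert_fixed(slow_vocab):
    vocab = dict(slow_vocab)
    # Keep only the merges; the slow tokenizer's vocab stays authoritative.
    _, merges = extract_stub()
    return vocab, merges


# A slow vocab whose ids were shifted by one to make room for a special token.
slow_vocab = {"<pad>": 0, "a": 1, "b": 2, "ab": 3}
```

In the buggy variant the id shift is lost without any error, which matches the note that the bug stayed silent as long as the only test using this path had no id modifications.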

Revert propagation (#10171) · dd3a7f96
Lysandre Debut authored Feb 13, 2021

dd3a7f96

12 Feb, 2021 4 commits
- [hf_api] delete deprecated methods and tests (2) · 641f418e
  Julien Chaumond authored Feb 12, 2021
  
  641f418e
- [hf_api] delete deprecated methods and tests (#10159) · eed31db9
  Julien Chaumond authored Feb 12, 2021
```
* [hf_api] delete deprecated methods and tests

cc @lhoestq

* Update test_hf_api.py
```
  eed31db9
- Fix typo in GPT2DoubleHeadsModel docs (#10148) · 1321356b
  Mohamed Al Salti authored Feb 12, 2021
```
* Fix typo

* apply suggestion
Co-authored-by: Suraj Patil <surajp815@gmail.com>
```
  1321356b
- [examples/run_s2s] remove task_specific_params and update rouge computation (#10133) · f51188cb
  Suraj Patil authored Feb 12, 2021
```
* fix rouge metrics and task specific params

* fix typo

* round metrics

* typo

* remove task_specific_params
```
  f51188cb
11 Feb, 2021 8 commits

Add SageMakerTrainer for model parallelism (#10122) · 31245775

Sylvain Gugger authored Feb 11, 2021

* Refactor things out of main train

* Store signature

* Add SageMakerTrainer

* Init + Copyright

* Address review comments

31245775

[DeepSpeed in notebooks] Jupyter + Colab (#10130) · b54cb0bd

Stas Bekman authored Feb 11, 2021

* init devices/setup explicitly

* docs + test

* simplify

* cleanup

* cleanup

* cleanup

* correct the required dist setup

* derive local_rank from env LOCAL_RANK

b54cb0bd

Typo fix · 6710d1d5
Sylvain Gugger authored Feb 11, 2021

6710d1d5
Update README.md · 8e13b735
Patrick von Platen authored Feb 11, 2021

8e13b735
Update ADD_BIG_BIRD.md · d6b4f48e
Patrick von Platen authored Feb 11, 2021

d6b4f48e

[Wav2Vec2] Improve Tokenizer & Model for batched inference (#10117) · 495c157d

Patrick von Platen authored Feb 11, 2021

* save intermediate

* finish batch the same as fairseq

* add normalization

* fix batched input

* add better comment

* Update src/transformers/models/wav2vec2/modeling_wav2vec2.py

* add nice docstring

* add tokenizer tests

* make all slow tests pass

* finish PR

* correct import

495c157d

Add new community notebook - Blenderbot (#10126) · 2f3b5f4d

Tanmay Thakur authored Feb 11, 2021

* Update:community.md, new nb add

* feat: updated grammar on nb description

* Update: Train summarizer for BlenderBotSmall

2f3b5f4d

Update run_xnli.py to use Datasets library (#9829) · 8dcfaea0

Qbiwan authored Feb 11, 2021

* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed, metric.compute

* fix

* fix

* fix

* push

* fix

* everything works

* fix init

* fix

* special treatment for sepconv1d

* style

* 🙏🏽

* add doc and cleanup

* fix doc

* fix doc again

* fix doc again

* Apply suggestions from code review

* make style

* Proposal that should work

* Remove needless code

* Fix test

* Apply suggestions from code review

* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed, metric.compute

* amend README

* removed data_args.task_name and replaced with task_name = "xnli"; use split function to load train and validation dataset separately; remove __post_init__; remove flag --task_name from README.

* removed dict task_to_keys, use str "xnli" instead of variable task_name, change preprocess_function to use examples["premise"], examples["hypothesis"] directly, remove sentence1_key and sentence2_key, change compute_metrics function to cater only to accuracy metric, add condition for when train_language is None when using datasets.load_dataset()

* removed `torch.distributed.barrier()` and `import torch` as `from_pretrained` is able to do the work; amend README

8dcfaea0

10 Feb, 2021 4 commits

[DeepSpeed] restore memory for evaluation (#10114) · 77b86284
Stas Bekman authored Feb 10, 2021
```
* free up memory at the end of train

* rework tests

* consistent formatting

* correction
```
77b86284

remove adjust_logits_during_generation method (#10087) · c130e67d

Suraj Patil authored Feb 10, 2021

* add forced logits processors

* delete adjust_logits method

* add forced_eos_token_id argument in config

* add tests for forced logits processors

* update gen utils tests

* add forced option to tf generate

* remove adjust_logits method from tf models

* update adjust_logits for marian

* delete _force_token_id_to_be_generated method

* style

* import warnings

* pass max_length to _get_logits_processor

* set forced_eos_token_id to None

* set forced attributes in conf utils

* typo

* fix rag generate

* add forced_eos_token_id in rag config

* remove force_bos_token_to_be_generated from BartConfig

* remove _force_token_ids_generation from FSMT

* nit

* fix negative constant

* apply suggestions from code review

c130e67d
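As a rough illustration of what a "forced logits processor" for `forced_eos_token_id` might do, the sketch below masks every score except the EOS token once the sequence is one step short of `max_length`. This is an assumption about the intended behaviour, written with plain Python lists; the function name `force_eos_at_max_length` is hypothetical and this is not the library's implementation.

```python
import math


def force_eos_at_max_length(scores, cur_len, max_length, eos_token_id):
    """Return scores unchanged, except on the last step where only EOS is allowed.

    scores: per-token scores for the next position (list of floats).
    cur_len: number of tokens generated so far.
    """
    if cur_len == max_length - 1:
        # Last position: every token except EOS becomes impossible.
        return [0.0 if i == eos_token_id else -math.inf for i in range(len(scores))]
    return list(scores)
```

Expressing the rule as a separate processor, rather than a per-model `adjust_logits_during_generation` override, keeps the forcing logic in one place and driven by config values.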

Fix TF LED/Longformer attentions computation (#10007) · 22a32cf4

Julien Plu authored Feb 10, 2021

* Fix test

* Remove commented test

* Fix name

* Apply style

* Fix check copies

* Remove prints

* Restore boolean

* Fix reshape

22a32cf4

Line endings should be LF across repo and not CRLF (#10119) · 0d8e554d
Lysandre Debut authored Feb 10, 2021

0d8e554d