Commits · 3a40cdf58d8b7233e561d1bf9e76f2ed02d9c4b7 · chenpangpang / transformers

23 Oct, 2020 2 commits

[tests|tokenizers] Refactoring pipelines test backbone - Small tokenizers... · 3a40cdf5

Thomas Wolf authored Oct 23, 2020


[tests|tokenizers] Refactoring pipelines test backbone - Small tokenizers improvements - General tests speedups (#7970)

* WIP refactoring pipeline tests - switching to fast tokenizers

* fix dialog pipeline and fill-mask

* refactoring pipeline tests backbone

* make large tests slow

* fix tests (tf Bart inactive for now)

* fix doc...

* clean up for merge

* fixing tests - remove bart from summarization until there is TF

* fix quality and RAG

* Add new translation pipeline tests - fix JAX tests

* only slow for dialog

* Fixing the missing TF-BART imports in modeling_tf_auto

* spin out pipeline tests in separate CI job

* adding pipeline test to CI YAML

* add slow pipeline tests

* speed up tf and pt join test to avoid redoing all the standalone pt and tf tests

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/pipelines.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add require_torch and require_tf in is_pt_tf_cross_test
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

3a40cdf5

Handle the case when title is None (#7941) · 88b3a91e
Lalit Pagaria authored Oct 23, 2020

88b3a91e

22 Oct, 2020 22 commits

[s2s trainer] tests to use distributed on multi-gpu machine (#7965) · 023f0f37
Stas Bekman authored Oct 22, 2020

023f0f37
change zero shot widget default example (#7992) · 64b24bb3
Joe Davison authored Oct 22, 2020

64b24bb3
Move NoLayerEmbedTokens (#7945) · 0397619a
Sam Shleifer authored Oct 22, 2020
```
* Move NoLayerEmbedTokens

* TFWrappedEmbeddings

* Add comment
```
0397619a
[gh ci] less output ( --durations=50) (#7989) · 5ac07513
Sam Shleifer authored Oct 22, 2020

5ac07513
Reload checkpoint (#7984) · 5ae935d2
Sylvain Gugger authored Oct 22, 2020
```
* Fix checkpoint loading in Trainer

* Fix typo
```
5ae935d2
Fix documentation redirect · 467573dd
Lysandre authored Oct 22, 2020

467573dd

add zero shot pipeline tags & examples (#7983) · 077c99bb

Joe Davison authored Oct 22, 2020



* add zero shot pipeline tags

* rm default and fix yaml format

* rm DS_Store

* add bart large default

* don't add more typos
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* add multiple multilingual examples

* improve multilingual examples for single-label
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

077c99bb

Only log total_flos at the end of training (#7981) · 06fc3954
Sylvain Gugger authored Oct 22, 2020
```
* Only log total_flos at the end of training

* Fix test
```
06fc3954

FillMaskPipeline: support passing top_k on __call__ (#7971) · ff65beaf

Julien Chaumond authored Oct 22, 2020

* FillMaskPipeline: support passing top_k on __call__

Also move from topk to top_k

* migrate to new param name in tests

* Review from @sgugger

ff65beaf

New run glue script (#7917) · 2e5052d4

Sylvain Gugger authored Oct 22, 2020



* Start simplification

* More progress

* Finished script

* Address comments and update tests instructions

* Wrong test

* Accept files as inputs and fix test

* Update src/transformers/trainer_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Fix labels and add combined score

* Add special labels

* Update TPU command

* Revert to old label strategy

* Use model labels

* Fix for STS-B

* Styling

* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Code styling

* Fix review comments
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

2e5052d4

Fixing the "translation", "translation_XX_to_YY" pipelines. (#7975) · 18ce6b8f

Nicolas Patry authored Oct 22, 2020



* Actually make the "translation", "translation_XX_to_YY" task behave correctly.

Background:
- Currently "translation_cn_to_ar" does not work (only 3 pairs are
supported).
- Some models contain in their config the correct values for the (src,
tgt) pair they can translate. It's usually just one pair, and we can
infer it automatically from the `model.config.task_specific_params`. If
it's not defined, we can probably still load the TranslationPipeline.

Proposed fix:
- A simplified version of what could become a more general notion of
a `parametrized` task. "translation" + (src, tgt) is what we need in
the general case here. For now we simply parse "translation_XX_to_YY".
If more cases of parametrized tasks arise, we should preferably move
closer to what `datasets` proposes, which is a secondary argument
(`task_options`?) carrying whatever that task requires.
- Should be backward compatible in all cases; for instance,
`pipeline(task="translation_en_to_de")` should work out of the box.
- Should provide a warning when a specific translation pair has been
selected on behalf of the user using
`model.config.task_specific_params`.

* Update src/transformers/pipelines.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

18ce6b8f
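
The "translation_XX_to_YY" parsing described in 18ce6b8f can be sketched roughly as follows. This is a minimal illustration, not the code in src/transformers/pipelines.py: `parse_translation_task` is a hypothetical helper, and it assumes language codes are plain alphabetic strings.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "translation_" + source code + "_to_" + target code.
_TASK_RE = re.compile(r"^translation_([A-Za-z]+)_to_([A-Za-z]+)$")


def parse_translation_task(task: str) -> Optional[Tuple[str, str]]:
    """Return (src, tgt) for "translation_XX_to_YY".

    Returns None for the bare "translation" task, where the pair would
    have to come from elsewhere (for example the model config).
    """
    if task == "translation":
        return None
    match = _TASK_RE.match(task)
    if match is None:
        raise ValueError(f"Not a translation task: {task!r}")
    return match.group(1), match.group(2)


print(parse_translation_task("translation_en_to_de"))  # ('en', 'de')
print(parse_translation_task("translation"))           # None
```

Treating the pair as data parsed from the task name, rather than as a fixed list of supported names, is what would let a pair such as "translation_cn_to_ar" pass through without a dedicated entry.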

Remove the else branch adding 0 to the hidden state if token_type_embeds is None. (#7977) · 901e9b8e
Funtowicz Morgan authored Oct 22, 2020
```
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
```
901e9b8e

[PretrainedConfig] Fix save pretrained config for edge case (#7943) · f34372a9

Patrick von Platen authored Oct 22, 2020



* fix config save

* add test

* add config class variable and another test

* line break

* fix fsmt and typo

* god am I making many errors today :-/

* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

f34372a9

adding text classification with DistilBERT/tf notebook (#7964) · cc2e312c

Peter Bayerle authored Oct 22, 2020



Looking at the current community notebooks, it seems that few are aimed at absolute beginners and even fewer are written with TensorFlow. This notebook describes everything a beginner would need to know, including how to save and load their model and use it for new predictions (a step tutorials often omit).
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

cc2e312c

Add whole word mask support for lm fine-tune (#7925) · a16e568f

wlhgtc authored Oct 22, 2020



* ADD: add whole word mask proxy for both eng and chinese

* MOD: adjust format

* MOD: reformat code

* MOD: update import

* MOD: fix bug

* MOD: add import

* MOD: fix bug

* MOD: decouple code and update readme

* MOD: reformat code

* Update examples/language-modeling/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change wwm to whole_word_mask

* reformat code

* reformat

* format

* Code quality

* ADD: update chinese ref readme

* MOD: small changes

* MOD: small changes2

* update readme
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>

a16e568f

[fsmt test] basic config test with online model + super tiny model (#7860) · 64b4d25c
Stas Bekman authored Oct 22, 2020
```
* basic config test with online model

* typo

* style

* better test
```
64b4d25c
Disable inference API for t5-11b (#7978) · 3479787e
Julien Chaumond authored Oct 22, 2020

3479787e
[model_card] t5-11b move disclaimer to top of page · a7db81c3
Julien Chaumond authored Oct 22, 2020
```
cc @Narsil @patrickvonplaten
```
a7db81c3
support relative path for best_model_checkpoint (#7973) · f774b2e8
Haebin Shin authored Oct 22, 2020

f774b2e8

[testing] slow tests should be marked as slow (#7895) · 83481056

Stas Bekman authored Oct 22, 2020



* slow tests should be slow

* exception note

* style

* integrate LysandreJik's notes with some expansions

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* another slow test

* fix link, and prose

* clarify.

* note from Sam

* typo
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

83481056
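
The idea behind 83481056, that expensive tests are tagged and skipped by default, can be sketched with the standard library alone. This is an illustrative stand-in, not the decorator in src/transformers/testing_utils.py: the `RUN_SLOW` environment variable is assumed here as the opt-in switch, and `ExampleTest` is hypothetical.

```python
import os
import unittest


def run_slow_enabled() -> bool:
    # Assumption: slow tests are opted into with RUN_SLOW=1 (or "true"/"yes").
    return os.environ.get("RUN_SLOW", "").lower() in {"1", "true", "yes"}


def slow(test_case):
    """Skip the decorated test unless slow tests are enabled.

    The environment variable is read once, when the decorator is applied.
    """
    return unittest.skipUnless(run_slow_enabled(), "test is slow")(test_case)


class ExampleTest(unittest.TestCase):
    def test_fast(self):
        # Cheap check that always runs.
        self.assertEqual(1 + 1, 2)

    @slow
    def test_large_model(self):
        # Stand-in for a test that would download large weights.
        self.assertTrue(True)
```

With this sketch, a default run reports `test_large_model` as skipped, and setting `RUN_SLOW=1` before the test module is imported makes it run.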

Herbert tokenizer auto load (#7968) · 95792a94
rmroczkowski authored Oct 22, 2020

95792a94

added qg evaluation notebook (#7958) · 4abb7ffc

zolekode authored Oct 22, 2020



* added qg evaluation notebook

* Update notebooks/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

4abb7ffc

21 Oct, 2020 16 commits
- [seq2seq testing] multigpu test run via subprocess (#7281) · 8b381733
  Stas Bekman authored Oct 21, 2020
```
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
```
  8b381733
- [model_cards] camembert: dataset = oscar · f8d3695e
  Julien Chaumond authored Oct 21, 2020
```
Hat/tip @pjox
```
  f8d3695e
- fix 'encode_plus' docstring for 'special_tokens_mask' (0s and 1s were reversed) (#7949) · 16da8771
  Evan Pete Walsh authored Oct 21, 2020
```
* fix docstring for 'special_tokens_mask'

* revert auto formatter changes

* revert another auto format

* revert another auto format
```
  16da8771
- fix test (#7947) · 52decab3
  Patrick von Platen authored Oct 21, 2020
  
  52decab3
- [ProphetNet] Correct Doc string example (#7944) · 9b6610f7
  Patrick von Platen authored Oct 21, 2020
```
* correct xlm prophetnet auto model and examples

* fix line-break docs
```
  9b6610f7
- TensorBoard/Wandb/optuna/raytune integration improvements. (#7935) · e174bfeb
  François Lagunas authored Oct 21, 2020
```
Improved TensorBoard and Wandb integration, as well as optuna and ray/tune support, with minor modifications to trainer core code.
```
  e174bfeb
- Add AI-SOCO models (#7867) · bf162ce8
  Ali Hamdi Ali Fadel authored Oct 21, 2020
  
  bf162ce8
- Create README.md (#7857) · 58fb25f2
  Fangyu Liu authored Oct 21, 2020
```
* Create README.md

model card for cambridgeltl/BioRedditBERT-uncased.

* Update model_cards/cambridgeltl/BioRedditBERT-uncased/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  58fb25f2
- Model card for German BERT fine-tuned for LER/NER (#7855) · 2b07ec78
  Manuel Romero authored Oct 21, 2020
  
  2b07ec78
- Create README.md (#7819) · 35d2ad5b
  MichalPleban authored Oct 21, 2020
  
  35d2ad5b
- Create README.md (#7625) · bdda4f22
  Wuwei Lan authored Oct 21, 2020
```
* Create README.md

* Update model_cards/lanwuwei/GigaBERT-v3-Arabic-and-English/README.md

* Update model_cards/lanwuwei/GigaBERT-v3-Arabic-and-English/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  bdda4f22
- Add missing comma (#7870) · 8e237496
  Manuel Romero authored Oct 21, 2020
  
  8e237496
- Create README.md (#7899) · 3eaa007d
  Manuel Romero authored Oct 21, 2020
  
  3eaa007d
- [model_cards] move hatmimoha/arabic-ner to correct location · 758572ca
  Julien Chaumond authored Oct 21, 2020
```
see https://github.com/huggingface/transformers/commit/16d3cc187ded95946231956460e9004a236474e2 and https://github.com/huggingface/transformers/pull/7836
```
  758572ca
- [multiple models] skip saving/loading deterministic state_dict keys (#7878) · 57516c0c
  Stas Bekman authored Oct 21, 2020
```
* make the save_load special key tests common

* handle mbart

* cleaner solution

* fix

* move test_save_load_missing_keys back into fsmt for now

* restore

* style

* add marian

* add pegasus

* blenderbot

* revert - no static embed
```
  57516c0c
- update model cards of Illuin models (#7930) · 006a1648
  quentinheinrich authored Oct 21, 2020
  
  006a1648