Commits · 3095ee9dab739f212a8753b5be4e1a72ba42e28e · chenpangpang / transformers

17 Nov, 2020 3 commits

these should run fine on multi-gpu (#8582) · f0435f5a
Stas Bekman authored Nov 17, 2020


Tokenizers: ability to load from model subfolder (#8586) · 042a6aa7

Julien Chaumond authored Nov 17, 2020



* tiny typo

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>


Reorganize repo (#8580) · c89bdfbe

Sylvain Gugger authored Nov 16, 2020

* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in convert command


16 Nov, 2020 1 commit

Switch `return_dict` to `True` by default. (#8530) · 1073a2bd

Sylvain Gugger authored Nov 16, 2020

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests


15 Nov, 2020 1 commit

[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests... · f4e04cd2

Thomas Wolf authored Nov 15, 2020


[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073)

* Fixing roberta for slow-fast tests

* WIP getting equivalence on pipelines

* slow-to-fast equivalence - working on question-answering pipeline

* optional FAISS tests

* Pipeline Q&A

* Move pipeline tests to their own test job again

* update tokenizer to add sequence id methods

* update to tokenizers 0.9.4

* set sentencepiece as optional

* clean up squad

* clean up pipelines to use sequence_ids

* style/quality

* wording

* Switch to use_fast = True by default

* update tests for use_fast at True by default

* fix rag tokenizer test

* removing protobuf from required dependencies

* fix NER test for use_fast = True by default

* fixing example tests (Q&A examples use slow tokenizers for now)

* protobuf in main deps extras["sentencepiece"] and example deps

* fix protobuf install test

* try to fix seq2seq by switching to slow tokenizers for now

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>


12 Nov, 2020 2 commits
- Try to understand and apply Sylvain's comments (#8458) · 27b3ff31
  Julien Plu authored Nov 12, 2020
  
- quick fix on concatenating text to support more datasets (#8474) · 924c624a
  zeyuyun1 authored Nov 12, 2020
  
11 Nov, 2020 2 commits

[s2s] distill t5-large -> t5-small (#8376) · 81ebd706
Sumithra Bhakthavatsalam authored Nov 11, 2020
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Example NER script predicts on tokenized dataset (#8468) · a38d1c7c

sarnoult authored Nov 11, 2020

The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`
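
The fix described above can be sketched with plain Python containers. This is a toy stand-in, not the actual `run_ner.py` code: `VOCAB`, `tokenize`, and `predict` are hypothetical helpers, and the real script uses the `datasets` library and a model tokenizer. Only the names `datasets["test"]` and `tokenized_datasets["test"]` come from the commit message.

```python
from typing import Dict, List

# Hypothetical toy vocabulary; a real tokenizer is far richer.
VOCAB: Dict[str, int] = {"hello": 1, "world": 2}


def tokenize(example: Dict[str, List[str]]) -> Dict[str, list]:
    """Add an `input_ids` field, keeping the original fields."""
    ids = [VOCAB.get(word, 0) for word in example["tokens"]]
    return {**example, "input_ids": ids}


def predict(rows: List[Dict[str, list]]) -> List[int]:
    """Stand-in for a model: it can only read `input_ids`."""
    return [len(row["input_ids"]) for row in rows]


# Raw splits, as loaded from disk (no `input_ids` yet).
datasets = {"test": [{"tokens": ["hello", "world"]}]}

# Tokenized splits: every example now carries `input_ids`.
tokenized_datasets = {
    split: [tokenize(ex) for ex in rows] for split, rows in datasets.items()
}

# Prediction must run on the tokenized split, not the raw one.
predictions = predict(tokenized_datasets["test"])
```

In this sketch, calling `predict(datasets["test"])` raises a `KeyError` because the raw examples have no `input_ids`; how the real script fails on the raw split is not stated in the commit message.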


10 Nov, 2020 4 commits
- using multi_gpu consistently (#8446) · 02bdfc02
  Stas Bekman authored Nov 10, 2020
```
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g

* doc
```
- [examples] better PL version check (#8429) · 5d4972e6
  Stas Bekman authored Nov 10, 2020
  
- [s2s/distill] hparams.tokenizer_name = hparams.teacher (#8382) · ae1cb4ec
  Shichao Sun authored Nov 10, 2020
  
- Update links from s3 to huggingface.co · 55e8d0ce
  Julien Chaumond authored Nov 10, 2020
  
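
The two sed-style substitutions listed in the "using multi_gpu consistently (#8446)" commit are global literal replacements applied in sequence. A minimal Python sketch of the same renaming follows; the function name `normalize_multi_gpu` and the sample string are hypothetical and are not taken from the repository.

```python
def normalize_multi_gpu(text: str) -> str:
    """Rename both older spellings to `multi_gpu`.

    Mirrors `s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g`:
    both patterns are plain literals, so str.replace is enough.
    """
    text = text.replace("multiple_gpu", "multi_gpu")
    text = text.replace("multigpu", "multi_gpu")
    return text


# Illustrative input only; real identifiers in the repo may differ.
sample = "require_multiple_gpu, require_multigpu"
renamed = normalize_multi_gpu(sample)
```

Because `multi_gpu` contains neither of the older spellings, running the function a second time leaves the text unchanged.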
09 Nov, 2020 5 commits

[github CI] add a multi-gpu job for all example tests (#8341) · 190df585

Stas Bekman authored Nov 09, 2020



* add a multi-gpu job for all example tests

* run only ported tests

* rename

* explain why env is re-activated on each step

* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me

* style

* Apply suggestions from code review
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>


[Tests] Add Common Test for Training + Fix a couple of bugs (#8415) · 9c83b96e

Patrick von Platen authored Nov 09, 2020

* add training tests

* correct longformer

* fix docs

* fix some tests

* fix some more train tests

* remove ipdb

* fix multiple edge case model training

* fix funnel and prophetnet

* clean gpt models

* undo renaming of albert


Fix typo · 5c766ecb
Sylvain Gugger authored Nov 09, 2020


Add new token classification example (#8340) · 908a2889

Sylvain Gugger authored Nov 09, 2020



* Add new token classification example

* Remove txt file

* Add test

* With actual testing done

* Less warmup is better

* Update examples/token-classification/run_ner_new.py
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Fix test

* Make Lysandre happy

* Last touches and rename

* Rename in tests

* Address review comments

* More run_ner -> run_ner_old
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>


examples/docs: caveat that PL examples don't work on TPU (#8309) · ebde57ac
Sam Shleifer authored Nov 09, 2020


08 Nov, 2020 3 commits
- [s2s/distill] remove run_distiller.sh, fix xsum script (#8412) · e6d9cdaa
  Sam Shleifer authored Nov 08, 2020
  
- [s2s test_finetune_trainer] failing multigpu test (#8400) · 66582492
  Stas Bekman authored Nov 08, 2020
  
- [s2s examples test] fix data path (#8398) · f62755a6
  Stas Bekman authored Nov 08, 2020
  
06 Nov, 2020 2 commits

Fix typo (#8351) · 5807ba3f
Jonathan Chang authored Nov 06, 2020


[s2s] test_bash_script.py - actually learn something (#8318) · 9edafaeb

Stas Bekman authored Nov 05, 2020

* use decorator

* remove hardcoded paths

* make the test use more data and do real quality tests

* shave off 10 secs

* add --eval_beams 2, reformat

* reduce train size, use smaller custom dataset


05 Nov, 2020 5 commits

Docs bart training ref (#8330) · 17450397
Leandro von Werra authored Nov 05, 2020
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

[s2s] test_distributed_eval (#8315) · d787935a
Stas Bekman authored Nov 05, 2020
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

no warn (#8329) · 7abc1d96
Sam Shleifer authored Nov 05, 2020


change TokenClassificationTask class methods to static methods (#7902) · 52f44dd6

Bobby Donchev authored Nov 05, 2020



* change TokenClassificationTask class methods to static methods

Since the class methods of TokenClassificationTask do not use self, we should probably switch them to static methods. Also, because TokenClassificationTask has no constructor, it is currently unusable as is. Switching to static methods removes the need to document the intent of the broken class.

The get_labels and read_examples_from_file methods are meant to be implemented by subclasses. Static methods are inherited unchanged, which means they can still be overridden, like other class methods.

* Trigger Build
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
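
The reasoning in this commit message can be illustrated with a small sketch. The classes below are hypothetical stand-ins, not the code from the repository: only the names `TokenClassificationTask`, `get_labels`, and `read_examples_from_file` come from the message, while the signatures, the `NER` subclass, and the label set are assumptions made for illustration.

```python
from typing import List


class TokenClassificationTask:
    """Hypothetical base class with no constructor and no instance state."""

    @staticmethod
    def read_examples_from_file(data_dir: str, mode: str) -> List[str]:
        # Meant to be overridden by a concrete task.
        raise NotImplementedError

    @staticmethod
    def get_labels(path: str) -> List[str]:
        # Meant to be overridden by a concrete task.
        raise NotImplementedError


class NER(TokenClassificationTask):
    """Hypothetical subclass overriding one static method."""

    @staticmethod
    def get_labels(path: str) -> List[str]:
        # Illustrative fixed label set; a real task would read `path`.
        return ["O", "B-PER", "I-PER"]


# No instance is needed: the override is resolved through the class.
labels = NER.get_labels("unused-path")
```

Here `NER.get_labels` uses the override, while `NER.read_examples_from_file` still resolves to the inherited base definition and raises `NotImplementedError` until the subclass supplies its own.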


Corrected typo in readme (#8320) · 77c8f6c6
Guillem García Subies authored Nov 05, 2020


04 Nov, 2020 4 commits
- Clean up data collators and datasets (#8308) · 9c4aa4ac
  Sylvain Gugger authored Nov 04, 2020
```
* Clean up data collators and datasets

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove needless clone
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
```
- Fix path to old run_language_modeling.py script (#8302) · b1d3e95e
  Manuel Romero authored Nov 04, 2020
  
- Fix validation file loading in scripts (#8298) · cf897246
  Sylvain Gugger authored Nov 04, 2020
  
- Fix typo in language-modeling README.md (#8287) · 734afa37
  Pengzhi Gao authored Nov 04, 2020
  
03 Nov, 2020 6 commits
- [CIs] Better reports everywhere (#8275) · 1bb4bba5
  Stas Bekman authored Nov 03, 2020
```
* make it possible to invoke testconf.py in both test suites without crashing on having the same option added

* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts

* add `pytest --make-reports` to all CIs (and artifacts)

* fix
```
- make files independent (#8267) · 068e6b5e
  Patrick von Platen authored Nov 03, 2020
  
- [examples] minimal version requirement run-time check in PL (#8133) · cd360dcb
  Stas Bekman authored Nov 03, 2020
```
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
```
- Fix Tatoeba skip · eb6313e8
  Lysandre authored Nov 03, 2020
  
- Skip tatoeba tests if Tatoeba-Challenge not cloned (#8260) · b63beb74
  Sam Shleifer authored Nov 03, 2020
  
- [Seq2Seq] Correct import in Seq2Seq Trainer (#8254) · 9f1747f9
  Patrick von Platen authored Nov 03, 2020
  
02 Nov, 2020 1 commit

Add line by line option to mlm/plm scripts (#8240) · e1b1b614

Sylvain Gugger authored Nov 02, 2020



* Make line by line optional in run_mlm

* Add option to disable dynamic padding

* Add option to plm too and update README

* Typos

* More typos

* Even more typos

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>


01 Nov, 2020 1 commit
- [Seq2SeqTrainer] Move import to init to make file self-contained (#8194) · 9bd30f7c
  Patrick von Platen authored Nov 01, 2020
```
* boom boom

* reverse order
```