Commits · ba8c4d0ac04acfcdbdeaed954f698d6d5ec3e532 · chenpangpang / transformers

18 Oct, 2020 1 commit

[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a

Thomas Wolf authored Oct 18, 2020

* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* splitting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉



* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update and fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests, lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality, split herbert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix herbert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

ba8c4d0a
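
The commit above makes SentencePiece and Tokenizers optional rather than hard dependencies. One common way to support an optional dependency in Python is to probe for the package without importing it and raise a clear error only when the feature is actually used. The sketch below illustrates that general pattern; `is_package_available`, `require_package`, and `SlowTokenizerPlaceholder` are hypothetical names chosen for illustration and are not taken from the commit.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be found."""
    # find_spec locates the package without importing (executing) it.
    return importlib.util.find_spec(name) is not None


def require_package(name: str) -> None:
    """Raise a descriptive ImportError when an optional package is missing."""
    if not is_package_available(name):
        raise ImportError(
            f"This feature needs the optional package '{name}'. "
            f"Install it with: pip install {name}"
        )


class SlowTokenizerPlaceholder:
    """Hypothetical class that depends on an optional backend."""

    backend = "sentencepiece"  # assumption: the backend's import name

    def __init__(self) -> None:
        # Fail when the object is created, not when the module is imported,
        # so users without the backend can still import everything else.
        require_package(self.backend)
```

With this shape, importing the module always succeeds, and the missing-dependency error only surfaces when someone instantiates a class that needs the backend.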

16 Oct, 2020 1 commit
- [cleanup] assign todos, faster bart-cnn test (#7835) · 96e47d92
  Sam Shleifer authored Oct 16, 2020
```
* 2 beam output

* unassign/remove TODOs

* remove one more
```
  96e47d92
08 Oct, 2020 1 commit
- Fix 3 failing slow bart/blender tests (#7652) · e3e65173
  Sam Shleifer authored Oct 07, 2020
  
  e3e65173
07 Oct, 2020 1 commit

Blenderbot (#7418) · 960faaaf

Sam Shleifer authored Oct 07, 2020


Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

960faaaf

01 Oct, 2020 1 commit

[Seq2Seq] Fix a couple of bugs and clean examples (#7474) · 62f5ae68

Patrick von Platen authored Oct 01, 2020



* clean T5

* fix t5 tests

* fix index typo

* fix tf common test

* fix examples

* change positional ordering for Bart and FSMT

* add signature test

* clean docs and add tests

* add docs to encoder decoder

* clean docs

* correct two doc strings

* remove sig test for TF Electra & Funnel

* fix tf t5 slow tests

* fix input_ids to inputs in tf

* Update src/transformers/modeling_bart.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_bart.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implement lysandre results

* make style

* fix encoder decoder typo

* fix tf slow tests

* fix slow tests

* renaming

* remove unused input
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

62f5ae68

28 Aug, 2020 1 commit

prepare_seq2seq_batch makes labels; decoder_input_ids are made later (#6654) · 9336086a

Sam Shleifer authored Aug 28, 2020

* broken test

* batch parity

* tests pass

* boom boom

* boom boom

* split out bart tokenizer tests

* fix tests

* boom boom

* Fixed dataset bug

* Fix marian

* Undo extra

* Get marian working

* Fix t5 tok tests

* Test passing

* Cleanup

* better assert msg

* require torch

* Fix mbart tests

* undo extra decoder_attn_mask change

* Fix import

* pegasus tokenizer can ignore src_lang kwargs

* unused kwarg test cov

* boom boom

* add todo for pegasus issue

* cover one word translation edge case

* Cleanup

* doc

9336086a

26 Aug, 2020 1 commit
- Black 20 release · a75c64d8
  Lysandre authored Aug 26, 2020
  
  a75c64d8
24 Aug, 2020 1 commit
- Update repo to isort v5 (#6686) · a5737779
  Sylvain Gugger authored Aug 24, 2020
```
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
```
  a5737779
19 Aug, 2020 2 commits
- [BartTokenizerFast] add prepare_seq2seq_batch (#6543) · 7581884d
  Suraj Patil authored Aug 19, 2020
  
  7581884d
- Fix bart base test (#6587) · ab42d748
  Sam Shleifer authored Aug 18, 2020
  
  ab42d748
18 Aug, 2020 1 commit
- add BartConfig.force_bos_token_to_be_generated (#6526) · 1529bf96
  Sam Shleifer authored Aug 18, 2020
```
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
```
  1529bf96
17 Aug, 2020 1 commit
- [BartTokenizer] add prepare s2s batch (#6212) · 2a77813d
  Suraj Patil authored Aug 17, 2020
```
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
```
  2a77813d
11 Aug, 2020 1 commit
- PegasusForConditionalGeneration (torch version) (#6340) · 66fa8cea
  Sam Shleifer authored Aug 11, 2020
```
Co-authored-by: Jingqing  Zhang <jingqing.zhang15@imperial.ac.uk>
```
  66fa8cea
31 Jul, 2020 1 commit
- Model output test (#6155) · d951c14a
  Sylvain Gugger authored Jul 31, 2020
```
* Use return_dict=True in all tests

* Formatting
```
  d951c14a
07 Jul, 2020 2 commits
- Add mbart-large-cc25, support translation finetuning (#5129) · 353b8f1e
  Sam Shleifer authored Jul 07, 2020
```
improve unittests for finetuning, especially w.r.t testing frozen parameters
fix freeze_embeds for T5
add streamlit setup.cfg
```
  353b8f1e
- [Bart] enable test_torchscript, update test_tie_weights (#5457) · d4886173
  Sam Shleifer authored Jul 07, 2020
```
* Passing all but one torchscript test

* Style

* move comment

* remove unneeded assert
```
  d4886173
01 Jul, 2020 1 commit
- Move tests/utils.py -> transformers/testing_utils.py (#5350) · 13deb95a
  Sam Shleifer authored Jul 01, 2020
  
  13deb95a
28 Jun, 2020 1 commit
- [mBART] skip broken forward pass test, stronger integration test (#5327) · 28a690a8
  Sam Shleifer authored Jun 28, 2020
  
  28a690a8
26 Jun, 2020 2 commits
- examples/seq2seq/run_eval.py fixes and docs (#5322) · 393b8dc0
  Sam Shleifer authored Jun 26, 2020
  
  393b8dc0
- [tokenizers] Updates data processors, docstring, examples and model cards to the new API (#5308) · 601d4d69
  Thomas Wolf authored Jun 26, 2020
```
* remove references to old API in docstring - update data processors

* style

* fix tests - better type checking error messages

* better type checking

* include awesome fix by @LysandreJik for #5310

* updated doc and examples
```
  601d4d69
24 Jun, 2020 1 commit
- [Use cache] Align logic of `use_cache` with output_attentions and output_hidden_states (#5194) · c2a26ec8
  Patrick von Platen authored Jun 24, 2020
```
* fix use cache

* add bart use cache

* fix bart

* finish bart
```
  c2a26ec8
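
The commit title says `use_cache` is resolved with the same logic as `output_attentions` and `output_hidden_states`. A plausible reading is that an explicit call argument wins and the config value is the fallback. The toy sketch below shows that pattern under this assumption; `ToyConfig`, `resolve_flag`, and `forward_flags` are hypothetical names, not the library's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyConfig:
    """Hypothetical config holding the default for each flag."""

    use_cache: bool = True
    output_attentions: bool = False
    output_hidden_states: bool = False


def resolve_flag(explicit: Optional[bool], default: bool) -> bool:
    """Use the explicit argument when given, else the config default."""
    return explicit if explicit is not None else default


def forward_flags(
    config: ToyConfig,
    use_cache: Optional[bool] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
) -> Dict[str, bool]:
    """Resolve all three flags with one shared rule."""
    return {
        "use_cache": resolve_flag(use_cache, config.use_cache),
        "output_attentions": resolve_flag(output_attentions, config.output_attentions),
        "output_hidden_states": resolve_flag(
            output_hidden_states, config.output_hidden_states
        ),
    }
```

Using `None` as the "not specified" sentinel lets a caller pass `use_cache=False` explicitly even when the config default is `True`.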
19 Jun, 2020 1 commit
- AutoTokenizer supports mbart-large-en-ro (#5121) · 84be482f
  Sam Shleifer authored Jun 18, 2020
  
  84be482f
15 Jun, 2020 2 commits
- [Bart] Question Answering Model is added to tests (#5024) · ebba39e4
  Patrick von Platen authored Jun 15, 2020
```
* fix test

* Update tests/test_modeling_common.py

* Update tests/test_modeling_common.py
```
  ebba39e4
- Add bart-base (#5014) · a9f1fc6c
  Sam Shleifer authored Jun 15, 2020
  
  a9f1fc6c
12 Jun, 2020 2 commits
- BartForQuestionAnswering (#4908) · e93ccb32
  Suraj Patil authored Jun 13, 2020
  
  e93ccb32
- [mbart] Fix fp16 testing logic (#4949) · 56200331
  Sam Shleifer authored Jun 11, 2020
  
  56200331
11 Jun, 2020 1 commit
- MBartTokenizer: add language codes (#3776) · 08b59d10
  Sam Shleifer authored Jun 11, 2020
  
  08b59d10
05 Jun, 2020 1 commit
- Use labels to remove deprecation warnings (#4807) · f1fe1846
  Sylvain Gugger authored Jun 05, 2020
  
  f1fe1846
02 Jun, 2020 2 commits

Fix CI after killing archive maps (#4724) · b42586ea
Julien Chaumond authored Jun 02, 2020
```
* 🐛 Fix model ids for BART and Flaubert
```
b42586ea

Kill model archive maps (#4636) · d4c2cb40

Julien Chaumond authored Jun 02, 2020

* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI

d4c2cb40

25 May, 2020 1 commit
- [ci] fix 3 remaining slow GPU failures (#4584) · b86e42e0
  Sam Shleifer authored May 25, 2020
  
  b86e42e0
19 May, 2020 1 commit
- [gpu slow tests] fix mbart-large-enro gpu tests (#4472) · 956c4c4e
  Sam Shleifer authored May 19, 2020
  
  956c4c4e
12 May, 2020 1 commit
- Fix BART tests on GPU (#4298) · 4bf50422
  Julien Chaumond authored May 12, 2020
  
  4bf50422
01 May, 2020 2 commits

[testing] add timeout_decorator (#3543) · 18db92dd
Sam Shleifer authored May 01, 2020

18db92dd

[ci] Load pretrained models into the default (long-lived) cache · f54dc3f4

Julien Chaumond authored Apr 23, 2020

There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models into the default cache
- and often into both, for the same models

When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.

I'd rather always use the default cache.

f54dc3f4

28 Apr, 2020 1 commit
- MarianMTModel.from_pretrained('Helsinki-NLP/opus-marian-en-de') (#3908) · 847e7f33
  Sam Shleifer authored Apr 28, 2020
```
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
```
  847e7f33
10 Apr, 2020 1 commit
- Multilingual BART (#3602) · 7a7fdf71
  Sam Shleifer authored Apr 10, 2020
```
- support mbart-en-ro weights
- add MBartTokenizer
```
  7a7fdf71
07 Apr, 2020 1 commit
- [Bart] Replace config.output_past with use_cache kwarg (#3632) · 715aa5b1
  Sam Shleifer authored Apr 07, 2020
  
  715aa5b1
30 Mar, 2020 2 commits

[bart-tiny-random] Put a 5MB model on S3 to allow faster exampl… (#3488) · 8deff3ac
Sam Shleifer authored Mar 30, 2020

8deff3ac

[T5] make decoder input ids optional for t5 training (#3521) · 75ec6c9e

Patrick von Platen authored Mar 30, 2020

* make decoder input ids optional for t5 training

* lm_labels should not be shifted in t5

* add tests

* finish shift right functionality for PT T5

* move shift right to correct class

* cleaner code

* replace -100 values with pad token id

* add assert statement

* remove unnecessary for loop

* make style

75ec6c9e
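
The bullets in the T5 commit above mention a shift-right step and replacing -100 values with the pad token id. As a rough illustration of what such a step typically does, the sketch below builds decoder inputs from a label sequence using plain Python lists. The function name, its signature, and the token ids in the usage note are assumptions made for this example; the actual implementation in the commit operates on tensors and may differ.

```python
from typing import List

IGNORE_INDEX = -100  # label value conventionally ignored by the loss


def shift_right(
    labels: List[int], decoder_start_token_id: int, pad_token_id: int
) -> List[int]:
    """Derive decoder inputs from labels by shifting one position right."""
    if not labels:
        return []
    # Prepend the start token and drop the last label to keep the length.
    shifted = [decoder_start_token_id] + labels[:-1]
    # Ignored positions are not valid token ids, so map them to padding.
    return [pad_token_id if tok == IGNORE_INDEX else tok for tok in shifted]
```

For example, with a start id and pad id of 0 (assumed values), `shift_right([10, 11, 1, -100], 0, 0)` gives `[0, 10, 11, 1]`: each decoder position sees the previous target token.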