Commits · 0d0a0785fda7ce4808e81f6b3c27c29a51a0b075 · chenpangpang / transformers

15 Nov, 2020 1 commit

[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests... · f4e04cd2

Thomas Wolf authored Nov 15, 2020


[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073)

* Fixing roberta for slow-fast tests

* WIP getting equivalence on pipelines

* slow-to-fast equivalence - working on question-answering pipeline

* optional FAISS tests

* Pipeline Q&A

* Move pipeline tests to their own test job again

* update tokenizer to add sequence id methods

* update to tokenizers 0.9.4

* set sentencepiece as optional

* clean up squad

* clean up pipelines to use sequence_ids

* style/quality

* wording

* Switch to use_fast = True by default

* update tests for use_fast at True by default

* fix rag tokenizer test

* removing protobuf from required dependencies

* fix NER test for use_fast = True by default

* fixing example tests (Q&A examples use slow tokenizers for now)

* protobuf in main deps extras["sentencepiece"] and example deps

* fix protobuf install test

* try to fix seq2seq by switching to slow tokenizers for now

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

f4e04cd2

12 Nov, 2020 1 commit
- Try to understand and apply Sylvain's comments (#8458) · 27b3ff31
  Julien Plu authored Nov 12, 2020
  
  27b3ff31
23 Oct, 2020 1 commit

Handling longformer model_type (#7990) · d39da5a2

Ethan Perez authored Oct 23, 2020

Updating the run_squad training script to handle the "longformer" `model_type`. The Longformer is trained in the same way as RoBERTa, so I've added the "longformer" `model_type` (that's the right huggingface name for the Longformer model, right?) everywhere there was a "roberta" `model_type` reference. The Longformer (like RoBERTa) doesn't use `token_type_ids` (as I understand from looking at the [longformer notebook](https://github.com/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb)), which is what gets updated after this change.

This fix might be related to [this issue](https://github.com/huggingface/transformers/issues/7249) with SQuAD training when using run_squad.py

d39da5a2
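
The `token_type_ids` handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper name `build_inputs`, the layout of `batch`, and the contents of the model-type set are assumptions; only "roberta", "longformer", `model_type` and `token_type_ids` come from the commit message.

```python
# Model types assumed not to use token_type_ids. Only "roberta" and
# "longformer" are named in the commit message; the set is illustrative.
MODEL_TYPES_WITHOUT_TOKEN_TYPE_IDS = {"roberta", "longformer"}


def build_inputs(model_type, batch):
    """Return model inputs, dropping token_type_ids where the model ignores them.

    `batch` is a hypothetical dict holding the three keys used below.
    """
    inputs = {
        "input_ids": batch["input_ids"],
        "attention_mask": batch["attention_mask"],
        "token_type_ids": batch["token_type_ids"],
    }
    if model_type in MODEL_TYPES_WITHOUT_TOKEN_TYPE_IDS:
        # These models are treated like RoBERTa: no segment ids are passed.
        del inputs["token_type_ids"]
    return inputs
```

Keeping the model types in one set means a new RoBERTa-like model only needs to be added in a single place.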

15 Sep, 2020 1 commit
- [logging] remove no longer needed verbosity override (#7100) · b0cbcdb0
  Stas Bekman authored Sep 15, 2020
  
  b0cbcdb0
27 Aug, 2020 1 commit
- Fix it to work with BART (#6756) · c225e872
  Tom Grek authored Aug 27, 2020
  
  c225e872
30 Jul, 2020 1 commit

Switch from return_tuple to return_dict (#6138) · 91cb9546

Sylvain Gugger authored Jul 30, 2020



* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre's comments

* Trigger CI

* Remove unused import

* Switch from return_tuple to return_dict

* Fix test

* Add recent model
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>

91cb9546

20 Jul, 2020 2 commits

DataParallel fix: multi gpu evaluation (#5926) · 8e0bcb56

Qingqing Cao authored Jul 20, 2020

DataParallel training was fixed in https://github.com/huggingface/transformers/pull/5733; this commit also fixes evaluation, which is more convenient when the user enables both `do_train` and `do_eval`.

8e0bcb56

DataParallel fixes (#5733) · 35cb101e

Stas Bekman authored Jul 20, 2020

* DataParallel fixes:

1. switched to a more precise check
-        if self.args.n_gpu > 1:
+        if isinstance(model, nn.DataParallel):

2. fix tests - require the same fixup under DataParallel as the training module

* another fix

35cb101e
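
The idea behind the "more precise check" can be sketched without PyTorch. The `DataParallel` class below is a stand-in for `torch.nn.DataParallel`, and `reduce_loss` is a hypothetical helper; both are assumptions for illustration. The point is that a GPU count greater than one does not prove the model was wrapped, whereas a type check does.

```python
class DataParallel:
    """Stand-in for torch.nn.DataParallel so the sketch runs without PyTorch."""

    def __init__(self, module):
        self.module = module


def reduce_loss(model, losses):
    """Average per-replica losses only when the model is actually wrapped.

    `losses` is a plain list here; with a real wrapped model it would be
    a tensor holding one loss value per replica.
    """
    if isinstance(model, DataParallel):
        return sum(losses) / len(losses)
    # Unwrapped model: a single loss value, whatever the GPU count is.
    return losses[0]
```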

28 Jun, 2020 1 commit
- save_pretrained: mkdir(exist_ok=True) (#5258) · 45e26125
  Sam Shleifer authored Jun 28, 2020
```
* all save_pretrained methods mkdir if not os.path.exists
```
  45e26125
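
A minimal sketch of the directory handling this commit describes, using only the standard library. The helper name `ensure_save_directory` and the error raised for a file path are assumptions, not the library's code.

```python
import os
import tempfile


def ensure_save_directory(save_directory):
    """Create save_directory (and any parents) if missing; no-op if it exists."""
    if os.path.isfile(save_directory):
        raise ValueError(f"{save_directory} is a file, not a directory")
    # exist_ok=True replaces an explicit "if not os.path.exists(...)" check.
    os.makedirs(save_directory, exist_ok=True)
    return save_directory


# Calling it twice is safe: the second call finds the directory in place.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "checkpoints", "run-1")
    ensure_save_directory(target)
    ensure_save_directory(target)
```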
17 Jun, 2020 1 commit
- Remove misleading comment · efeb75b8
  Lysandre authored Jun 17, 2020
```
closes #4958
```
  efeb75b8
02 Jun, 2020 1 commit

Kill model archive maps (#4636) · d4c2cb40

Julien Chaumond authored Jun 02, 2020

* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI

d4c2cb40

07 May, 2020 1 commit

BIG Reorganize examples (#4213) · 0ae96ff8

Julien Chaumond authored May 07, 2020

* Created using Colaboratory

* [examples] reorganize files

* remove run_tpu_glue.py as superseded by TPU support in Trainer

* Bugfix: int, not tuple

* move files around

0ae96ff8

20 Apr, 2020 1 commit
- Add `qas_id` to SquadResult and SquadExample (#3745) · c79b550d
  Jared T Nielsen authored Apr 20, 2020
```
* Add qas_id

* Fix incorrect name in squad.py

* Make output files optional for squad eval
```
  c79b550d
24 Mar, 2020 1 commit
- [examples] Use AutoModels in more examples · a8e3336a
  Julien Chaumond authored Mar 23, 2020
  
  a8e3336a
02 Mar, 2020 1 commit
- fix n_gpu count when no_cuda flag is activated (#3077) · 6b1ff250
  Victor SANH authored Mar 02, 2020
```
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
```
  6b1ff250
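
The intent of this fix can be sketched as a small pure function. The name `effective_n_gpu` is hypothetical, and `visible_gpu_count` stands in for what a call such as `torch.cuda.device_count()` would report; the sketch assumes the GPU count should be zero whenever CUDA is disabled.

```python
def effective_n_gpu(no_cuda, visible_gpu_count):
    """Number of GPUs the script should use.

    With no_cuda set, the count is 0 even if GPUs are visible, so any
    value derived from it (for example a batch size) is not scaled up.
    """
    return 0 if no_cuda else visible_gpu_count
```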
21 Feb, 2020 1 commit
- Added CamembertForQuestionAnswering (#2746) · c749a543
  maximeilluin authored Feb 21, 2020
```
* Added CamembertForQuestionAnswering

* fixed camembert tokenizer case
```
  c749a543
04 Feb, 2020 1 commit

pass langs parameter to certain XLM models (#2734) · d1ab1fab

Yuval Pinter authored Feb 04, 2020

* pass langs parameter to certain XLM models

Adding an argument that specifies the language the SQuAD dataset is in so language-sensitive XLMs (e.g. `xlm-mlm-tlm-xnli15-1024`) don't default to language `0`.
Allows resolution of issue #1799.

* fixing from `make style`

* fixing style (again)

d1ab1fab

28 Jan, 2020 1 commit
- Default save steps 50 to 500 in all scripts · 335dd5e6
  Lysandre authored Jan 28, 2020
  
  335dd5e6
17 Jan, 2020 1 commit
- Fix typo in examples/run_squad.py · 6d5049a2
  jiyeon_baek authored Jan 17, 2020
```
Rul -> Run
```
  6d5049a2
16 Jan, 2020 1 commit
- Run SQuAD warning when the doc stride may be too high · 6e2c28a1
  Lysandre authored Jan 16, 2020
  
  6e2c28a1
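
A check of this kind might look like the sketch below. The condition is an assumption, not taken from the commit: it treats `max_seq_length - max_query_length` as the room left for document tokens and warns when the stride is at least that large. The function name and the warning text are illustrative.

```python
import logging

logger = logging.getLogger(__name__)


def doc_stride_may_be_too_high(doc_stride, max_seq_length, max_query_length):
    """Return True, and log a warning, when the stride looks too large.

    Assumed condition: doc_stride >= max_seq_length - max_query_length.
    """
    too_high = doc_stride >= max_seq_length - max_query_length
    if too_high:
        logger.warning(
            "doc_stride=%d may be too high for max_seq_length=%d and "
            "max_query_length=%d; reduce the stride or raise the maximum length.",
            doc_stride,
            max_seq_length,
            max_query_length,
        )
    return too_high
```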
08 Jan, 2020 2 commits
- DistilBERT token type ids removed from inputs in run_squad · 16ce15ed
  Lysandre authored Jan 08, 2020
  
  16ce15ed
- Fix error with global step in run_squad.py · f24232cd
  Lysandre Debut authored Jan 08, 2020
  
  f24232cd
06 Jan, 2020 2 commits
- GPU text generation: Moved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
  81d6841b
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
  dd4df80f
22 Dec, 2019 5 commits
- Update comments mentioning Python 2. · d6eaf4e6
  Aymeric Augustin authored Dec 22, 2019
  
  d6eaf4e6
- Remove __future__ imports. · c824d15a
  Aymeric Augustin authored Dec 22, 2019
  
  c824d15a
- Fix F401 flake8 warning (x88 / 116). · 783a6169
  Aymeric Augustin authored Dec 21, 2019
```
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
```
  783a6169
- Fix E722 flake8 warnings (x26). · 631be270
  Aymeric Augustin authored Dec 21, 2019
  
  631be270
- Sort imports with isort. · 158e82e0
  Aymeric Augustin authored Dec 21, 2019
```
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
```
  158e82e0
21 Dec, 2019 3 commits

Reformat source code with black. · fa84ae26

Aymeric Augustin authored Dec 21, 2019

This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.

fa84ae26

fix merge · b03872aa

thomwolf authored Dec 21, 2019

b03872aa

fix merge · 8a2be93b

thomwolf authored Dec 21, 2019

8a2be93b

19 Dec, 2019 1 commit
- Removed duplicate XLMConfig, XLMForQuestionAnswering and XLMTokenizer from... · 62c1fc3c
  Francesco authored Dec 19, 2019
```
Removed duplicate XLMConfig, XLMForQuestionAnswering and XLMTokenizer from import statement of run_squad.py script
```
  62c1fc3c
16 Dec, 2019 1 commit
- Fix run squad evaluate during training · d8034092
  Lysandre authored Dec 16, 2019
  
  d8034092
14 Dec, 2019 1 commit
- add multiple processing · 8e9526b4
  erenup authored Dec 14, 2019
  
  8e9526b4
13 Dec, 2019 2 commits
- [SQUAD] Load checkpoint when evaluating without training · c8ed1c82
  Lysandre authored Dec 13, 2019
  
  c8ed1c82
- initial version for roberta squad · 9b312f9d
  erenup authored Dec 13, 2019
  
  9b312f9d
12 Dec, 2019 1 commit
- Clean up squad and allow train_file and predict_file usage · 7296f101
  LysandreJik authored Dec 12, 2019
  
  7296f101
11 Dec, 2019 1 commit
- Update run_squad to save optimizer and scheduler states, then resume training from a checkpoint · fdc05cd6
  Bilal Khan authored Dec 09, 2019
  
  fdc05cd6
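
Resuming from a checkpoint involves working out where training stopped. The sketch below shows only that bookkeeping, under stated assumptions: the checkpoint directory name ends in `-<global_step>` (the example path is made up), and `resume_position` is a hypothetical helper. Saving and reloading the optimizer and scheduler states is framework-specific and is not shown.

```python
def resume_position(checkpoint_dir, batches_per_epoch, gradient_accumulation_steps=1):
    """Recover (global_step, epochs_trained, updates_done_in_epoch).

    Assumes the directory name ends in "-<global_step>",
    for example "out/checkpoint-1200".
    """
    name = checkpoint_dir.rstrip("/").split("/")[-1]
    global_step = int(name.split("-")[-1])
    # One optimizer update happens every gradient_accumulation_steps batches.
    updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
    epochs_trained = global_step // updates_per_epoch
    updates_done_in_epoch = global_step % updates_per_epoch
    return global_step, epochs_trained, updates_done_in_epoch
```

With these three values, a training loop can skip the epochs and updates already completed before continuing.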
10 Dec, 2019 1 commit
- Complete warning + cleanup · 6a733827
  LysandreJik authored Dec 10, 2019
  
  6a733827