- 17 Jun, 2020 1 commit
-
Sylvain Gugger authored
* Make default_data_collator more flexible
* Accept tensors for all features
* Document code
* Refactor
* Formatting
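A minimal sketch of the more flexible collator described above (the toy features and the submodule import are illustrative): it batches a list of feature dicts whether the values are Python lists or tensors, and collects the "label" field into `batch["labels"]`.

```python
from transformers.data.data_collator import default_data_collator

features = [
    {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1], "label": 0},
    {"input_ids": [101, 2003, 102], "attention_mask": [1, 1, 1], "label": 1},
]
batch = default_data_collator(features)
print(batch["input_ids"].shape)  # torch.Size([2, 3])
print(batch["labels"])           # tensor([0, 1])
```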
-
- 16 Jun, 2020 3 commits
-
Sam Shleifer authored
-
Amil Khare authored
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Funtowicz Morgan authored
* Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer.
* Added __getstate__() & __setstate__() so BatchEncoding is picklable.
* Correct tokens() return type from List[int] to List[str]
* Added unittest for BatchEncoding pickle/unpickle
* Added unittest for BatchEncoding is_fast
* More careful checking on BatchEncoding unpickle tests.
* Formatting.
* is_fast should assertTrue on Rust tokenizers.
* Ensure tensorflow has correct way of checking array_equal
* More formatting.
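A short usage sketch of the properties added above (checkpoint name purely illustrative): `is_fast` reports whether a Rust-backed tokenizer produced the encoding, `tokens()` returns strings, and the object round-trips through pickle.

```python
import pickle
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = tokenizer("Hello world")

print(encoding.is_fast)    # True for a Rust ("fast") tokenizer
print(encoding.tokens())   # e.g. ['[CLS]', 'hello', 'world', '[SEP]']

# BatchEncoding is now picklable thanks to __getstate__/__setstate__
restored = pickle.loads(pickle.dumps(encoding))
print(restored["input_ids"] == encoding["input_ids"])  # True
```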
-
- 15 Jun, 2020 5 commits
-
Sylvain Gugger authored
* Add `DistilBertForMultipleChoice`
-
Anthony MOI authored
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline
* failing pretokenized test
* Fix is_pretokenized in python
* add pretokenized tests
* style and quality
* better tests for batched pretokenized inputs
* tokenizers clean up - new padding_strategy - split the files
* [HUGE] refactoring tokenizers - padding - truncation - tests
* style and quality
* bump up required tokenizers version to 0.8.0-rc1
* switched padding/truncation API - simpler, better backward compat
* updating tests for custom tokenizers
* style and quality - tests on pad
* fix QA pipeline
* fix backward compatibility for max_length only
* style and quality
* Various cleanups - add verbose
* fix tests
* update docstrings
* Fix tests
* Docs reformatted
* __call__ method documented

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
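A minimal sketch of the refactored API described above (model name illustrative): padding and truncation are driven by simple keyword arguments on the tokenizer's `__call__`, and pre-tokenized input goes through the same entry point.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["A short sentence.", "A slightly longer second sentence."],
    padding=True,        # pad to the longest sequence in the batch
    truncation=True,     # truncate to the model's maximum length
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # (2, length of the longest sequence)

# the pre-tokenized pipeline uses the same call
pre = tokenizer([["Hello", "world"]], is_pretokenized=True)
```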
-
Patrick von Platen authored
* fix test
* Update tests/test_modeling_common.py
* Update tests/test_modeling_common.py
-
Sam Shleifer authored
-
Sylvain Gugger authored
* Make DataCollator a callable
* Update src/transformers/data/data_collator.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
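A sketch of the change above: a data collator is now any callable mapping a list of examples to a batch dict, so a plain function can be handed to the Trainer (the Trainer call is only shown as a comment, since it needs a model and dataset).

```python
from transformers.data.data_collator import default_data_collator

def my_data_collator(examples):
    batch = default_data_collator(examples)
    # room for task-specific post-processing of the batch here
    return batch

# Trainer(model=model, args=training_args,
#         train_dataset=train_dataset, data_collator=my_data_collator)
```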
-
- 12 Jun, 2020 4 commits
-
Suraj Patil authored
-
Sylvain Gugger authored
* Add AlbertForMultipleChoice
* Make up to date and add all models to common tests
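A hedged sketch of how a multiple-choice head such as the one added above is typically fed (checkpoint name illustrative): inputs get an extra num_choices dimension, so shapes are (batch_size, num_choices, seq_len), and the model scores one logit per choice.

```python
import torch
from transformers import AlbertTokenizer, AlbertForMultipleChoice

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForMultipleChoice.from_pretrained("albert-base-v2")

prompt = "The cat sat on"
choices = ["the mat.", "the moon."]
enc = tokenizer([prompt, prompt], choices, padding=True, return_tensors="pt")

# add the num_choices dimension: (1, 2, seq_len)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}
outputs = model(**inputs, labels=torch.tensor([0]))  # (loss, logits, ...) tuple
```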
-
Patrick von Platen authored
* first commit
* add new auto models
* better naming
* fix bert automodel
* fix automodel for pretraining
* add models to init
* fix name typo
* fix typo
* better naming
* future warning instead of deprecation warning
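The class names below are assumed from the commit description (task-specific auto models replacing the single LM-head auto model, with the old entry point only emitting a FutureWarning); treat this as an illustrative sketch rather than the exact diff.

```python
from transformers import AutoModelForMaskedLM, AutoModelForCausalLM, AutoModelForSeq2SeqLM

mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")  # masked-LM head
clm = AutoModelForCausalLM.from_pretrained("gpt2")               # causal-LM head
s2s = AutoModelForSeq2SeqLM.from_pretrained("t5-small")          # encoder-decoder LM head

# the old AutoModelWithLMHead still works but now emits a FutureWarning
```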
-
Sam Shleifer authored
-
- 11 Jun, 2020 2 commits
-
Sam Shleifer authored
-
Sylvain Gugger authored
* Support multiple choice in tf common model tests
* Add the input_embeds test
-
- 10 Jun, 2020 8 commits
-
RafaelWO authored
* Fixed resize_token_embeddings for transfo_xl model
* Fixed resize_token_embeddings for transfo_xl. Added custom methods to TransfoXLPreTrainedModel for resizing layers of the AdaptiveEmbedding.
* Updated docstring
* Fixed resizing cutoffs; added check for new size of embedding layer.
* Added test for resize_token_embeddings
* Fixed code quality
* Fixed unchanged cutoffs in model.config

Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
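A small sketch of the fixed behaviour (checkpoint name illustrative): after adding tokens, resizing the embeddings of a Transfo-XL model now goes through the custom AdaptiveEmbedding handling described above.

```python
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

tokenizer.add_tokens(["<new_token>"])
model.resize_token_embeddings(len(tokenizer))  # resizes the adaptive embedding layer
```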
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Split LMBert model in two
* Fix example
* Remove lm_labels
* Adapt tests, refactor prepare_for_generation
* Fix merge
* Hide BertLMHeadModel
-
Suraj Patil authored
* ElectraForQuestionAnswering
* update __init__
* add test for electra qa model
* add ElectraForQuestionAnswering in auto models
* add ElectraForQuestionAnswering in all_model_classes
* fix outputs, input_ids defaults to None
* add ElectraForQuestionAnswering in docs
* remove commented line
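A hedged sketch of the call pattern for the new QA head (checkpoint illustrative; a freshly initialized head will of course predict arbitrary spans): the model returns start and end logits over the question/context pair.

```python
from transformers import ElectraTokenizer, ElectraForQuestionAnswering

tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
model = ElectraForQuestionAnswering.from_pretrained("google/electra-small-discriminator")

inputs = tokenizer("Who wrote Hamlet?", "Hamlet was written by William Shakespeare.",
                   return_tensors="pt")
start_logits, end_logits = model(**inputs)[:2]

start, end = int(start_logits.argmax()), int(end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1].tolist())
```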
-
Amil Khare authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Fix CI
-
Sylvain Gugger authored
* Deal with multiple choice in common tests
-
- 09 Jun, 2020 2 commits
-
Bharat Raghunathan authored
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* Fix further regressions in tests relating to `output_attentions`: ensure proper propagation of `output_attentions` as a function parameter to all model subclasses
* Fix more regressions in `test_output_attentions`
* Fix issues with BertEncoder
* Rename related variables to `output_attentions`
* fix pytorch tests
* fix bert and gpt2 tf
* Fix most TF tests for `test_output_attentions`
* Fix linter errors and more TF tests
* fix conflicts
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* fix conflicts
* fix conflicts
* fix conflicts
* fix conflicts
* fix pytorch tests
* fix conflicts
* fix conflicts
* Fix linter errors and more TF tests
* fix tf tests
* make style
* fix isort
* improve output_attentions
* improve tensorflow

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
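A sketch of the new calling convention (checkpoint illustrative): attentions are requested per forward call instead of via config.output_attentions, and show up as the last element of the tuple outputs of this era.

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Attention, please.", return_tensors="pt")
outputs = model(**inputs, output_attentions=True)

attentions = outputs[-1]                      # one tensor per layer
print(len(attentions), attentions[0].shape)   # 12, torch.Size([1, 12, seq, seq])
```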
-
Patrick von Platen authored
* add tpu and torchscript for benchmark
* fix name in tests
* "fix email"
* make style
* better log message for tpu
* add more print and info for tpu
* allow possibility to print tpu metrics
* correct cpu usage
* fix test for non-install
* remove bogus file
* include psutil in testing
* run a couple of times before tracing in torchscript
* do not allow tpu memory tracing for now
* make style
* add torchscript to env
* better name for torch tpu

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
-
- 08 Jun, 2020 1 commit
-
Patrick von Platen authored
-
- 06 Jun, 2020 1 commit
-
Sam Shleifer authored
-
- 05 Jun, 2020 4 commits
-
Sam Shleifer authored
-
Patrick von Platen authored
* automatically set decoder config to decoder
* add more tests
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Fix argument label
* Fix test
-
- 04 Jun, 2020 2 commits
-
Julien Plu authored
* Better None gradients handling
* Apply Style
* Apply Style
* Create a loss class per task to compute its respective loss
* Add loss classes to the ALBERT TF models
* Add loss classes to the BERT TF models
* Add question answering and multiple choice to TF Camembert
* Remove prints
* Add multiple choice model to TF DistilBERT + loss computation
* Add question answering model to TF Electra + loss computation
* Add token classification, question answering and multiple choice models to TF Flaubert
* Add multiple choice model to TF Roberta + loss computation
* Add multiple choice model to TF XLM + loss computation
* Add multiple choice and question answering models to TF XLM-Roberta
* Add multiple choice model to TF XLNet + loss computation
* Remove unused parameters
* Add task loss classes
* Reorder TF imports + add new model classes
* Add new model classes
* Bugfix in TF T5 model
* Bugfix for TF T5 tests
* Bugfix in TF T5 model
* Fix TF T5 model tests
* Fix T5 tests + some renaming
* Fix inheritance issue in the AutoX tests
* Add tests for TF Flaubert and TF XLM Roberta
* Add tests for TF Flaubert and TF XLM Roberta
* Remove unused piece of code in the TF trainer
* bugfix and remove unused code
* Bugfix for TF 2.2
* Apply Style
* Divide TFSequenceClassificationAndMultipleChoiceLoss into its two respective names
* Apply style
* Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameters and better dataset handling
* Fix TF optimizations tests and apply style
* Remove useless parameter
* Bugfix and apply style
* Fix TF Trainer prediction
* Now the TF models return the loss such as their PyTorch counterparts
* Apply Style
* Ignore some tests output
* Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.
* Fix names for SQuAD data
* Apply Style
* Fix conflicts with 2.11 release
* Fix conflicts with 2.11
* Fix wrong name
* Add better documentation on the new create_optimizer function
* Fix isort
* logging_dir: use same default as PyTorch

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Funtowicz Morgan authored
* Refactor tensor creation in tokenizers.
* Make sure to convert string to TensorType
* Refactor convert_to_tensors_
* Introduce numpy tensor creation
* Format
* Add unittest for TensorType creation from str
* sorting imports
* Added unittests for numpy tensor conversion.
* Do not use in-place version for squeeze as numpy doesn't provide such feature.
* Added extra parameter prepend_batch_axis: bool on prepare_for_model.
* Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.
* style.
* numpy tests require_torch for now while flax not merged.
* Hopefully will make flake8 happy.
* One more time 🎵
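A sketch of the refactored tensor creation (checkpoint illustrative): return_tensors now also understands numpy, alongside the existing PyTorch and TensorFlow options.

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

np_batch = tokenizer("Convert me to numpy.", return_tensors="np")
print(type(np_batch["input_ids"]))  # <class 'numpy.ndarray'>

pt_batch = tokenizer("Convert me to torch.", return_tensors="pt")  # needs torch installed
```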
-
- 03 Jun, 2020 1 commit
-
Sylvain Gugger authored
* Deprecate masked_lm_labels argument
* Apply to all models
* Better error message
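A sketch of the deprecation above (checkpoint illustrative): `labels` replaces `masked_lm_labels`; the old keyword still works but warns. Positions you don't want scored are usually set to -100 in a real run.

```python
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("Paris is the capital of [MASK].", return_tensors="pt")
labels = inputs["input_ids"].clone()  # toy labels; mask unwanted positions with -100 in practice

loss, prediction_scores = model(**inputs, labels=labels)[:2]  # new argument
# model(**inputs, masked_lm_labels=labels)                    # deprecated spelling
```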
-
- 02 Jun, 2020 4 commits
-
Patrick von Platen authored
* improve handling of short inputs for reformer
* correct typo in assert statement
* fix other tests
-
Sam Shleifer authored
-
Julien Chaumond authored
* 🐛 Fix model ids for BART and Flaubert
-
Julien Chaumond authored
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
-
- 01 Jun, 2020 1 commit
-
Rens authored
* pass on tokenizer to pipeline
* order input names when convert to onnx
* update style
* remove unused imports
* make ordered inputs list needs to be mutable
* add test custom bert model
* remove unused imports
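A heavily hedged sketch of the export entry point touched above; the argument names follow the script's documented usage at the time and should be treated as illustrative rather than exact.

```python
from transformers.convert_graph_to_onnx import convert

# Export a PyTorch model to ONNX; the tokenizer is forwarded to the underlying
# pipeline and the ONNX graph keeps the model's ordered input names.
convert(
    framework="pt",
    model="bert-base-cased",
    output="onnx/bert-base-cased.onnx",
    opset=11,
)
```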
-
- 29 May, 2020 1 commit
-
Patrick von Platen authored
* fix bug
* add more tests
-