Commits · 79ab881eb18c5d654b190a6d748e3fd2520266b2 · chenpangpang / transformers

05 Jun, 2020 3 commits
- No silent error when d_head already in the configuration (#4747) · 79ab881e
  Lysandre Debut authored Jun 05, 2020
```
* No silent error when d_head already in the configuration

* Update src/transformers/configuration_xlnet.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  79ab881e
- [doc] Make it clearer that `text-generation` does not involve training · b9109f2d
  Julien Chaumond authored Jun 05, 2020
  
  b9109f2d
- Add .vs to gitignore (#4774) · ceaab8dd
  Sylvain Gugger authored Jun 05, 2020
  
  ceaab8dd
04 Jun, 2020 14 commits

Tensorflow improvements (#4530) · f9414f75

Julien Plu authored Jun 05, 2020



* Better None gradients handling

* Apply Style

* Apply Style

* Create a loss class per task to compute its respective loss

* Add loss classes to the ALBERT TF models

* Add loss classes to the BERT TF models

* Add question answering and multiple choice to TF Camembert

* Remove prints

* Add multiple choice model to TF DistilBERT + loss computation

* Add question answering model to TF Electra + loss computation

* Add token classification, question answering and multiple choice models to TF Flaubert

* Add multiple choice model to TF Roberta + loss computation

* Add multiple choice model to TF XLM + loss computation

* Add multiple choice and question answering models to TF XLM-Roberta

* Add multiple choice model to TF XLNet + loss computation

* Remove unused parameters

* Add task loss classes

* Reorder TF imports + add new model classes

* Add new model classes

* Bugfix in TF T5 model

* Bugfix for TF T5 tests

* Bugfix in TF T5 model

* Fix TF T5 model tests

* Fix T5 tests + some renaming

* Fix inheritance issue in the AutoX tests

* Add tests for TF Flaubert and TF XLM Roberta

* Add tests for TF Flaubert and TF XLM Roberta

* Remove unused piece of code in the TF trainer

* bugfix and remove unused code

* Bugfix for TF 2.2

* Apply Style

* Divide TFSequenceClassificationAndMultipleChoiceLoss into its two respective names

* Apply style

* Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameter and better dataset handling

* Fix TF optimizations tests and apply style

* Remove useless parameter

* Bugfix and apply style

* Fix TF Trainer prediction

* Now the TF models return the loss, like their PyTorch counterparts

* Apply Style

* Ignore some tests output

* Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.

* Fix names for SQuAD data

* Apply Style

* Fix conflicts with 2.11 release

* Fix conflicts with 2.11

* Fix wrong name

* Add better documentation on the new create_optimizer function

* Fix isort

* logging_dir: use same default as PyTorch
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

f9414f75

Create model card for tblard/allocine (#4775) · ccd26c28
Théophile Blard authored Jun 05, 2020
```
https://huggingface.co/tblard/tf-allocine
```
ccd26c28

NER: Add new WNUT’17 example (#4681) · 2a4b9e09

Stefan Schweter authored Jun 05, 2020

* ner: add preprocessing script for examples that splits longer sentences

* ner: example shell scripts use local preprocessing now

* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results

* ner: satisfy black and isort

2a4b9e09

Add drop_last arg for data loader · 0e1869cc
Setu Shah authored Jun 03, 2020

0e1869cc
Removed deprecated use of Variable API from pplm example · 48a05026
prajjwal1 authored May 28, 2020

48a05026
Don't access pad_token_id if there is no pad_token (#4773) · 12d0eb5f
Sylvain Gugger authored Jun 04, 2020

12d0eb5f
Create model card for T5-base fine-tuned for Sentiment Span Extraction (#4737) · 17a88d31
Manuel Romero authored Jun 04, 2020

17a88d31
Create README.md (#4743) · fb52143c
Oren Amsalem authored Jun 04, 2020

fb52143c

Model Card for RoBERTa trained on Sanskrit (#4763) · 5f077a34

Suraj Parmar authored Jun 04, 2020

* Model card for SanBERTa

Model Card for RoBERTa trained on Sanskrit

* Model card for SanBERTa

model card for RoBERTa trained on Sanskrit

5f077a34

Add note about doc generation (#4770) · cd4e07a8
Sylvain Gugger authored Jun 04, 2020

cd4e07a8
Remove unnecessary model_type arg in example (#4771) · 492b352a
Jason Phang authored Jun 04, 2020

492b352a
Codecov setup (#4768) · e645b9ab
Lysandre Debut authored Jun 04, 2020
```
* Codecov setup

* Understanding codecov
```
e645b9ab
[cleanup] PretrainedModel.generate: remove unused kwargs (#4761) · 2b8b6c92
Sam Shleifer authored Jun 04, 2020

2b8b6c92

Introduce a new tensor type for return_tensors on tokenizer for NumPy (#4585) · 5bf9afbf

Funtowicz Morgan authored Jun 04, 2020

* Refactor tensor creation in tokenizers.

* Make sure to convert string to TensorType

* Refactor convert_to_tensors_

* Introduce numpy tensor creation

* Format

* Add unittest for TensorType creation from str

* sorting imports

* Added unittests for numpy tensor conversion.

* Do not use in-place version for squeeze as numpy doesn't provide such feature.

* Added extra parameter prepend_batch_axis: bool on prepare_for_model.

* Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.

* style.

* numpy tests require_torch for now while flax not merged.

* Hopefully will make flake8 happy.

* One more time 🎶

5bf9afbf
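
The "Make sure to convert string to TensorType" step above can be illustrated with a minimal sketch. The enum below is a stand-in written for this example, not the library's own definition; the member names, the string values, and the helper `to_tensor_type` are assumptions.

```python
from enum import Enum
from typing import Optional, Union


class TensorType(Enum):
    """Illustrative stand-in for a tensor-type enum (values are assumptions)."""

    PYTORCH = "pt"
    TENSORFLOW = "tf"
    NUMPY = "np"


def to_tensor_type(value: Union[str, TensorType, None]) -> Optional[TensorType]:
    """Normalize a user-supplied return_tensors value (hypothetical helper).

    None and existing enum members pass through unchanged; strings are
    looked up by value, so an unknown string raises ValueError.
    """
    if value is None or isinstance(value, TensorType):
        return value
    return TensorType(value)
```

Normalizing once at the entry point lets the rest of the conversion code compare enum members instead of raw strings.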

03 Jun, 2020 7 commits

never_split on slow tokenizers should not split (#4723) · efae1549

Funtowicz Morgan authored Jun 03, 2020

* Ensure tokens in never_split are not split when using basic tokenizer before wordpiece.

* never_split is only used for membership checks, so use a set(), which is 10x faster for this operation.

* Use union to concatenate two sets.

* Updated docstring for never_split parameter.

* Avoid set.union() if never_split is None

* Added comments.

* Correct docstring format.

efae1549
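
The never_split changes above (set membership, set union, and skipping the union when no extra tokens are passed) can be sketched roughly as follows. This is a simplified illustration, not the tokenizer's actual code: the class `BasicSplitter` and the helper `split_on_punctuation` are hypothetical names, and whitespace splitting stands in for the real pre-tokenization.

```python
import string
from typing import Iterable, List, Optional, Set


def split_on_punctuation(word: str) -> List[str]:
    """Split a word so that every ASCII punctuation character is its own piece."""
    pieces: List[str] = []
    current = ""
    for ch in word:
        if ch in string.punctuation:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(ch)
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


class BasicSplitter:
    """Hypothetical, simplified basic tokenizer that honours never_split."""

    def __init__(self, never_split: Optional[Iterable[str]] = None):
        # Stored as a set because it is only used for membership checks.
        self.never_split: Set[str] = set(never_split or [])

    def tokenize(self, text: str, never_split: Optional[Iterable[str]] = None) -> List[str]:
        # Build the union only when extra tokens were actually passed.
        protected = self.never_split.union(never_split) if never_split else self.never_split
        tokens: List[str] = []
        for word in text.split():
            if word in protected:
                tokens.append(word)  # keep protected tokens intact
            else:
                tokens.extend(split_on_punctuation(word))
        return tokens
```

For example, `BasicSplitter(["[UNK]"]).tokenize("hello [UNK] world!")` keeps `[UNK]` as one token while still separating the trailing `!`.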

Update encode documentation (#4751) · 2e4de762
Lysandre Debut authored Jun 03, 2020

2e4de762
fix beam search bug in tf as well (#4745) · ed4df855
Patrick von Platen authored Jun 03, 2020

ed4df855

Unify label args (#4722) · 1b5820a5

Sylvain Gugger authored Jun 03, 2020

* Deprecate masked_lm_labels argument

* Apply to all models

* Better error message

1b5820a5
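
A common way to deprecate an argument such as masked_lm_labels while unifying label arguments is to accept the old name through `**kwargs`, warn, and forward the value to the new name. The sketch below shows that general pattern only; the function `forward`, its return value, and the error wording are assumptions for illustration and are not taken from the commit.

```python
import warnings


def forward(input_ids, labels=None, **kwargs):
    """Hypothetical forward signature that accepts a deprecated label argument."""
    if "masked_lm_labels" in kwargs:
        warnings.warn(
            "`masked_lm_labels` is deprecated; use `labels` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        old_labels = kwargs.pop("masked_lm_labels")
        if labels is not None:
            raise ValueError("Pass either `labels` or `masked_lm_labels`, not both.")
        labels = old_labels
    if kwargs:
        # Name the offending arguments so a typo is easy to spot.
        raise TypeError(f"Unexpected keyword arguments: {sorted(kwargs)}")
    return {"input_ids": input_ids, "labels": labels}
```

Callers using the old name keep working but see a `DeprecationWarning`, and a misspelled argument fails loudly instead of being silently ignored.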

Adding notebooks for Fine Tuning [Community Notebook] (#4732) · 3e5928c5

Abhishek Kumar Mishra authored Jun 03, 2020

* Added links to more community notebooks

Added links to 3 more community notebooks from the git repo: https://github.com/abhimishra91/transformers-tutorials


Different Transformers models are fine-tuned on a dataset using PyTorch

* Update README.md

* Update README.md

* Update README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

3e5928c5

Pipelines: miscellanea of QoL improvements and small features... (#4632) · 99207bd1

Julien Chaumond authored Jun 03, 2020

* [hf_api] Attach all unknown attributes for future-proof compatibility

* [Pipeline] NerPipeline is really a TokenClassificationPipeline

* modelcard.py: I don't think we need to force the download

* Remove config, tokenizer from SUPPORTED_TASKS as we're moving to one model = one weight + one tokenizer

* FillMaskPipeline: also output token in string form

* TextClassificationPipeline: option to return all scores, not just the argmax

* Update docs/source/main_classes/pipelines.rst

99207bd1
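
The "return all scores, not just the argmax" option listed above can be pictured with a small pure-Python sketch. The function `classify`, the flag name `return_all_scores`, and the output dictionary shape are assumptions made for this illustration, not the pipeline's confirmed interface.

```python
import math
from typing import Dict, List, Sequence, Union

Scored = Dict[str, Union[str, float]]


def classify(
    logits: Sequence[float],
    id2label: Dict[int, str],
    return_all_scores: bool = False,
) -> Union[Scored, List[Scored]]:
    """Turn raw logits into labelled softmax scores (illustrative only)."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    labelled: List[Scored] = [
        {"label": id2label[i], "score": e / total} for i, e in enumerate(exps)
    ]
    if return_all_scores:
        return labelled
    # Default behaviour: only the highest-scoring label.
    return max(labelled, key=lambda item: item["score"])
```

With the flag off the caller gets a single dictionary; with it on, one dictionary per label, whose scores sum to 1.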

bert-small-cord19 model cards (#4730) · 8ed47aa1
David Mezzetti authored Jun 03, 2020
```
* Create README.md

* Create README.md

* Create README.md
```
8ed47aa1

02 Jun, 2020 11 commits
- [Reformer] Improved memory if input is shorter than chunk length (#4720) · 9ca48573
  Patrick von Platen authored Jun 02, 2020
```
* improve handling of short inputs for reformer

* correct typo in assert statement

* fix other tests
```
  9ca48573
- Add cache_dir to save features in GLUE + Differentiate match/mismatch for MNLI metrics (#4621) · b231a413
  Jin Young Sohn authored Jun 02, 2020
```
* Glue task cleanup

* Enable writing cache to cache_dir in case dataset lives in a read-only filesystem.

* Differentiate match vs mismatch for MNLI metrics.

* Style

* Fix pytype

* Fix type

* Use cache_dir in mnli mismatch eval dataset

* Small Tweaks
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  b231a413
- TFRobertaModelIntegrationTest requires tf (#4726) · 70f74234
  Sam Shleifer authored Jun 02, 2020
  
  70f74234
- Repin versions · d976ef26
  Lysandre authored Jun 02, 2020
  
  d976ef26
- Fix CI after killing archive maps (#4724) · b42586ea
  Julien Chaumond authored Jun 02, 2020
```
* 🐛 Fix model ids for BART and Flaubert
```
  b42586ea
- Release: v2.11.0 · b43c78e5
  Lysandre authored Jun 02, 2020
  
  b43c78e5
- Kill model archive maps (#4636) · d4c2cb40
  Julien Chaumond authored Jun 02, 2020
```
* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI
```
  d4c2cb40
- [pipeline] Tokenizer should not add special tokens for text generation (#4686) · 47a551d1
  Patrick von Platen authored Jun 02, 2020
```
* allow to not add special tokens

* remove print
```
  47a551d1
- Override get_vocab for fast tokenizer. (#4717) · f6d5046a
  Funtowicz Morgan authored Jun 02, 2020
  
  f6d5046a
- Specify PyTorch versions for examples (#4710) · 88762a2f
  Lysandre Debut authored Jun 02, 2020
  
  88762a2f
- Add community notebook for sentiment span extraction (#4700) · d3ef14f9
  Lorenzo Ampil authored Jun 02, 2020
  
  d3ef14f9
01 Jun, 2020 5 commits
- Make docstring match args (#4711) · 76779363
  Sylvain Gugger authored Jun 01, 2020
  
  76779363
- close #4685 · 6449c494
  Lysandre authored Jun 01, 2020
  
  6449c494
- [config] Ensure that id2label always takes precedence over num_labels · ec8717d5
  Julien Chaumond authored Jun 01, 2020
  
  ec8717d5
- [config] Ensure that id2label always takes precedence over num_labels · 751a1e08
  Julien Chaumond authored Jun 01, 2020
```
Fixes bug reported in https://github.com/huggingface/transformers/issues/4669

See #3967 for context
```
  751a1e08
- Fix onnx export input names order (#4641) · ec62b7d9
  Rens authored Jun 01, 2020
```
* pass on tokenizer to pipeline

* order input names when convert to onnx

* update style

* remove unused imports

* ordered inputs list needs to be mutable

* add test custom bert model

* remove unused imports
```
  ec62b7d9
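
The input-name ordering mentioned in the last commit above matters because an exporter that passes inputs positionally needs the names in the same order as the model's forward signature. A minimal sketch of one way to do that with the standard library follows; `order_input_names` and the toy `forward` function are hypothetical and are not the code from the commit.

```python
import inspect
from typing import Callable, Iterable, List


def order_input_names(forward: Callable, provided: Iterable[str]) -> List[str]:
    """Return the provided input names in the order of forward's parameters.

    Names that forward does not declare are dropped. The result is a list
    (mutable), so callers can still append or remove entries afterwards.
    """
    available = set(provided)
    signature_order = inspect.signature(forward).parameters
    return [name for name in signature_order if name in available]


# Toy stand-in for a model's forward method (assumption for illustration).
def forward(input_ids, attention_mask=None, token_type_ids=None):
    return input_ids


ordered = order_input_names(forward, ["token_type_ids", "input_ids", "attention_mask"])
```

Here `ordered` follows the signature (`input_ids`, `attention_mask`, `token_type_ids`) regardless of the order in which the tokenizer produced the names.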