- 22 Apr, 2020 2 commits
-
-
Julien Chaumond authored
-
Julien Chaumond authored
* doc
* [tests] Add sample files for a regression task
* [HUGE] Trainer
* Feedback from @sshleifer
* Feedback from @thomwolf + logging tweak
* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
* [glue] Use default max_seq_length of 128 like before
* [glue] move DataTrainingArguments around
* [ner] Change interface of InputExample, and align run_{tf,pl}
* Re-align the pl scripts a little bit
* ner
* [ner] Add integration test
* Fix language_modeling with API tweak
* [ci] Tweak loss target
* Don't break console output
* amp.initialize: model must be on the right device before
* [multiple-choice] update for Trainer
* Re-align to 827d6d6e
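The get_from_cache change above makes concurrent downloads reuse a single cached file. A minimal sketch of that idea (a hypothetical helper, not the actual transformers implementation): the first process downloads to a temporary file and atomically publishes it, so later processes find the finished file and skip the download.

```python
import os
import tempfile

def get_from_cache(url, cache_dir, fetch):
    """Sketch of concurrency-safe caching: download once, reuse afterwards.

    `fetch` is a hypothetical callable returning the file's bytes.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, url.replace("/", "_"))
    if os.path.exists(cache_path):
        # Another process already finished the download; reuse its file.
        return cache_path
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(fetch(url))       # download into a private temp file
    os.replace(tmp_path, cache_path)  # atomic rename publishes the cache entry
    return cache_path
```

The atomic `os.replace` is the key design choice: readers never observe a half-written cache file.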
-
- 21 Apr, 2020 4 commits
-
-
Julien Chaumond authored
-
Spencer Adams authored
* create readme for spentaur/yelp model
* update spentaur/yelp/README.md
* remove typo
-
Julien Chaumond authored
-
Bharat Raghunathan authored
-
- 20 Apr, 2020 10 commits
-
-
Andrey Kulagin authored
-
husein zolkepli authored
(cherry picked from commit b5f2dc5d627d44b8cbb0ccf8ad2b46bea211a236)
-
Punyajoy Saha authored
The first model added to the repo
-
Manuel Romero authored
-
Funtowicz Morgan authored
Introduce a tqdm_enabled parameter on squad_convert_examples_to_features(), defaulting to True; it is set to False in QA pipelines.
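The pattern behind this change can be sketched in a few lines (an illustrative stand-in, not the real squad_convert_examples_to_features): wrap the conversion loop in a progress bar for interactive use, but let callers such as pipelines silence it.

```python
def convert_examples(examples, tqdm_enabled=True):
    """Illustrative sketch of a tqdm_enabled flag on a conversion loop."""
    try:
        from tqdm import tqdm
    except ImportError:
        tqdm_enabled = False  # degrade gracefully if tqdm is unavailable
    iterator = tqdm(examples, desc="convert") if tqdm_enabled else examples
    return [ex.lower() for ex in iterator]  # placeholder per-example work
```

Pipelines would call `convert_examples(batch, tqdm_enabled=False)` to keep console output clean.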
-
Jared T Nielsen authored
* Add qas_id
* Fix incorrect name in squad.py
* Make output files optional for squad eval
-
Patrick von Platen authored
* remove max_length = tokenizer.max_length when encoding
* make style
-
Mohamed El-Geish authored
* exbert links for my albert model cards
* Added exbert tag to the metadata block
* Adding "how to cite"
-
Sam Shleifer authored
-
ahotrod authored
-
- 18 Apr, 2020 6 commits
-
-
Patrick von Platen authored
-
Thomas Wolf authored
* First pass on utility classes and python tokenizers
* finishing cleanup pass
* style and quality
* Fix tests
* Updating following @mfuntowicz comment
* style and quality
* Fix Roberta
* fix batch_size/seq_length in BatchEncoding
* add alignment methods + tests
* Fix OpenAI and Transfo-XL tokenizers
* adding trim_offsets=True default for GPT2 and RoBERTa
* style and quality
* fix tests
* add_prefix_space in roberta
* bump up tokenizers to rc7
* style
* unfortunately tensorflow does not like these - removing shape/seq_len for now
* Update src/transformers/tokenization_utils.py
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Adding doc and docstrings
* making flake8 happy
Co-authored-by: Stefan Schweter <stefan@schweter.it>
-
-
Julien Chaumond authored
-
Benjamin Muller authored
-
Patrick von Platen authored
* better config serialization
* finish configuration utils
-
- 17 Apr, 2020 8 commits
-
-
Lysandre Debut authored
* XLM tokenizer should encode with bos token
* Update tests
-
Patrick von Platen authored
-
Patrick von Platen authored
-
Harutaka Kawamura authored
-
Santiago Castro authored
* Add support for the null answer in `QuestionAnsweringPipeline`
* black
* Fix min null score computation
* Fix a PR comment
-
Simon Böhm authored
token_type_ids are converted into the segment embedding. For question answering, this needs to indicate whether a token belongs to sequence 0 or 1. encode_plus takes care of setting this parameter correctly and automatically.
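What encode_plus computes for a question/context pair can be sketched as follows (a hypothetical helper for illustration, following the BERT-style `[CLS] question [SEP] context [SEP]` layout; not library code):

```python
def build_token_type_ids(question_tokens, context_tokens):
    """Sketch: 0 for the question segment (incl. [CLS] and its [SEP]),
    1 for the context segment (incl. its closing [SEP])."""
    first = len(question_tokens) + 2   # [CLS] + question + [SEP]
    second = len(context_tokens) + 1   # context + [SEP]
    return [0] * first + [1] * second
```

Passing question and context separately to encode_plus is what lets the library produce this mask for you, which is the bug the commit fixes.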
-
Pierric Cistac authored
* Add TFAlbertForQuestionAnswering
* Add TFRobertaForQuestionAnswering
* Update TFAutoModel with Roberta/Albert for QA
* Clean `super` TF Albert calls
-
Patrick von Platen authored
-
- 16 Apr, 2020 10 commits
-
-
Sam Shleifer authored
renames `run_bart_sum.py` to `finetune.py`
-
Jonathan Sum authored
Changing from "fine-grained token-leven" to "fine-grained token-level"
-
Aryansh Omray authored
-
Sam Shleifer authored
-
Patrick von Platen authored
* Refactored use of newstest2013 to newstest2014. Fixed bug where argparse consumed the first command-line argument as the model_size argument rather than using the default model_size, by forcing explicit --model_size flag inclusion
* More pythonic file handling through 'with' context
* COSMETIC - ran Black and isort
* Fixed reference to number of lines in newstest2014
* Fixed failing test. More pythonic file handling
* finish PR from tholiao
* remove commented-out lines
* make style
* make isort happy
Co-authored-by: Thomas Liao <tholiao@gmail.com>
-
Lysandre Debut authored
-
Davide Fiocco authored
-
Patrick von Platen authored
-
Patrick von Platen authored
* correct gpt2 test inputs
* make style
* delete modeling_gpt2 change in test file
* translate from pytorch
* correct tests
* fix conflicts
* fix conflicts
* fix conflicts
* fix conflicts
* make tensorflow t5 caching work
* make style
* clean reorder cache
* remove unnecessary spaces
* fix test
-
Patrick von Platen authored
-