- 26 Jun, 2020 8 commits
-
-
Kevin Canwen Xu authored
-
Sam Shleifer authored
-
Patrick von Platen authored
* add notebook * Created with Colaboratory * move notebook to correct folder * correct link * correct filename * correct filename * better name
-
Patrick von Platen authored
* fix docs * Update src/transformers/modeling_utils.py * Update src/transformers/modeling_tf_utils.py * Update src/transformers/modeling_tf_utils.py * Update src/transformers/modeling_utils.py * Update src/transformers/modeling_tf_utils.py * Update src/transformers/modeling_utils.py
-
Patrick von Platen authored
* improve plotting * better labels * fix time plot
-
Sylvain Gugger authored
* Bert base model card * Add metadata * Adapt examples * GPT2 model card * Remove the BERT model card * Change language code
-
Sylvain Gugger authored
* Bert base model card * Add metadata * Adapt examples * Comment on text generation * Update model_cards/bert-base-uncased-README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Funtowicz Morgan authored
* Add new parameter `pad_to_multiple_of` on tokenizers. * unittest for pad_to_multiple_of * Add .name when logging enum. * Fix missing .items() on dict in tests. * Add special check + warning if the tokenizer doesn't have a proper pad_token. * Use the correct logger format specifier. * Ensure tokenizers with no pad_token do not modify the underlying padding strategy. * Skip test if tokenizer doesn't have pad_token * Fix RobertaTokenizer on empty input * Format. Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com> * fix and update to simpler API Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
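The `pad_to_multiple_of` behavior introduced above can be sketched in plain Python. This is a hypothetical helper illustrating the rounding logic, not the library's actual implementation; rounding sequence lengths up to a multiple (e.g. 8) helps tensor-core kernels on modern GPUs:

```python
def pad_to_multiple(ids, pad_token_id, multiple):
    """Pad a list of token ids so its length is a multiple of `multiple`.

    If `multiple` is None or the length already divides evenly,
    the ids are returned unchanged (as a copy).
    """
    if multiple is None or len(ids) % multiple == 0:
        return list(ids)
    # Round the target length up to the next multiple.
    target = ((len(ids) + multiple - 1) // multiple) * multiple
    return list(ids) + [pad_token_id] * (target - len(ids))
```

For example, a 5-token input padded to a multiple of 8 comes back with 8 ids, the last 3 being the pad token.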
-
- 25 Jun, 2020 11 commits
-
-
Lysandre Debut authored
-
Joe Davison authored
* add initial fine-tuning guide * split code blocks to smaller segments * fix up trainer section of fine-tune doc * a few last typos * Update usage -> task summary link Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Lysandre Debut authored
* Refactor code samples * Test docstrings * Style * Tokenization examples * Run rest of tests * First step to testing source docs * Style and BART comment * Test the remainder of the code samples * Style * let to const * Formatting fixes * Ready for merge * Fix fixture + Style * Fix last tests * Update docs/source/quicktour.rst * Addressing @sgugger's comments + Fix MobileBERT in TF Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Thomas Wolf authored
* avoid recursion in id checks for fast tokenizers * better typings and fix #5232 * align slow and fast tokenizers behaviors for Roberta and GPT2 * style and quality * fix tests - improve typings
-
Sylvain Gugger authored
-
Thomas Wolf authored
[Tokenization] Fix #5181 - make #5155 more explicit - move back the default logging level in tests to WARNING (#5252) * fix #5181: padding to max sequence length while truncating to another length was wrong on slow tokenizers * clean up and fix #5155 * fix XLM test * Fix tests for Transfo-XL * logging only above WARNING in tests * switch slow tokenizers tests to @slow * fix Marian truncation tokenization test * style and quality * make the test a lot faster by limiting the sequence length used in tests
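The bug behind #5181 was mixing truncation to one length with padding to a different one. A minimal sketch of the corrected ordering, using hypothetical helper and parameter names rather than the library's real API:

```python
def truncate_then_pad(ids, pad_token_id, max_length=None, pad_length=None):
    """Apply truncation first, then padding: the two lengths are independent.

    The slow tokenizers previously conflated these two lengths, so
    padding to a max sequence length while truncating to another
    length produced the wrong result.
    """
    out = list(ids)
    if max_length is not None:
        out = out[:max_length]          # truncate to max_length
    if pad_length is not None and len(out) < pad_length:
        out += [pad_token_id] * (pad_length - len(out))  # then pad up
    return out
```

With `max_length=3` and `pad_length=4`, a 5-token input is first cut to 3 tokens and then padded to 4, regardless of its original length.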
-
Sam Shleifer authored
-
Julien Chaumond authored
-
Anthony MOI authored
-
Moumeneb1 authored
-
Sam Shleifer authored
-
- 24 Jun, 2020 21 commits
-
-
Sylvain Gugger authored
* All done * Link to the tutorial * Typo fixes * Add mention of the return_xxx args Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
-
Thomas Wolf authored
* update tests for fast tokenizers + fix small bug in saving/loading * better tests on serialization * fixing serialization * comment cleanup
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* Use _static from master everywhere * Copy to existing too
-
Victor SANH authored
* fix weirdness in roberta/bart for mnli trained checkpoints * black compliance * isort code check
-
Julien Chaumond authored
-
Setu Shah authored
-
Sylvain Gugger authored
-
Sai Saketh Aluru authored
* Add dehatebert-mono-arabic readme card * Update dehatebert-mono-arabic model card * model cards for Hate-speech-CNERG models
-
Lysandre Debut authored
* Cleaning TensorFlow models * Update all classes style * Don't average loss
-
Sylvain Gugger authored
-
Ali Modarressi authored
-
Sylvain Gugger authored
* Try with the same command * Try like this
-
Sylvain Gugger authored
-
Patrick von Platen authored
* fix use cache * add bart use cache * fix bart * finish bart
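The `use_cache` fix above concerns reusing past key/value states during autoregressive generation. A toy sketch of the decoding loop that threads such a cache through steps (hypothetical `step_fn` interface, not Bart's actual signature):

```python
def generate_with_cache(step_fn, start_token, num_steps):
    """Greedy decoding loop that carries a key/value cache between steps.

    `step_fn(token, cache)` returns `(next_token, new_cache)`. With
    caching enabled, a model only processes the newest token each step
    instead of re-encoding the whole prefix, which is what makes
    generation fast.
    """
    token, cache = start_token, None
    out = [token]
    for _ in range(num_steps):
        token, cache = step_fn(token, cache)
        out.append(token)
    return out
```

Any callable matching the `(token, cache) -> (next_token, cache)` shape can be plugged in; the cache object itself is opaque to the loop.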
-
Sylvain Gugger authored
-
Patrick von Platen authored
-
Patrick von Platen authored
* add benchmark for all kinds of models * improved import * delete bogus files * make style
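The benchmark work above times models end to end. A minimal sketch of the kind of timing loop such a benchmark runs, assuming a hypothetical `benchmark` helper rather than the library's own benchmarking classes:

```python
import time

def benchmark(fn, *args, repeats=3):
    """Time a callable over several repeats and return the best (lowest) run.

    Taking the minimum rather than the mean reduces noise from
    warm-up effects and background load.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

A real model benchmark would additionally vary batch size and sequence length, which is what "all kinds of models" refers to in the commit.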
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
flozi00 authored
* Create README.md * Update model_cards/a-ware/roberta-large-squad-classification/README.md Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-