- 31 Aug, 2020 10 commits
-
-
Sylvain Gugger authored
* Split the run_hp_search by backend
* Unused import
-
krfricke authored
* Introduce HPO checkpointing for PBT
* Moved checkpoint saving
* Fixed checkpoint subdir pass
* Fixed style
* Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT
* Adjust number of GPUs to number of jobs
* Avoid model pickling in ray
* Move hp search to integrations
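For context, a minimal sketch of how this search might be driven from the `Trainer` with the Ray backend; the model, datasets, and trial count below are illustrative assumptions, not taken from these commits.

```python
# Sketch of Trainer.hyperparameter_search with the Ray Tune backend.
# Model/dataset names and the trial count are illustrative assumptions.
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def model_init():
    # A fresh model per trial keeps runs independent and avoids
    # pickling a live model when Ray ships work to its workers.
    return AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

trainer = Trainer(
    args=TrainingArguments(output_dir="./hpo"),
    model_init=model_init,
    train_dataset=train_dataset,  # assumed to be prepared elsewhere
    eval_dataset=eval_dataset,    # assumed to be prepared elsewhere
)

best_run = trainer.hyperparameter_search(
    backend="ray",        # the backend path these commits split out
    n_trials=8,
    direction="minimize",
)
print(best_run.hyperparameters)
```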
-
Sam Shleifer authored
-
Jin Young (Daniel) Sohn authored
* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done for XLA:TPU tensors as it's terrible for performance, causing TPU<>CPU communication at each step. On RoBERTa MLM, for example, it reduces step time by 30%; the gain should be larger for models/tasks with smaller step times.
* Train batch size was not correct in case a user uses the `per_gpu_train_batch_size` flag
* Avg reduce loss across eval shards
* Fix style (#6803)
* t5 model should make decoder_attention_mask (#6800)
* [s2s] Test hub configs in self-scheduled CI (#6809)
* [s2s] round runtime in run_eval (#6798)
* Pegasus finetune script: add --adafactor (#6811)
* [bart] rename self-attention -> attention (#6708)
* [tests] fix typos in inputs (#6818)
* Fixed open in colab link (#6825)
* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)
* BR_BERTo model card (#6793)
* clearly indicate shuffle=False (#6312)
* Clarify shuffle
* clarify shuffle
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* [s2s README] Add more dataset download instructions (#6737)
* Style
* Patch logging issue
* Set default logging level to `WARNING` instead of `INFO`
* TF Flaubert w/ pre-norm (#6841)
* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Fix in Adafactor docstrings (#6845)
* Fix resuming training for Windows (#6847)
* comments
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
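The first two bullets describe a general XLA-friendly pattern. Here is a small, self-contained sketch (plain PyTorch with a stand-in model; the numbers are hypothetical) of accumulating the loss as a device tensor and paying the device-to-host sync only once every `logging_steps`:

```python
# Calling .item() forces a device -> host sync, so accumulate the loss
# as a tensor and only materialize it every `logging_steps` steps.
import torch
from torch import nn

model = nn.Linear(10, 1)                      # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataloader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(200)]
logging_steps = 50
tr_loss = torch.zeros(())                     # running loss kept as a tensor

for step, (x, y) in enumerate(dataloader, start=1):
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Accumulate on-device: no .item() here, so no TPU<>CPU round trip.
    tr_loss += loss.detach()

    if step % logging_steps == 0:
        # One host sync every logging_steps steps instead of one per step.
        print(f"step {step}: avg loss {(tr_loss / logging_steps).item():.4f}")
        tr_loss = torch.zeros(())
```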
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Huang Lianzhe authored
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
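A hedged sketch of how the new NSP pieces could fit together with a `Trainer`; the constructor arguments and the input file below are assumptions based on the commit messages, not verified signatures.

```python
# Sketch of wiring the NSP dataset and collator from #6644 into a Trainer.
# Constructor arguments and "corpus.txt" are assumptions, not verified.
from transformers import (
    BertForPreTraining,
    BertTokenizer,
    DataCollatorForNextSentencePrediction,
    TextDatasetForNextSentencePrediction,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer,
    file_path="corpus.txt",  # hypothetical file: one sentence per line,
    block_size=128,          # documents separated by blank lines
)
collator = DataCollatorForNextSentencePrediction(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./nsp"),
    data_collator=collator,
    train_dataset=dataset,
)
```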
-
Lysandre Debut authored
-
Lysandre authored
-
Lysandre authored
-
- 30 Aug, 2020 6 commits
-
-
Sam Shleifer authored
-
xujiaze13 authored
* Clarify shuffle
* clarify shuffle
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
-
Rodolfo De Nadai authored
-
Zane Lim authored
-
Thomas Ashish Cherian authored
-
Stas Bekman authored
-
- 29 Aug, 2020 3 commits
-
-
Sam Shleifer authored
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 28 Aug, 2020 9 commits
-
-
Sam Shleifer authored
-
Sam Shleifer authored
-
Sam Shleifer authored
-
Sam Shleifer authored
* broken test
* batch parity
* tests pass
* boom boom
* boom boom
* split out bart tokenizer tests
* fix tests
* boom boom
* Fixed dataset bug
* Fix marian
* Undo extra
* Get marian working
* Fix t5 tok tests
* Test passing
* Cleanup
* better assert msg
* require torch
* Fix mbart tests
* undo extra decoder_attn_mask change
* Fix import
* pegasus tokenizer can ignore src_lang kwargs
* unused kwarg test cov
* boom boom
* add todo for pegasus issue
* cover one word translation edge case
* Cleanup
* doc
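These bullets revolve around batch preparation for seq2seq tokenizers (Marian, mBART, Pegasus, T5). A minimal sketch, assuming the `prepare_seq2seq_batch` helper of this era (since superseded by calling the tokenizer directly); the model name and texts are illustrative.

```python
# Assumes the era's prepare_seq2seq_batch API; model name is illustrative.
from transformers import MarianTokenizer

tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["I am a small frog."],
    tgt_texts=["Ich bin ein kleiner Frosch."],
    return_tensors="pt",
)
print(list(batch.keys()))  # encoder and decoder ids/masks, ready for the model
```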
-
RafaelWO authored
* Improved tokenization with sacremoses
* The TransfoXLTokenizer is now using sacremoses for tokenization
* Added tokenization of comma-separated and floating-point numbers.
* Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
* Added corresponding tests
* Removed test comparing TransfoXLTokenizer and TransfoXLTokenizerFast
* Added deprecation warning to TransfoXLTokenizerFast
* isort change
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
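For reference, a minimal sketch of the sacremoses tokenization now underlying TransfoXLTokenizer (the number handling mentioned above is layered on top of this); the sample text is illustrative.

```python
# Minimal sacremoses sketch; TransfoXLTokenizer layers its own handling
# of comma-separated and floating-point numbers on top of this.
from sacremoses import MosesTokenizer

moses = MosesTokenizer(lang="en")
tokens = moses.tokenize("Hello there, my friend!")
print(tokens)  # punctuation is split into separate tokens
```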
-
Ahmed Elnaggar authored
-
Stas Bekman authored
`make style` with `black` < 20.8b1 is a no-go (in case some other package forced a lower version), so make the required version explicit to avoid confusion.
-
Sam Shleifer authored
-
Stas Bekman authored
-
- 27 Aug, 2020 12 commits
-
-
Lysandre authored
-
Stas Bekman authored
* [doc] multiple corrections to "Summary of the tasks"
* add a new "docs" target to validate docs and document it
* fix mixup
-
Stas Bekman authored
* [test schedulers] small improvement
* cleanup
-
Stas Bekman authored
* [testing] replace hardcoded paths to allow running tests from anywhere
* fix the merge conflict
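A sketch of the path-independence pattern this commit describes: resolve test assets relative to the test file rather than the current working directory, so tests pass regardless of where pytest is launched. The fixture file name is hypothetical.

```python
# Resolve fixtures relative to this test file, not the CWD, so the
# test works no matter which directory pytest is launched from.
from pathlib import Path

FIXTURES_DIR = Path(__file__).resolve().parent / "fixtures"

def test_loads_fixture():
    sample = FIXTURES_DIR / "sample.json"  # hypothetical test asset
    assert sample.exists()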
-
Sam Shleifer authored
-
Tom Grek authored
-
Lysandre authored
-
Julien Plu authored
* Align the TF NER example with the PT one
* Fix Dataset call
* Fix gradient accumulation training
* Apply style
* Address Sylvain's comments
* Address Sylvain's comments
* Apply style
-
Lysandre Debut authored
-
Nikolai Yakovenko authored
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM; reduced memory consumption compared to Adam.
* update PR fixes, add basic test
* bug -- incorrect params in test
* bugfix -- import Adafactor into test
* bugfix -- removed accidental T5 include
* resetting T5 to master
* bugfix -- include Adafactor in __init__
* longer loop for adafactor test
* remove double error class declare
* lint
* black
* isort
* Update src/transformers/optimization.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* single docstring
* Cleanup docstring
Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
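A minimal usage sketch of the ported optimizer; the stand-in model is illustrative, and the settings shown (relative step sizes with an internally computed learning rate) follow the fairseq-style defaults rather than anything prescribed by this commit.

```python
# Minimal Adafactor sketch; the Linear layer stands in for a real model.
import torch
from transformers.optimization import Adafactor

model = torch.nn.Linear(8, 2)
optimizer = Adafactor(
    model.parameters(),
    lr=None,              # with relative_step=True the LR is computed internally
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
)

loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()
```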
-
Sam Shleifer authored
-
Ahmed Elnaggar authored
-