- 27 Feb, 2020 5 commits
-
-
Lysandre Debut authored
* Added support for Albert when fine-tuning for NER
* Added support for Albert in NER pipeline
* Added command-line options to examples/ner/run_ner.py to better control tokenization
* Added class AlbertForTokenClassification
* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
* Now passes style guide enforcement
* Changes from reviews
* Code now passes style enforcement
* Added test for AlbertForTokenClassification
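A minimal sketch of how the Albert token-classification support described above can be exercised. It assumes the stock `albert-base-v2` checkpoint, whose token-classification head is randomly initialized until fine-tuned (e.g. via examples/ner/run_ner.py), so the predicted labels are meaningless out of the box:

```python
# Hedged sketch: `albert-base-v2` is a generic checkpoint, so the NER labels
# below are placeholders until the model is fine-tuned for token classification.
from transformers import AlbertForTokenClassification, AlbertTokenizer, pipeline

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForTokenClassification.from_pretrained("albert-base-v2")

ner = pipeline("ner", model=model, tokenizer=tokenizer)
print(ner("Hugging Face is based in New York City"))
```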
-
Sam Shleifer authored
-
Cola authored
-
Martin Malmsten authored
-
Martin Malmsten authored
-
- 26 Feb, 2020 8 commits
-
-
Martin Malmsten authored
-
Martin Malmsten authored
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Andrew Walker authored
-
Patrick von Platen authored
* Fix issue and add some tests
* Updated GPT-2 docstring
-
Julien Chaumond authored
* Fix tests on GPU (torch)
* Fix bart slow tests

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Sam Shleifer authored
-
- 25 Feb, 2020 7 commits
-
-
Lysandre Debut authored
* All Tokenizers:
  - BertTokenizer + few fixes
  - RobertaTokenizer
  - OpenAIGPTTokenizer + fixes
  - GPT2Tokenizer + fixes
  - TransfoXLTokenizer (correct rst for Transformer-XL)
  - XLMTokenizer + fixes
  - XLNet Tokenizer + style
  - DistilBERT + fix XLNet RST
  - CTRLTokenizer
  - CamemBERT Tokenizer
  - FlaubertTokenizer
  - XLMRobertaTokenizer
  - cleanup
* cleanup
-
Patrick von Platen authored
* add first files
* add xlm roberta integration tests
* make style
* flake 8 issues solved
-
srush authored
* Change masking to direct labelings
* Fix black
* Switch to ignore index
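The "switch to ignore index" item above refers to marking positions that should not contribute to the token-classification loss with a label the loss function skips; a toy illustration of that pattern (the tensors are made-up values, not taken from the NER example):

```python
# Toy illustration of the ignore-index pattern: positions labelled -100 are
# excluded from the cross-entropy loss instead of needing a separate mask.
import torch
import torch.nn as nn

logits = torch.randn(1, 4, 3)             # (batch, seq_len, num_labels)
labels = torch.tensor([[2, 0, -100, 1]])  # -100 marks positions to skip
loss_fct = nn.CrossEntropyLoss(ignore_index=-100)
loss = loss_fct(logits.view(-1, 3), labels.view(-1))
print(loss)
```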
-
Jhuo IH authored
-
Lysandre Debut authored
* Usage: Sequence Classification & Question Answering
* Pipeline example
* Language modeling
* TensorFlow code for Sequence classification
* Custom TF/PT toggler in docs
* QA + LM for TensorFlow
* Finish Usage for both PyTorch and TensorFlow
* Addressing Julien's comments
* More assertive
* cleanup
* Favicon
  - added favicon option in conf.py along with the favicon image
  - updated 🤗 logo: slightly smaller and should appear more consistent across editing programs (no more tongue on the outside of the mouth)

Co-authored-by: joshchagani <joshua@joshuachagani.com>
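The Usage pages added here revolve around short pipeline-style snippets; a rough sketch of the kind of example they contain (the default checkpoints are downloaded automatically, and the input strings are illustrative):

```python
# Illustrative sketch of pipeline-based usage examples for sequence
# classification and question answering; inputs are made up.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("We are very happy to include these usage examples."))

qa = pipeline("question-answering")
print(qa(question="What does the documentation cover?",
         context="The new Usage section covers sequence classification and question answering."))
```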
-
Julien Chaumond authored
-
Julien Chaumond authored
-
- 24 Feb, 2020 13 commits
-
-
Lysandre Debut authored
-
Lysandre Debut authored
-
Lysandre authored
-
Funtowicz Morgan authored
* Renamed the file generated by tokenizers when calling save_pretrained to match Python.
* Added save_vocabulary tests.
* Removed the Python quick-and-dirty fix in favor of the clean Rust implementation.
* Bumped the tokenizers dependency to 0.5.1.
* TransfoXLTokenizerFast now uses a JSON vocabulary file and warns about the incompatibility between the Python and Rust tokenizers.
* Added some save_pretrained / from_pretrained unit tests.
* Updated tokenizers to 0.5.2.
* Quality and format.
* flake8.
* Making sure there is really a bug in the unittest.
* Fixed the TransfoXL constructor vocab_file / pretrained_vocab_file mixin.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
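A sketch of the save_pretrained / from_pretrained round trip that the added unit tests exercise, using BertTokenizerFast as a representative Rust-backed tokenizer (the actual test code differs):

```python
# Sketch, not the actual unit test: save a fast tokenizer, reload it from the
# saved files, and check that both produce identical encodings.
import tempfile

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
with tempfile.TemporaryDirectory() as tmp_dir:
    tokenizer.save_pretrained(tmp_dir)  # writes vocabulary and config files to tmp_dir
    reloaded = BertTokenizerFast.from_pretrained(tmp_dir)
    assert tokenizer.encode("hello world") == reloaded.encode("hello world")
```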
-
Sandro Cavallari authored
-
Patrick von Platen authored
* add explaining example to XLNet LM modeling
* improve docstring for xlnet
-
Patrick von Platen authored
Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punctuation to <unk> (#2987)
* add preprocessing to add space before punctuation for transfo_xl
* improve warning messages
* make style
* compile regex at instantiation of the tokenizer object
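The preprocessing boils down to inserting a space before punctuation so that a word plus trailing punctuation is not looked up as one unknown token; an illustrative stand-alone version of the idea (not the tokenizer's actual implementation):

```python
# Illustrative stand-alone version, not the tokenizer's own code: a regex
# compiled once (e.g. at tokenizer instantiation, as the commit describes)
# inserts a space before punctuation so "word," is not mapped to <unk>.
import re

PUNCTUATION_PATTERN = re.compile(r"(?<=\S)([.,!?;:])")

def add_space_before_punctuation(text: str) -> str:
    return PUNCTUATION_PATTERN.sub(r" \1", text)

print(add_space_before_punctuation("Hello, world!"))  # -> "Hello , world !"
```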
-
Bram Vanroy authored
* Add disable_outgoing to pretrained items. Setting disable_outgoing=True disables outgoing traffic:
  - etags are not looked up
  - models are not downloaded
* Parameter name change
* Remove forgotten print
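A sketch of the behavior described above. Note that `disable_outgoing` is the keyword from the commit message and was renamed later in the same PR, so treat the argument name as illustrative rather than the released API:

```python
# Illustrative only: the keyword `disable_outgoing` is taken from the commit
# message above and was renamed later in the PR, so it may not match the
# released API. The intent: no ETag lookups, no downloads, local cache only.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", disable_outgoing=True)
model = AutoModel.from_pretrained("bert-base-uncased", disable_outgoing=True)
```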
-
Manuel Romero authored
-
Lysandre Debut authored
-
Lysandre Debut authored
* Testing that encode_plus and batch_encode_plus behave the same way. Spoiler alert: they don't
* Testing rest of arguments in batch_encode_plus
* Test tensor return in batch_encode_plus
* Addressing Sam's comments
* flake8
* Simplified with `num_added_tokens`
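The property being tested is easy to state in code: encoding sentences one by one with encode_plus should yield the same ids as encoding them together with batch_encode_plus. A minimal sketch of that check (not the PR's test code):

```python
# Minimal sketch of the consistency property the new tests check;
# not the actual test code from the PR.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentences = ["Hello world", "Testing encode_plus against batch_encode_plus"]

single = [tokenizer.encode_plus(s)["input_ids"] for s in sentences]
batched = tokenizer.batch_encode_plus(sentences)["input_ids"]
assert single == batched
```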
-
Patrick von Platen authored
* Add slow generate lm_model tests
* Fix merge conflicts
* Make style
* Delete unused variable
* Finished hard-coded tests
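These "hard-coded" tests pin deterministic generation output against a frozen list of token ids; a rough sketch of the shape of such a test, using GPT-2 as an example (the prompt, length, and the printing instead of asserting are illustrative, not the real test):

```python
# Rough sketch of a hard-coded generation test; prompt and settings are made up.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("The dog", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(input_ids, max_length=10, do_sample=False)  # greedy, deterministic

# In the real slow tests, the expected ids are hard-coded and asserted against.
print(output_ids[0].tolist())
print(tokenizer.decode(output_ids[0]))
```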
-
Lysandre Debut authored
Warning on `add_special_tokens` when passed to `encode`, `encode_plus` and `batch_encode_plus`
-
- 23 Feb, 2020 6 commits
-
-
Patrick von Platen authored
-
Martin Malmsten authored
-
Martin Malmsten authored
-
Martin Malmsten authored
* Added support for Albert in NER pipeline
* Added command-line options to examples/ner/run_ner.py to better control tokenization
* Added class AlbertForTokenClassification
* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
-
Sam Shleifer authored
-
Lysandre Debut authored
Don't know of a use case where that would be useful, but this is more consistent
-
- 22 Feb, 2020 1 commit
-
-
Sam Shleifer authored
-