Commits · a36f981d1bae5cae4ad9b0b5610f891caf852296 · chenpangpang / transformers

26 Nov, 2019 11 commits
- Improve test protocol for inputs_embeds in TF · 8742baa5
  Julien Chaumond authored Nov 26, 2019
- Improve test protocol for inputs_embeds in TF · cf62bdc9
  Julien Chaumond authored Nov 26, 2019
```
cc @lysandrejik
```
- Fix input embeddings · f2f32940
  Lysandre authored Nov 26, 2019
- Tests for ALBERT in TF2 + fixes · b18509c2
  Lysandre authored Nov 08, 2019
- Tests for AlbertForQuestionAnswering AlbertForSequenceClassification · 9d5c4954
  Lysandre authored Nov 07, 2019
- Headmasking · 16263f96
  Lysandre authored Nov 07, 2019
- Head pruning for ALBERT · abb23a78
  Lysandre authored Nov 07, 2019
- ALBERT passes all tests · c14a2227
  Lysandre authored Oct 31, 2019
- Early tests · 870320a2
  Lysandre authored Oct 30, 2019
- Several fixes and improvements · 1e5b31c3
  Lysandre authored Oct 30, 2019
- Tokenization tests + fixes + init · ee20201d
  Lysandre authored Oct 30, 2019
14 Nov, 2019 1 commit

- replace LambdaLR scheduler wrappers by function · 022525b0
  Rémi Louf authored Nov 12, 2019

Custom schedulers are currently instantiated by wrapping PyTorch's LambdaLR
class and passing a method of the wrapping class to the __init__
function of LambdaLR. This approach is not appropriate for two
reasons:

1. one does not need to define a class when it only defines an
__init__() method;
2. instantiating the parent class by passing a method of the child class
creates a cyclical reference which leads to memory leaks. See issues #1742 and #1134.

In this commit we replace the wrapper classes with functions that
instantiate `LambdaLR` with a custom learning rate function. We use a
closure to specify the parameters of the latter. We also do a bit of
renaming within the functions to make the behaviour explicit, and remove
docstrings that are no longer necessary.
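
The approach described in this message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `make_linear_warmup_lambda` and `linear_schedule_with_warmup`, and the choice of a linear warmup followed by a linear decay, are assumptions made for the example. The only PyTorch call used is `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch)`.

```python
def make_linear_warmup_lambda(num_warmup_steps, num_training_steps):
    """Return a closure mapping a step index to a learning-rate multiplier.

    Hypothetical helper: the closure captures only two integers, so no
    object ends up holding a reference back to the scheduler itself.
    """

    def lr_lambda(current_step):
        # Warmup phase: the multiplier grows linearly from 0 to 1.
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        # Decay phase: the multiplier shrinks linearly from 1 to 0.
        remaining = num_training_steps - current_step
        return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

    return lr_lambda


def linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=-1):
    """Build a LambdaLR from a plain function instead of subclassing LambdaLR."""
    # Imported lazily so the closure above can be inspected without PyTorch installed.
    from torch.optim.lr_scheduler import LambdaLR

    lr_lambda = make_linear_warmup_lambda(num_warmup_steps, num_training_steps)
    return LambdaLR(optimizer, lr_lambda, last_epoch)
```

Because the learning-rate function is a free closure rather than a bound method of a `LambdaLR` subclass, the scheduler does not store a reference to one of its own methods, which is the cycle the message above points to.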


12 Nov, 2019 7 commits

- Fix special tokens addition in decoder · 74d0bcb6
  Lysandre authored Nov 12, 2019
- [inputs_embeds] All TF models + tests · 155c782a
  Julien Chaumond authored Nov 11, 2019
- [common attributes] Fix previous commit for transfo-xl · 2aef2f0b
  Julien Chaumond authored Nov 11, 2019
- [common attributes] Slightly sharper test coverage · 2f174642
  Julien Chaumond authored Nov 11, 2019
- Ooopsie · 9d2398fd
  Julien Chaumond authored Nov 11, 2019
- [TF models] Common attributes as per #1721 · 70d97ddd
  Julien Chaumond authored Nov 11, 2019

- Consider do_lower_case in PreTrainedTokenizer · 7246d3c2
  Michael Watkins authored Nov 06, 2019

As pointed out in #1545, when using an uncased model and adding
a new uncased token, the tokenizer does not recognize the added token
when the input text contains it in a cased form.

For instance, if we load bert-base-uncased into BertTokenizer, and
then use .add_tokens() to add "cool-token", we get the expected
result for .tokenize('this is a cool-token'). However, we get a
possibly unexpected result for .tokenize('this is a cOOl-Token'),
which in fact mirrors the result for the former from before the new
token was added.

This commit adds:
- functionality in PreTrainedTokenizer to handle this situation when a
  tokenizer (currently Bert, DistilBert, and XLNet) has the
  do_lower_case=True kwarg, by:
    1) lowercasing tokens added with .add_tokens()
    2) lowercasing text at the beginning of .tokenize()
- a new common test case for tokenizers

https://github.com/huggingface/transformers/issues/1545
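
The two steps listed in this message can be illustrated with a toy tokenizer. This is only a sketch under stated assumptions: `ToyTokenizer`, its whitespace splitting, and its `[UNK]` fallback are hypothetical and far simpler than `PreTrainedTokenizer`; only the `do_lower_case`, `.add_tokens()`, and `.tokenize()` names come from the message itself.

```python
class ToyTokenizer:
    """Whitespace tokenizer with a fixed vocabulary plus user-added tokens.

    Illustrative stand-in only; it is not the library's tokenizer.
    """

    def __init__(self, vocab, do_lower_case=False, unk_token="[UNK]"):
        self.vocab = set(vocab)
        self.do_lower_case = do_lower_case
        self.unk_token = unk_token
        self.added_tokens = set()

    def add_tokens(self, new_tokens):
        """Register new tokens and return how many were actually added."""
        added = 0
        for token in new_tokens:
            # Step 1: store added tokens in lowercase for uncased models.
            if self.do_lower_case:
                token = token.lower()
            if token not in self.vocab and token not in self.added_tokens:
                self.added_tokens.add(token)
                added += 1
        return added

    def tokenize(self, text):
        # Step 2: lowercase the input before matching, so that a cased
        # occurrence of an added token is still recognized.
        if self.do_lower_case:
            text = text.lower()
        known = self.vocab | self.added_tokens
        return [word if word in known else self.unk_token for word in text.split()]


uncased = ToyTokenizer(["this", "is", "a"], do_lower_case=True)
uncased.add_tokens(["cool-token"])
print(uncased.tokenize("this is a cOOl-Token"))
```

With `do_lower_case=False` the same input would fall back to the unknown token in this toy model, which mirrors the mismatch the message describes.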


11 Nov, 2019 1 commit
- tests: add test case for DistilBertForTokenClassification implementation · 94e55253
  Stefan Schweter authored Nov 11, 2019
06 Nov, 2019 2 commits
- [tests] Flag to test on cuda · 27e015bd
  Julien Chaumond authored Nov 06, 2019
- [tests] get rid of warning · 13d9135f
  Julien Chaumond authored Nov 06, 2019
```
cf. https://docs.pytest.org/en/latest/example/simple.html
```
05 Nov, 2019 1 commit
- [inputs_embeds] All PyTorch models · 00337e96
  Julien Chaumond authored Nov 05, 2019
04 Nov, 2019 5 commits
- fix #1532 and encode_plus · 8d6b9d71
  thomwolf authored Nov 04, 2019
- fix tests - flagged as slow all the tests downloading from AWS · b340a910
  thomwolf authored Nov 04, 2019
- fix tests · f02805da
  thomwolf authored Nov 04, 2019
- switch from properties to methods · 1724cee8
  thomwolf authored Nov 04, 2019
- Add common properties input_embeddings and output_embeddings · 9b45d0f8
  thomwolf authored Nov 04, 2019
30 Oct, 2019 2 commits
- add tests to encoder-decoder model · a88a0e44
  Rémi Louf authored Oct 30, 2019
- update test on Bert to include decoder mode · 3f07cd41
  Rémi Louf authored Oct 30, 2019
24 Oct, 2019 1 commit
- RoBERTa token classification · 66085a13
  Matt Maybeno authored Oct 23, 2019
```
[WIP] copy paste bert token classification for roberta
```
22 Oct, 2019 1 commit
- Remove · 7d709e55
  Lysandre authored Oct 22, 2019
16 Oct, 2019 1 commit
- remove Bert2Rnd test · 33c01368
  Rémi Louf authored Oct 16, 2019
15 Oct, 2019 1 commit
- add tests on TF2.0 & PT checkpoint => model conversion functions · 898ce064
  thomwolf authored Oct 15, 2019
11 Oct, 2019 3 commits
- no nans · 18a3cef7
  thomwolf authored Oct 11, 2019
- fix test · 1f5d9513
  thomwolf authored Oct 11, 2019
- adding option to deactivate past/memory outputs · 0f9fc4fb
  thomwolf authored Oct 11, 2019
10 Oct, 2019 3 commits
- add test for initialization of Bert2Rnd · 1e68c286
  Rémi Louf authored Oct 10, 2019
- adding more tests on TF and pytorch serialization - updating configuration for better serialization · da26bae6
  thomwolf authored Oct 10, 2019
- Add tests that TF 2.0 model can be integrated with other Keras modules · bb04edb4
  thomwolf authored Oct 10, 2019