- 26 Nov, 2019 11 commits
- 14 Nov, 2019 1 commit
-
-
Rémi Louf authored
Custom schedulers are currently instantiated by wrapping PyTorch's LambdaLR class and passing a method of the wrapping class to the __init__ function of LambdaLR. This approach is inappropriate for two reasons: 1. there is no need to define a class that only defines an __init__() method; 2. instantiating the parent class by passing a method of the child class creates a cyclical reference, which leads to memory leaks. See issues #1742 and #1134. In this commit we replace the wrapper classes with functions that instantiate `LambdaLR` with a custom learning rate function, using a closure to specify the parameters of the latter. We also rename a few things within the functions to make the behaviour explicit, and remove docstrings that are no longer necessary.
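A minimal sketch of the closure pattern described in this message, not the code from the commit itself. The function name `make_linear_warmup_lambda`, its parameters, and the linear warmup/decay shape are assumptions chosen for illustration; only `LambdaLR` comes from the message. The closure is kept free of PyTorch so it runs on its own.

```python
def make_linear_warmup_lambda(num_warmup_steps, num_training_steps):
    """Hypothetical helper: build a step -> learning-rate multiplier function.

    The schedule parameters are captured by the inner function (a closure),
    so no wrapper class and no bound method are needed.
    """

    def lr_lambda(current_step):
        # Linear warmup from 0 to 1 over the first num_warmup_steps steps.
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        # Linear decay from 1 to 0 over the remaining steps.
        remaining = num_training_steps - current_step
        return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

    return lr_lambda


# With PyTorch installed, the closure would be passed to LambdaLR directly:
#   from torch.optim.lr_scheduler import LambdaLR
#   scheduler = LambdaLR(optimizer, make_linear_warmup_lambda(10, 110))
lr_lambda = make_linear_warmup_lambda(num_warmup_steps=10, num_training_steps=110)
multipliers = [lr_lambda(step) for step in (0, 5, 10, 60, 110)]
```

Because `lr_lambda` is a plain function rather than a method of a `LambdaLR` subclass, the scheduler does not hold a reference back to itself.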
-
- 12 Nov, 2019 7 commits
-
-
Lysandre authored
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Michael Watkins authored
As pointed out in #1545, when an uncased model is used and a new uncased token is added, the tokenizer fails to recognize that token if it appears in the input text in a cased form. For instance, if we load bert-base-uncased into BertTokenizer and then use .add_tokens() to add "cool-token", we get the expected result for .tokenize('this is a cool-token'). However, we get a possibly unexpected result for .tokenize('this is a cOOl-Token'): it is the same output that the lowercase input produced before the new token was added. This commit adds: (a) functionality in PreTrainedTokenizer to handle this situation when a tokenizer (currently Bert, DistilBert, and XLNet) has the do_lower_case=True kwarg, by 1) lowercasing tokens added with .add_tokens() and 2) lowercasing text at the beginning of .tokenize(); (b) a new common test case for tokenizers. See https://github.com/huggingface/transformers/issues/1545
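The two steps in this message can be sketched with a toy tokenizer. `ToyTokenizer` is a hypothetical stand-in, not the library's PreTrainedTokenizer: it splits on whitespace and, as an assumed substitute for subword splitting, breaks unknown words on hyphens. Only the names `do_lower_case`, `add_tokens`, and `tokenize` are taken from the message.

```python
class ToyTokenizer:
    """Hypothetical whitespace tokenizer used only to illustrate the fix."""

    def __init__(self, do_lower_case=False):
        self.do_lower_case = do_lower_case
        self.added_tokens = set()

    def add_tokens(self, new_tokens):
        for token in new_tokens:
            # Step 1: store added tokens in lowercase for uncased tokenizers.
            if self.do_lower_case:
                token = token.lower()
            self.added_tokens.add(token)

    def tokenize(self, text):
        # Step 2: lowercase the text before looking for added tokens.
        if self.do_lower_case:
            text = text.lower()
        tokens = []
        for word in text.split():
            if word in self.added_tokens:
                tokens.append(word)
            else:
                # Stand-in for subword splitting: break on hyphens.
                tokens.extend(piece for piece in word.split("-") if piece)
        return tokens


tokenizer = ToyTokenizer(do_lower_case=True)
before = tokenizer.tokenize("this is a cOOl-Token")
tokenizer.add_tokens(["cool-token"])
lower = tokenizer.tokenize("this is a cool-token")
mixed = tokenizer.tokenize("this is a cOOl-Token")
```

With both steps in place, the cased and uncased inputs yield the same tokens once the new token has been added.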
-
- 11 Nov, 2019 1 commit
-
-
Stefan Schweter authored
-
- 06 Nov, 2019 2 commits
-
-
Julien Chaumond authored
-
-
- 05 Nov, 2019 1 commit
-
-
Julien Chaumond authored
-
- 04 Nov, 2019 5 commits
- 30 Oct, 2019 2 commits
- 24 Oct, 2019 1 commit
-
-
Matt Maybeno authored
[WIP] copy paste bert token classification for roberta
-
- 22 Oct, 2019 1 commit
-
-
Lysandre authored
-
- 16 Oct, 2019 1 commit
-
-
Rémi Louf authored
-
- 15 Oct, 2019 1 commit
-
-
thomwolf authored
-
- 11 Oct, 2019 3 commits
- 10 Oct, 2019 3 commits