- 22 Dec, 2019 4 commits
  - Aymeric Augustin authored
    This is the same change as for (TF)CommonTestCases for modeling.
  - Aymeric Augustin authored
  - Aymeric Augustin authored
  - Aymeric Augustin authored
    This is the result of: $ isort --recursive examples templates transformers utils hubconf.py setup.py
- 21 Dec, 2019 1 commit
  - Aymeric Augustin authored
    This is the result of: $ black --line-length 119 examples templates transformers utils hubconf.py setup.py
    There are a lot of fairly long lines in the project, so I'm picking the longest widely accepted line length, 119 characters. This is also Thomas' preference, because it allows for explicit variable names, which make the code easier to understand.
- 20 Dec, 2019 2 commits
- 13 Dec, 2019 1 commit
  - LysandreJik authored
- 06 Dec, 2019 2 commits
  - Michael Watkins authored
  - Aymeric Augustin authored
    * Switch to plain unittest for skipping slow tests; add a RUN_SLOW environment variable for running them.
    * Switch to plain unittest for the PyTorch dependency.
    * Switch to plain unittest for the TensorFlow dependency.
    * Avoid leaking open files in the test suite. This prevents spurious warnings when running tests.
    * Fix a Unicode warning on Python 2 when running tests. The warning was: "UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal".
    * Support running PyTorch tests on a GPU. Reverts 27e015bd.
    * Tests no longer require pytest.
    * Make tests pass on CUDA.
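The RUN_SLOW mechanism described in the commit above can be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual implementation: the `slow` decorator name and the skip message are assumptions.

```python
import os
import unittest

def slow(test_case):
    """Skip a test unless the RUN_SLOW environment variable is set.

    Hypothetical sketch of the pattern the commit message describes;
    the real decorator in the repository may differ.
    """
    if not os.environ.get("RUN_SLOW"):
        return unittest.skip("test is slow; set RUN_SLOW=1 to run it")(test_case)
    return test_case

class ExampleTest(unittest.TestCase):
    @slow
    def test_heavy_model(self):
        # Only runs when RUN_SLOW is set, e.g. RUN_SLOW=1 python -m unittest
        self.assertTrue(True)
```

Because the environment variable is checked at decoration time, the skipped tests still show up as "skipped" in the unittest report rather than silently disappearing.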
- 04 Dec, 2019 1 commit
  - LysandreJik authored
- 22 Nov, 2019 2 commits
  - LysandreJik authored
  - LysandreJik authored
- 12 Nov, 2019 2 commits
  - Lysandre authored
  - Michael Watkins authored
    As pointed out in #1545, when using an uncased model and adding a new uncased token, the tokenizer does not correctly identify the token when the input text contains it in a cased format. For instance, if we load bert-base-uncased into BertTokenizer and then use .add_tokens() to add "cool-token", we get the expected result for .tokenize('this is a cool-token'). However, we get a possibly unexpected result for .tokenize('this is a cOOl-Token'), which in fact mirrors the result for the former from before the new token was added. This commit adds:
    - functionality to PreTrainedTokenizer to handle this situation when a tokenizer (currently Bert, DistilBert, and XLNet) has the do_lower_case=True kwarg, by:
      1) lowercasing tokens added with .add_tokens()
      2) lowercasing text at the beginning of .tokenize()
    - a new common test case for tokenizers
    https://github.com/huggingface/transformers/issues/1545
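The behavior this commit describes can be illustrated with a toy tokenizer. `ToyTokenizer` and its character-level fallback are invented for illustration and are not the real PreTrainedTokenizer API:

```python
class ToyTokenizer:
    """Toy illustration of the do_lower_case fix: added tokens are
    lowercased when registered, and input text is lowercased before
    tokenizing, so 'cOOl-Token' still matches the added 'cool-token'.
    Not the real PreTrainedTokenizer API.
    """

    def __init__(self, do_lower_case=True):
        self.do_lower_case = do_lower_case
        self.added_tokens = set()

    def add_tokens(self, tokens):
        for token in tokens:
            # 1) lowercase tokens added with .add_tokens()
            self.added_tokens.add(token.lower() if self.do_lower_case else token)

    def tokenize(self, text):
        # 2) lowercase text at the beginning of .tokenize()
        if self.do_lower_case:
            text = text.lower()
        out = []
        for word in text.split():
            if word in self.added_tokens:
                out.append(word)        # added token survives as one piece
            else:
                out.extend(list(word))  # stand-in for real subword splitting
        return out

tok = ToyTokenizer(do_lower_case=True)
tok.add_tokens(["cool-token"])
print(tok.tokenize("this is a cOOl-Token"))  # 'cool-token' appears as a single token
```

Without both lowercasing steps, the cased input would fall through to the subword path, reproducing the pre-fix behavior from the issue.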
- 04 Nov, 2019 1 commit
  - thomwolf authored
- 22 Oct, 2019 1 commit
  - Lysandre authored
- 04 Oct, 2019 2 commits
- 03 Oct, 2019 5 commits
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
- 26 Sep, 2019 1 commit
  - thomwolf authored
- 24 Sep, 2019 4 commits
  - thomwolf authored
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
- 19 Sep, 2019 9 commits
  - LysandreJik authored
  - LysandreJik authored
    prepare_for_model and prepare_pair_for_model methods. Added an option to select which sequence will be truncated.
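The "select which sequence will be truncated" option mentioned above could look roughly like the sketch below. The function name and strategy strings are hypothetical and do not reflect the actual prepare_for_model signature:

```python
def truncate_pair(ids_a, ids_b, max_length, strategy="longest_first"):
    """Hypothetical sketch of selecting which sequence gets truncated.

    strategy: 'only_first' trims ids_a, 'only_second' trims ids_b,
    'longest_first' trims the currently longer sequence one token at a time.
    """
    ids_a, ids_b = list(ids_a), list(ids_b)
    while len(ids_a) + len(ids_b) > max_length:
        if strategy == "only_first":
            ids_a.pop()       # always shorten the first sequence
        elif strategy == "only_second":
            ids_b.pop()       # always shorten the second sequence
        else:                 # longest_first: balance the two lengths
            (ids_a if len(ids_a) > len(ids_b) else ids_b).pop()
    return ids_a, ids_b
```

Trimming one token at a time keeps the 'longest_first' strategy balanced: the two sequences end up within one token of each other in length when both exceed the budget.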
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
  - LysandreJik authored
- 05 Sep, 2019 1 commit
  - thomwolf authored
- 02 Sep, 2019 1 commit
  - thomwolf authored