- 12 Nov, 2019 2 commits
-
-
Michael Watkins authored
As pointed out in #1545, when an uncased model is used and a new uncased token is added, the tokenizer fails to recognize that token when it appears in the input text in a cased form. For instance, if we load bert-base-uncased into BertTokenizer and then use .add_tokens() to add "cool-token", we get the expected result for .tokenize('this is a cool-token'). However, .tokenize('this is a cOOl-Token') gives a possibly unexpected result: the same output that the lowercase input produced before the new token was added.
This commit adds:
- functionality in PreTrainedTokenizer to handle this situation when a tokenizer (currently Bert, DistilBert, and XLNet) has the do_lower_case=True kwarg, by 1) lowercasing tokens added with .add_tokens() and 2) lowercasing text at the beginning of .tokenize()
- a new common test case for tokenizers
https://github.com/huggingface/transformers/issues/1545
-
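The behavior described in the commit message above can be sketched with a toy class. This is only an illustration under stated assumptions, not the actual PreTrainedTokenizer code: ToyTokenizer is a hypothetical stand-in, and a simple regex split replaces real WordPiece tokenization.

```python
import re


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer with a do_lower_case option.

    Not the transformers implementation: a regex split on word
    characters and punctuation replaces real subword tokenization.
    """

    def __init__(self, do_lower_case=True):
        self.do_lower_case = do_lower_case
        self.added_tokens = []

    def add_tokens(self, new_tokens):
        """Register new tokens, lowercasing them first for uncased models.

        Returns the number of tokens that were newly registered.
        """
        added = 0
        for token in new_tokens:
            if self.do_lower_case:
                token = token.lower()
            if token not in self.added_tokens:
                self.added_tokens.append(token)
                added += 1
        return added

    def tokenize(self, text):
        """Lowercase the text up front, then keep added tokens whole."""
        if self.do_lower_case:
            text = text.lower()
        tokens = []
        for word in text.split():
            if word in self.added_tokens:
                tokens.append(word)
            else:
                # Fallback split, e.g. "cool-token" becomes cool / - / token.
                tokens.extend(re.findall(r"\w+|[^\w\s]", word))
        return tokens


tok = ToyTokenizer(do_lower_case=True)
before = tok.tokenize("this is a cOOl-Token")
tok.add_tokens(["Cool-Token"])
after = tok.tokenize("this is a cOOl-Token")
print(before)
print(after)
```

Because both the added token and the input text are lowercased, the cased input matches the added token in this sketch. Without either lowercasing step, the lookup would miss and the word would fall back to the ordinary split.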
thomwolf authored
-
- 11 Nov, 2019 1 commit
-
-
Lysandre authored
-
- 08 Nov, 2019 1 commit
-
-
Adrian Bauer authored
Make imports and args.overwrite_cache match run_glue.py
-
- 06 Nov, 2019 7 commits
-
-
Julien Chaumond authored
converted from https://github.com/openai/gpt-2-output-dataset/tree/master/detector
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-Authored-By: Jong Wook Kim <jongwook@nyu.edu>
Co-Authored-By: Jeff Wu <wuthefwasthat@gmail.com>
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Julien Chaumond authored
-
-
Julien Chaumond authored
-
Julien Chaumond authored
-
- 05 Nov, 2019 11 commits
-
-
Dom Hudson authored
-
Lysandre authored
-
Julien Chaumond authored
model forwards can take an inputs_embeds param
-
Clement authored
-
Thomas Wolf authored
add progress bar to convert_examples_to_features
-
Thomas Wolf authored
Add speed log to examples/run_squad.py
-
Thomas Wolf authored
Fixing mode in evaluate during training
-
thomwolf authored
-
Oren Amsalem authored
It takes a considerable amount of time (~10 min) to parse the examples into features, so it is good to have a progress bar to track this
-
Thomas Wolf authored
Fix #1623
-
Julien Chaumond authored
-
- 04 Nov, 2019 12 commits
-
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Lysandre authored
Co-Authored-By: Santosh-Gupta <San.Gupta.ML@gmail.com>
-
Thomas Wolf authored
Add common getter and setter for input_embeddings & output_embeddings
-
thomwolf authored
-
thomwolf authored
-
thomwolf authored
-
Thomas Wolf authored
Fix token order in xlnet preprocessing for SQuAD
-
thomwolf authored
-
thomwolf authored
-
Thomas Wolf authored
-
thomwolf authored
-
- 03 Nov, 2019 1 commit
-
-
Raghavan authored
-
- 01 Nov, 2019 2 commits
-
-
-
Julien Chaumond authored
-
- 31 Oct, 2019 3 commits
-
-
Lysandre authored
-
Timothy Liu authored
-
Julien Chaumond authored
-