- 27 Feb, 2020 5 commits
-
-
Lysandre Debut authored
* Added support for Albert when fine-tuning for NER
* Added support for Albert in NER pipeline
* Added command-line options to examples/ner/run_ner.py to better control tokenization
* Added class AlbertForTokenClassification
* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
* Now passes style guide enforcement
* Changes from reviews
* Code now passes style enforcement
* Added test for AlbertForTokenClassification
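A minimal sketch of how the Albert token-classification support described above can be exercised. It assumes the stock `albert-base-v2` checkpoint, whose token-classification head is randomly initialized until fine-tuned (e.g. via examples/ner/run_ner.py), so the predicted labels are meaningless out of the box:

```python
# Hedged sketch: `albert-base-v2` is a generic checkpoint, so the NER labels
# below are placeholders until the model is fine-tuned for token classification.
from transformers import AlbertForTokenClassification, AlbertTokenizer, pipeline

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForTokenClassification.from_pretrained("albert-base-v2")

ner = pipeline("ner", model=model, tokenizer=tokenizer)
print(ner("Hugging Face is based in New York City"))
```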
-
Sam Shleifer authored
-
Cola authored
-
Martin Malmsten authored
-
Martin Malmsten authored
-
- 26 Feb, 2020 8 commits
-
-
Martin Malmsten authored
-
Martin Malmsten authored
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Andrew Walker authored
-
Patrick von Platen authored
* Fix issue and add some tests
* Updated GPT-2 docstring
-
Julien Chaumond authored
* Fix tests on GPU (torch)
* Fix bart slow tests

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Sam Shleifer authored
-
- 25 Feb, 2020 7 commits
-
-
Lysandre Debut authored
* All Tokenizers:
  - BertTokenizer + few fixes
  - RobertaTokenizer
  - OpenAIGPTTokenizer + fixes
  - GPT2Tokenizer + fixes
  - TransfoXLTokenizer (correct rst for Transformer-XL)
  - XLMTokenizer + fixes
  - XLNet Tokenizer + style
  - DistilBERT + fix XLNet RST
  - CTRLTokenizer
  - CamemBERT Tokenizer
  - FlaubertTokenizer
  - XLMRobertaTokenizer
  - cleanup
* cleanup
-
Patrick von Platen authored
* add first files
* add xlm roberta integration tests
* make style
* flake 8 issues solved
-
srush authored
* Change masking to direct labelings
* Fix black
* Switch to ignore index
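The "switch to ignore index" item above refers to marking positions that should not contribute to the token-classification loss with a label the loss function skips; a toy illustration of that pattern (the tensors are made-up values, not taken from the NER example):

```python
# Toy illustration of the ignore-index pattern: positions labelled -100 are
# excluded from the cross-entropy loss instead of needing a separate mask.
import torch
import torch.nn as nn

logits = torch.randn(1, 4, 3)             # (batch, seq_len, num_labels)
labels = torch.tensor([[2, 0, -100, 1]])  # -100 marks positions to skip
loss_fct = nn.CrossEntropyLoss(ignore_index=-100)
loss = loss_fct(logits.view(-1, 3), labels.view(-1))
print(loss)
```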
-
Jhuo IH authored
-
Lysandre Debut authored
* Usage: Sequence Classification & Question Answering
* Pipeline example
* Language modeling
* TensorFlow code for Sequence classification
* Custom TF/PT toggler in docs
* QA + LM for TensorFlow
* Finish Usage for both PyTorch and TensorFlow
* Addressing Julien's comments
* More assertive
* cleanup
* Favicon
  - added favicon option in conf.py along with the favicon image
  - updated 🤗 logo: slightly smaller and should appear more consistent across editing programs (no more tongue on the outside of the mouth)

Co-authored-by: joshchagani <joshua@joshuachagani.com>
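The Usage pages added here revolve around short pipeline-style snippets; a rough sketch of the kind of example they contain (the default checkpoints are downloaded automatically, and the input strings are illustrative):

```python
# Illustrative sketch of pipeline-based usage examples for sequence
# classification and question answering; inputs are made up.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("We are very happy to include these usage examples."))

qa = pipeline("question-answering")
print(qa(question="What does the documentation cover?",
         context="The new Usage section covers sequence classification and question answering."))
```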
-
Julien Chaumond authored
-
Julien Chaumond authored
-
- 24 Feb, 2020 13 commits
-
-
Lysandre Debut authored
-
Lysandre Debut authored
-
Lysandre authored
-
Funtowicz Morgan authored
* Renamed the file generated by tokenizers when calling save_pretrained to match Python.
* Added save_vocabulary tests.
* Removed the Python quick-and-dirty fix in favor of the clean Rust implementation.
* Bumped the tokenizers dependency to 0.5.1.
* TransfoXLTokenizerFast now uses a JSON vocabulary file and warns about the incompatibility between the Python and Rust tokenizers.
* Added some save_pretrained / from_pretrained unit tests.
* Updated tokenizers to 0.5.2.
* Quality and format.
* flake8.
* Making sure there is really a bug in the unittest.
* Fixed the TransfoXL constructor vocab_file / pretrained_vocab_file mixin.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
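A sketch of the save_pretrained / from_pretrained round trip that the added unit tests exercise, using BertTokenizerFast as a representative Rust-backed tokenizer (the actual test code differs):

```python
# Sketch, not the actual unit test: save a fast tokenizer, reload it from the
# saved files, and check that both produce identical encodings.
import tempfile

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
with tempfile.TemporaryDirectory() as tmp_dir:
    tokenizer.save_pretrained(tmp_dir)  # writes vocabulary and config files to tmp_dir
    reloaded = BertTokenizerFast.from_pretrained(tmp_dir)
    assert tokenizer.encode("hello world") == reloaded.encode("hello world")
```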
-
Sandro Cavallari authored
-
Patrick von Platen authored
* add explaining example to XLNet LM modeling
* improve docstring for xlnet
-
Patrick von Platen authored
Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punctuation to <unk> (#2987)
* add preprocessing to add space before punctuation for transfo_xl
* improve warning messages
* make style
* compile regex at instantiation of the tokenizer object
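The preprocessing boils down to inserting a space before punctuation so that a word plus trailing punctuation is not looked up as one unknown token; an illustrative stand-alone version of the idea (not the tokenizer's actual implementation):

```python
# Illustrative stand-alone version, not the tokenizer's own code: a regex
# compiled once (e.g. at tokenizer instantiation, as the commit describes)
# inserts a space before punctuation so "word," is not mapped to <unk>.
import re

PUNCTUATION_PATTERN = re.compile(r"(?<=\S)([.,!?;:])")

def add_space_before_punctuation(text: str) -> str:
    return PUNCTUATION_PATTERN.sub(r" \1", text)

print(add_space_before_punctuation("Hello, world!"))  # -> "Hello , world !"
```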
-
Bram Vanroy authored
* Add disable_outgoing to pretrained items. Setting disable_outgoing=True disables outgoing traffic:
  - etags are not looked up
  - models are not downloaded
* Parameter name change
* Remove forgotten print
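A sketch of the behavior described above. Note that `disable_outgoing` is the keyword from the commit message and was renamed later in the same PR, so treat the argument name as illustrative rather than the released API:

```python
# Illustrative only: the keyword `disable_outgoing` is taken from the commit
# message above and was renamed later in the PR, so it may not match the
# released API. The intent: no ETag lookups, no downloads, local cache only.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", disable_outgoing=True)
model = AutoModel.from_pretrained("bert-base-uncased", disable_outgoing=True)
```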
-
Manuel Romero authored
-
Lysandre Debut authored
-
Lysandre Debut authored
* Testing that encode_plus and batch_encode_plus behave the same way. Spoiler alert: they don't
* Testing rest of arguments in batch_encode_plus
* Test tensor return in batch_encode_plus
* Addressing Sam's comments
* flake8
* Simplified with `num_added_tokens`
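The property being tested is easy to state in code: encoding sentences one by one with encode_plus should yield the same ids as encoding them together with batch_encode_plus. A minimal sketch of that check (not the PR's test code):

```python
# Minimal sketch of the consistency property the new tests check;
# not the actual test code from the PR.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentences = ["Hello world", "Testing encode_plus against batch_encode_plus"]

single = [tokenizer.encode_plus(s)["input_ids"] for s in sentences]
batched = tokenizer.batch_encode_plus(sentences)["input_ids"]
assert single == batched
```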
-
Patrick von Platen authored
* Add slow generate lm_model tests
* Fix merge conflicts
* Make style
* Delete unused variable
* Finished hard-coded tests
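These "hard-coded" tests pin deterministic generation output against a frozen list of token ids; a rough sketch of the shape of such a test, using GPT-2 as an example (the prompt, length, and the printing instead of asserting are illustrative, not the real test):

```python
# Rough sketch of a hard-coded generation test; prompt and settings are made up.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("The dog", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(input_ids, max_length=10, do_sample=False)  # greedy, deterministic

# In the real slow tests, the expected ids are hard-coded and asserted against.
print(output_ids[0].tolist())
print(tokenizer.decode(output_ids[0]))
```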
-
Lysandre Debut authored
Warning on `add_special_tokens` when passed to `encode`, `encode_plus` and `batch_encode_plus`
-
- 23 Feb, 2020 6 commits
-
-
Patrick von Platen authored
-
Martin Malmsten authored
-
Martin Malmsten authored
-
Martin Malmsten authored
* Added support for Albert in NER pipeline
* Added command-line options to examples/ner/run_ner.py to better control tokenization
* Added class AlbertForTokenClassification
* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
-
Sam Shleifer authored
-
Lysandre Debut authored
Don't know of a use case where that would be useful, but this is more consistent
-
- 22 Feb, 2020 1 commit
-
-
Sam Shleifer authored
-