- 25 Feb, 2020 6 commits
-
-
Patrick von Platen authored
* Add first files
* Add XLM-RoBERTa integration tests
* Make style
* Fix flake8 issues
-
srush authored
* Change masking to direct labels
* Fix black
* Switch to ignore index
* Fix black
-
Jhuo IH authored
-
Lysandre Debut authored
* Usage: Sequence Classification & Question Answering
* Pipeline example (see the sketch below)
* Language modeling
* TensorFlow code for Sequence Classification
* Custom TF/PT toggler in docs
* QA + LM for TensorFlow
* Finish Usage for both PyTorch and TensorFlow
* Addressing Julien's comments
* More assertive
* Cleanup
* Favicon: added a favicon option in conf.py along with the favicon image
* Updated 🤗 logo: slightly smaller, and should appear more consistent across editing programs (no more tongue on the outside of the mouth)

Co-authored-by: joshchagani <joshua@joshuachagani.com>
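A minimal sketch of the kind of pipeline usage these docs demonstrate; the task name and example text are illustrative, and the default model is whatever the installed transformers version ships for sentiment analysis:

```python
# Sketch of a sequence-classification pipeline like the ones in the Usage docs.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default model
result = classifier("Transformers pipelines make inference a one-liner.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```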
-
Julien Chaumond authored
-
Julien Chaumond authored
-
- 24 Feb, 2020 13 commits
-
-
Lysandre Debut authored
-
Lysandre Debut authored
-
Lysandre authored
-
Funtowicz Morgan authored
* Renamed the file generated by tokenizers when calling save_pretrained to match Python
* Added save_vocabulary tests
* Removed the Python quick-and-dirty fix in favor of the clean Rust implementation
* Bumped the tokenizers dependency to 0.5.1
* TransfoXLTokenizerFast uses a JSON vocabulary file + warning about the incompatibility between Python and Rust
* Added some save_pretrained / from_pretrained unit tests
* Updated tokenizers to 0.5.2
* Quality and format
* flake8
* Made sure there is really a bug in unittest
* Fixed the TransfoXL constructor vocab_file / pretrained_vocab_file mixin

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
Sandro Cavallari authored
-
Patrick von Platen authored
* Add an explanatory example to XLNet LM modeling
* Improve the docstring for XLNet
-
Patrick von Platen authored
Add a preprocessing step for Transfo-XL tokenization to avoid tokenizing words followed by punctuation as <unk> (#2987)
* Add preprocessing to insert a space before punctuation for transfo_xl (see the sketch below)
* Improve warning messages
* Make style
* Compile the regex at instantiation of the tokenizer object
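A minimal sketch of the idea, assuming a simplified pattern (the library's actual regex and warnings differ):

```python
import re

# Simplified sketch of the preprocessing: insert a space before punctuation
# so that e.g. "Hello," is seen as ["Hello", ","] rather than an
# out-of-vocabulary token mapped to <unk>. The pattern here is an
# assumption; it is compiled once at tokenizer instantiation rather than
# on every call.
PUNCT_BEFORE = re.compile(r"(\w)([!?.,;:])")

def add_space_before_punct(text: str) -> str:
    return PUNCT_BEFORE.sub(r"\1 \2", text)

print(add_space_before_punct("Hello, world."))  # -> "Hello , world ."
```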
-
Bram Vanroy authored
* Add disable_outgoing to pretrained items. Setting disable_outgoing=True disables outgoing traffic:
  - etags are not looked up
  - models are not downloaded
* Parameter name change
* Remove a forgotten print
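A hedged usage sketch: the flag was introduced as disable_outgoing and renamed in this PR; later transformers releases expose the same behavior as local_files_only (treat the exact keyword as an assumption for the version you run):

```python
# Sketch: load from the local cache only, with no etag lookups or downloads.
# `local_files_only` is assumed to be the renamed parameter's final name;
# check your installed transformers version.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)
```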
-
Manuel Romero authored
-
Lysandre Debut authored
-
Lysandre Debut authored
* Testing that encode_plus and batch_encode_plus behave the same way. Spoiler alert: they don't (see the sketch below)
* Testing the rest of the arguments in batch_encode_plus
* Test tensor return in batch_encode_plus
* Addressing Sam's comments
* flake8
* Simplified with `num_added_tokens`
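A minimal sketch of the equivalence these tests pin down, under the pre-3.0 API where encode_plus and batch_encode_plus are separate methods:

```python
# Sketch of the property under test: batch_encode_plus over a list should
# match encode_plus applied sentence by sentence (pre-3.0 tokenizer API).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
sentences = ["Hello world", "Tokenizers should agree with themselves."]

per_sentence = [tokenizer.encode_plus(s)["input_ids"] for s in sentences]
batched = tokenizer.batch_encode_plus(sentences)["input_ids"]
assert per_sentence == batched
```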
-
Patrick von Platen authored
* Add slow generate lm_model tests
* Fix merge conflicts
* Make style
* Delete unused variables
* Finished hard-coded tests
-
Lysandre Debut authored
Warning on `add_special_tokens` when passed to `encode`, `encode_plus` and `batch_encode_plus`
-
- 23 Feb, 2020 3 commits
-
-
Patrick von Platen authored
-
Sam Shleifer authored
-
Lysandre Debut authored
Don't know of a use case where that would be useful, but this is more consistent
-
- 22 Feb, 2020 6 commits
-
-
Sam Shleifer authored
-
Joe Davison authored
-
saippuakauppias authored
-
Malte Pietsch authored
Add image
-
Manuel Romero authored
- Added an example using the model with pipelines, showing that we set `{"use_fast": False}` in the tokenizer (see the sketch below)
- Added a Colab to play with the model and pipelines
- Added a Colab to discover Hugging Face pipelines at the end of the document
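A hedged sketch of the pipeline example described above; the model identifier is a placeholder, not the one from this model card:

```python
# Sketch: build a pipeline with the slow (Python) tokenizer by passing
# {"use_fast": False}. "some-user/some-model" is a placeholder identifier.
from transformers import pipeline

nlp = pipeline(
    "question-answering",
    model="some-user/some-model",
    tokenizer=("some-user/some-model", {"use_fast": False}),
)
```
-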
Funtowicz Morgan authored
* enable_padding should pad up to max_length if set (see the sketch below)
* Added more testing on padding

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
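A minimal sketch of the fixed behavior, using the tokenizers library; the vocab file path is a placeholder:

```python
# Sketch: with a fixed `length`, enable_padding pads every encoding up to
# that length. Replace the placeholder path with a real vocab file.
from tokenizers import BertWordPieceTokenizer

tokenizer = BertWordPieceTokenizer("vocab.txt")  # placeholder vocab file
tokenizer.enable_padding(pad_token="[PAD]", length=16)

encoding = tokenizer.encode("A short sentence.")
assert len(encoding.ids) == 16  # padded up to the requested length
```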
-
- 21 Feb, 2020 7 commits
-
-
Lysandre Debut authored
-
Sam Shleifer authored
* Only use F.gelu for torch >= 1.4.0
* Use F.gelu for newer torch
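A minimal sketch of such a version guard, assuming a naive string comparison (a robust guard would parse the version number):

```python
# Sketch: pick the fused F.gelu on torch >= 1.4.0, else a manual GELU.
# The naive string comparison works for the 1.x versions of the era;
# production code should use packaging.version.parse instead.
import math
import torch
import torch.nn.functional as F

def _gelu_python(x):
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

gelu = F.gelu if torch.__version__ >= "1.4.0" else _gelu_python
```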
-
Patrick von Platen authored
* Improving generation (see the usage sketch below)
* Finalized special-token behaviour for no_beam_search generation
* Solved merge conflicts in modeling_utils.py
* Add run_generation improvements from PR #2749
* Adapted language generation to not use a hardcoded -1 when no padding token is available
* Removed the -1 removal, as hardcoded -1s are no longer necessary
* Add lightweight language generation testing for randomly initialized models - just checking that no errors are thrown
* Add slow language generation tests for pretrained models using hardcoded output with a PyTorch seed
* Delete ipdb
* Check that all generated tokens are valid
* Renaming: Generation -> Generate
* Make style
* Updated so that generate_beam_search has the same token behavior as generate_no_beam_search
* Consistent return format for run_generation.py
* Deleted pretrained LM generate tests -> will be added in another PR
* Cleaning of unused if statements and renaming
* run_generate will always return an iterable
* Consistent renaming
* Improve naming, make sure the generate function always returns the same tensor, add docstring
* Add slow tests for all LM-head models
* Make style and improve example comments in modeling_utils
* Better naming and refactoring in modeling_utils
* Changed the fast random LM generation testing design to a more general one
* Deleted the old testing design in gpt2
* Corrected an old variable name
* Temporary fix for encoder_decoder LM generation tests - to be updated when T5 is fixed
* Adapted all fast random generate tests to the new design
* Better warning description in modeling_utils
* Better comments and error message

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
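A hedged usage sketch of the generate() behavior described above (transformers ~v2.5): the same call shape works with and without beam search and always returns a tensor of token ids:

```python
# Sketch: generate() returns token ids as a tensor whether or not beam
# search is enabled; decoding back to text is a separate step.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("The weather today is", return_tensors="pt")
output_ids = model.generate(input_ids, max_length=20, num_beams=3)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```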
-
maximeilluin authored
* Added CamembertForQuestionAnswering
* Fixed the CamemBERT tokenizer case
-
Bram Vanroy authored
TensorFlow does not use .eval() vs .train(). Closes https://github.com/huggingface/transformers/issues/2906
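A small illustration of the framework difference behind this fix; the TensorFlow side is shown in comments (assumed tf.keras API):

```python
# PyTorch toggles train/eval state on the module itself:
import torch.nn as nn

dropout = nn.Dropout(p=0.1)
dropout.eval()  # dropout disabled until .train() is called again

# TensorFlow/Keras has no .eval(); the mode is passed per call instead:
#   layer = tf.keras.layers.Dropout(0.1)
#   y = layer(x, training=False)
```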
-
ahotrod authored
-
Martin Malmsten authored
-
- 20 Feb, 2020 5 commits
-
-
Sam Shleifer authored
* Results same as fairseq
* Wrote a ton of tests
* Struggled with API signatures
* Added some docs
-
guillaume-be authored
* Removed unused fields in DistilBert TransformerBlock
-
srush authored
-
Joe Davison authored
-
Scott Gigante authored
-