- 05 Mar, 2020 2 commits
-
-
Tom Hosking authored
-
Lysandre authored
-
- 04 Mar, 2020 6 commits
-
-
Patrick von Platen authored
* fix conflicts * fixed naming bug * make style
-
Patrick von Platen authored
-
Patrick von Platen authored
-
patrickvonplaten authored
-
patrickvonplaten authored
-
Patrick von Platen authored
* fix beam_search behavior when sampling * delete print * make correct style
-
- 03 Mar, 2020 5 commits
-
-
Gunnlaugur Thor Briem authored
Lurking bugs discovered while working on other stuff.
-
Sam Shleifer authored
-
Julien Chaumond authored
Adopted the best practice set by @patrickvonplaten of commenting lines run on fairseq, for easy comparison. Also see #3020.
-
Sam Shleifer authored
-
Patrick von Platen authored
* add first copy-paste test to tf 2 generate * add tf top_k_top_p_filter fn * add generate function for TF * add generate function for TF * implemented generate for all models except transfoXL * implemented generate for all models except transfoXL * implemented generate for all models except transfoXL * make style * change permission of test file to correct ones * delete ipdb * delete ipdb * fix bug and finish simple gpt2 integration test * clean test file * clean test file * make style * make style * make style * make style * change import style * change import style * make style * make style * add decorators * add decorators * fix tf ctrl bug dim => axis in TF * make style * make style * refactored test file * refactored test file * take out test_torch_tf_conversion if nothing is defined * take out test_torch_tf_conversion if nothing is defined * remove useless files * remove useless files * fix conflicts * fix conflicts * fix conflicts * fix conflicts * fix conflicts * solve conflicts * solve conflicts * fix conflicts * fix conflicts * merge conflicts * delete ipdb * exposed top_k_top_p_filtering fns * delete weirdly created w! file * add comment to test tf common modeling * fix conflicts * fix conflicts * make style * merge conflicts * make style * change tf.tensor.shape to shape_list(tensor)
-
- 02 Mar, 2020 5 commits
-
-
Julien Chaumond authored
cc @sshleifer
-
Lysandre Debut authored
* Pipeline doc initial commit * pipeline abstraction * Remove modelcard argument from pipeline * Task-specific pipelines can be instantiated with no model or tokenizer * All pipelines doc
-
Patrick von Platen authored
* correct greedy generation when doing beam search * improve comment
-
Patrick von Platen authored
* force pad_token_id to be set before padding * fix tests and forbid padding without having a padding_token_id set
-
Sam Shleifer authored
`generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
-
- 27 Feb, 2020 2 commits
-
-
Sam Shleifer authored
-
Cola authored
-
- 26 Feb, 2020 4 commits
-
-
Martin Malmsten authored
-
Patrick von Platen authored
* fix issue and add some tests * fix issue and add some tests * updated doc string gpt2
-
Julien Chaumond authored
* Fix tests on GPU (torch) * Fix bart slow tests Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Sam Shleifer authored
-
- 25 Feb, 2020 2 commits
-
-
Lysandre Debut authored
* All Tokenizers BertTokenizer + few fixes RobertaTokenizer OpenAIGPTTokenizer + Fixes GPT2Tokenizer + fixes TransfoXLTokenizer Correct rst for TransformerXL XLMTokenizer + fixes XLNet Tokenizer + Style DistilBERT + Fix XLNet RST CTRLTokenizer CamemBERT Tokenizer FlaubertTokenizer XLMRobertaTokenizer cleanup * cleanup
-
srush authored
* change masking to direct labelings * fix black * switch to ignore index * . * fix black
-
- 24 Feb, 2020 11 commits
-
-
Lysandre Debut authored
-
Lysandre authored
-
Funtowicz Morgan authored
* Renamed file generated by tokenizers when calling save_pretrained to match python. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added save_vocabulary tests. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Remove python quick and dirty fix for clean Rust impl. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Bump tokenizers dependency to 0.5.1 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * TransfoXLTokenizerFast uses a json vocabulary file + warning about incompatibility between Python and Rust Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Added some save_pretrained / from_pretrained unittests. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Update tokenizers to 0.5.2 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Quality and format. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * flake8 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Making sure there is really a bug in unittest * Fix TransfoXL constructor vocab_file / pretrained_vocab_file mixin. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
Sandro Cavallari authored
-
Patrick von Platen authored
* add explaining example to XLNet LM modeling * improve docstring for xlnet
-
Patrick von Platen authored
Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punctuation to <unk> (#2987) * add preprocessing to add space before punctuation for transfo_xl * improve warning messages * make style * compile regex at instantiation of tokenizer object
-
Bram Vanroy authored
* Add disable_outgoing to pretrained items Setting disable_outgoing=True disables outgoing traffic: - etags are not looked up - models are not downloaded * parameter name change * Remove forgotten print
-
Lysandre Debut authored
-
Lysandre Debut authored
* Testing that encode_plus and batch_encode_plus behave the same way Spoiler alert: they don't * Testing rest of arguments in batch_encode_plus * Test tensor return in batch_encode_plus * Addressing Sam's comments * flake8 * Simplified with `num_added_tokens`
-
Patrick von Platen authored
* add slow generate lm_model tests * fix conflicts * merge conflicts * fix conflicts * add slow generate lm_model tests * make style * delete unused variable * fix conflicts * fix conflicts * fix conflicts * delete unused variable * fix conflicts * finished hard coded tests
-
Lysandre Debut authored
Warning on `add_special_tokens` when passed to `encode`, `encode_plus` and `batch_encode_plus`
-
- 23 Feb, 2020 3 commits
-
-
Martin Malmsten authored
* Added support for Albert in NER pipeline * Added command-line options to examples/ner/run_ner.py to better control tokenization * Added class AlbertForTokenClassification * Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
-
Sam Shleifer authored
-
Lysandre Debut authored
Don't know of a use case where that would be useful, but this is more consistent
-