- 05 Mar, 2020 8 commits
-
-
patrickvonplaten authored
-
Sam Shleifer authored
* improved documentation
-
Lysandre Debut authored
* Pass kwargs to configuration * Setter * test
-
Lysandre Debut authored
-
sshleifer authored
-
sshleifer authored
-
Julien Chaumond authored
-
Lysandre authored
-
- 04 Mar, 2020 2 commits
-
-
Patrick von Platen authored
-
patrickvonplaten authored
-
- 03 Mar, 2020 3 commits
-
-
Sam Shleifer authored
-
Julien Chaumond authored
Adopted the best practice set by @patrickvonplaten of commenting the corresponding lines run on fairseq, for easy comparison; see also #3020
-
Patrick von Platen authored
* add first copy-paste test for TF 2 generate
* add TF top_k_top_p_filtering fn
* add generate function for TF
* implement generate for all models except TransfoXL
* fix bug and finish simple GPT-2 integration test
* clean up test file and make style
* change import style
* add decorators
* fix TF CTRL bug: dim => axis in TF
* refactor test file
* take out test_torch_tf_conversion if nothing is defined
* remove useless files
* resolve merge conflicts and delete ipdb
* expose top_k_top_p_filtering fns
* delete weirdly created w! file
* add comment to TF common modeling test
* change tf.tensor.shape to shape_list(tensor)
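A minimal sketch of the top-k filtering idea behind the TF `top_k_top_p_filtering` function exposed here; the function body below is an illustrative assumption, not the library code:

```python
import tensorflow as tf

def top_k_filter(logits, top_k=50, filter_value=-float("inf")):
    """Mask every logit that is not among the top_k largest (illustrative sketch)."""
    top_k = min(top_k, logits.shape[-1])
    # k-th largest value per row, kept as a broadcastable threshold
    threshold = tf.math.top_k(logits, k=top_k).values[..., -1, None]
    return tf.where(logits < threshold, tf.fill(tf.shape(logits), filter_value), logits)

logits = tf.constant([[1.0, 3.0, 0.5, 2.0]])
print(top_k_filter(logits, top_k=2))  # everything but the two largest logits becomes -inf
```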
-
- 02 Mar, 2020 6 commits
-
-
Julien Chaumond authored
* debug env * Restrict TF GPU memory * Fixup * One more test * rm debug logs * Fixup
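One common way to keep TensorFlow from grabbing all GPU memory during tests (a sketch of the general technique; whether this commit uses exactly these calls is an assumption):

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving everything up front.
for gpu in tf.config.experimental.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```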
-
Lysandre Debut authored
* Pipeline doc initial commit * pipeline abstraction * Remove modelcard argument from pipeline * Task-specific pipelines can be instantiated with no model or tokenizer * All pipelines doc
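Example of the behavior described above: a task-specific pipeline instantiated with no explicit model or tokenizer falls back to a default checkpoint for that task (usage sketch):

```python
from transformers import pipeline

# No model or tokenizer given: the task's default checkpoint is downloaded and used.
nlp = pipeline("sentiment-analysis")
print(nlp("Transformers pipelines are pleasant to use."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```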
-
Julien Chaumond authored
cc @patrickvonplaten
-
Patrick von Platen authored
* correct greedy generation when doing beam search * improve comment
-
Patrick von Platen authored
* force pad_token_id to be set before padding * fix tests and forbid padding without having a padding_token_id set
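In practice this means a model like GPT-2, which ships without a pad token, needs one assigned before padded batches are requested; a hedged sketch (the `pad_to_max_length` argument reflects the API of that period):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 defines no pad token; padding without one is forbidden, so set it explicitly.
tokenizer.pad_token = tokenizer.eos_token
batch = tokenizer.batch_encode_plus(
    ["a short input", "a somewhat longer input sentence"],
    pad_to_max_length=True,
)
print(batch["input_ids"])
```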
-
Sam Shleifer authored
`generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
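A usage sketch of summarizing with the ported BART `generate`; the class and checkpoint names below follow the later public API and are assumptions about what existed at this exact commit:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Names are assumptions; the CNN/DailyMail fine-tuned checkpoint was published as "bart-large-cnn".
tokenizer = BartTokenizer.from_pretrained("bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("bart-large-cnn")

article = "A long news article to summarize ..."
inputs = tokenizer.batch_encode_plus([article], return_tensors="pt")
# Beam search with cached decoder states is what makes this fast enough to match fairseq.
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```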
-
- 27 Feb, 2020 2 commits
-
-
Martin Malmsten authored
-
Martin Malmsten authored
-
- 26 Feb, 2020 5 commits
-
-
Julien Chaumond authored
-
Julien Chaumond authored
-
Patrick von Platen authored
* fix issue and add some tests * update GPT-2 docstring
-
Julien Chaumond authored
* Fix tests on GPU (torch) * Fix bart slow tests. Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
-
Sam Shleifer authored
-
- 25 Feb, 2020 3 commits
-
-
Patrick von Platen authored
* add first files * add XLM-RoBERTa integration tests * make style * fix flake8 issues
-
Patrick von Platen authored
-
Patrick von Platen authored
-
- 24 Feb, 2020 4 commits
-
-
Lysandre Debut authored
-
Funtowicz Morgan authored
* Renamed file generated by tokenizers when calling save_pretrained to match python.
* Added save_vocabulary tests.
* Removed python quick-and-dirty fix in favor of the clean Rust impl.
* Bumped tokenizers dependency to 0.5.1.
* TransfoXLTokenizerFast uses a json vocabulary file + warning about incompatibility between Python and Rust.
* Added some save_pretrained / from_pretrained unittests.
* Updated tokenizers to 0.5.2.
* Quality, format, and flake8.
* Made sure there is really a bug in unittest.
* Fixed TransfoXL constructor vocab_file / pretrained_vocab_file mixin.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
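The save_pretrained / from_pretrained round trip being exercised here, as a usage sketch (checkpoint and path are illustrative):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# save_pretrained writes the vocabulary and config files to the directory
# and returns the paths of the files it created.
saved_files = tokenizer.save_pretrained("/tmp/bert-tokenizer")
print(saved_files)

# The fast (Rust) tokenizer can be reloaded from the same directory.
reloaded = BertTokenizerFast.from_pretrained("/tmp/bert-tokenizer")
```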
-
Lysandre Debut authored
* Testing that encode_plus and batch_encode_plus behave the same way (spoiler alert: they don't) * Testing the rest of the arguments in batch_encode_plus * Test tensor return in batch_encode_plus * Addressing Sam's comments * flake8 * Simplified with `num_added_tokens`
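The property the test asserts, roughly: for a single sentence, `encode_plus` and `batch_encode_plus` should produce matching ids (sketch):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "hello world"
single = tokenizer.encode_plus(text)
batched = tokenizer.batch_encode_plus([text])

# batch_encode_plus returns lists of lists, one entry per input text.
assert single["input_ids"] == batched["input_ids"][0]
```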
-
Patrick von Platen authored
* add slow generate lm_model tests * delete unused variable * resolve merge conflicts * make style * finish hard-coded tests
-
- 22 Feb, 2020 2 commits
-
-
Sam Shleifer authored
-
Funtowicz Morgan authored
* enable_padding should pad up to max_length if set. * Added more testing on padding. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
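What the fix means in the `tokenizers` API of that period, as a sketch; the keyword name `max_length` follows the commit message and the 0.5.x API, and is an assumption:

```python
from tokenizers import BertWordPieceTokenizer

# vocab path is illustrative
tokenizer = BertWordPieceTokenizer("vocab.txt")

# With max_length set, every encoding is padded up to that fixed length.
tokenizer.enable_padding(max_length=16)
encoding = tokenizer.encode("a short sentence")
print(len(encoding.ids))  # 16
```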
-
- 21 Feb, 2020 1 commit
-
-
Patrick von Platen authored
* improve generation
* finalize special token behaviour for no_beam_search generation
* resolve merge conflicts in modeling_utils.py
* add run_generation improvements from PR #2749
* adapt language generation to not use a hardcoded -1 if no padding token is available
* remove the -1 removal, as hard-coded -1s are no longer necessary
* add lightweight language generation testing for randomly initialized models - just checking whether no errors are thrown
* add slow language generation tests for pretrained models using hardcoded output with a PyTorch seed
* delete ipdb
* check that all generated tokens are valid
* rename Generation -> Generate
* update generate_beam_search to have the same token behavior as generate_no_beam_search
* consistent return format for run_generation.py
* delete pretrained lm generate tests -> will be added in another PR
* clean up unused if statements and renaming
* run_generate will always return an iterable
* consistent renaming
* improve naming, make sure the generate function always returns the same tensor, add docstring
* add slow tests for all lmhead models
* make style and improve example comments in modeling_utils
* better naming and refactoring in modeling_utils
* change fast random lm generation testing design to a more general one
* delete old testing design in gpt2
* correct old variable name
* temporary fix for encoder_decoder lm generation tests - has to be updated when t5 is fixed
* adapt all fast random generate tests to the new design
* better warning description in modeling_utils
* better comments and error message
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
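The `generate` entry point that all of this work converges on, in its basic PyTorch form (a usage sketch, not the PR's test code):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = torch.tensor([tokenizer.encode("The weather today is")])
# generate() handles sampling vs. beam search and the special-token bookkeeping internally.
output_ids = model.generate(input_ids, max_length=20, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0].tolist()))
```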
-
- 20 Feb, 2020 2 commits
-
-
Sam Shleifer authored
* Results same as fairseq * Wrote a ton of tests * Struggled with API signatures * Added some docs
-
Joe Davison authored
-
- 19 Feb, 2020 2 commits
-
-
Funtowicz Morgan authored
* Correctly return the tuple of generated file(s) when calling save_pretrained. * Quality and format. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
Funtowicz Morgan authored
* Override build_inputs_with_special_tokens for fast impl + unittest. * Quality + format. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
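What the overridden method does, as a usage sketch (checkpoint name illustrative):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

ids_a = tokenizer.encode("first sentence", add_special_tokens=False)
ids_b = tokenizer.encode("second sentence", add_special_tokens=False)

# Adds the model's special tokens, e.g. [CLS] ids_a [SEP] ids_b [SEP] for BERT.
pair_ids = tokenizer.build_inputs_with_special_tokens(ids_a, ids_b)
print(pair_ids)
```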
-