- 06 Jul, 2020 1 commit
-
-
Anthony MOI authored
* BertTokenizerFast - Do not specify strip_accents by default * Bump tokenizers to new version * Add test for AddedToken serialization
-
- 03 Jul, 2020 2 commits
-
-
Sam Shleifer authored
-
Lysandre Debut authored
* Exposing prepare_for_model for both slow & fast tokenizers * Update method signature * The traditional style commit * Hide the warnings behind the verbose flag * update default truncation strategy and prepare_for_model * fix tests and prepare_for_models methods Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
-
- 02 Jul, 2020 1 commit
-
-
Teven authored
* Changed expected_output_ids in TransfoXL generation test to match #4826 generation PR. * making black happy * making isort happy
-
- 01 Jul, 2020 8 commits
-
-
Patrick von Platen authored
* fix conflicts * fix * happy rebasing
-
Joe Davison authored
* allow tensor label inputs to default collator * replace try/except with type check
-
Patrick von Platen authored
-
Patrick von Platen authored
* refactor naming * add small slow test * refactor * refactor naming * rename selected to extra * big global attention refactor * make style * refactor naming * save intermed * refactor functions * finish function refactor * fix tests * fix longformer * fix longformer * fix longformer * fix all tests but one * finish longformer * address sams and izs comments * fix transpose
-
Sam Shleifer authored
-
Funtowicz Morgan authored
* Added PipelineException Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * fill-mask pipeline raises exception when more than one mask_token detected. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Put everything in a function. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Added tests on pipeline fill-mask when input has != 1 mask_token Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Fix numel() computation for TF Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Addressing PR comments. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Remove function typing to avoid import on specific framework. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Quality. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Retry typing with @julien-c tip. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Quality². Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Simplify fill-mask mask_token checking. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Trigger CI
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 30 Jun, 2020 1 commit
-
-
Sam Shleifer authored
-
- 29 Jun, 2020 1 commit
-
-
Patrick von Platen authored
* first doc version * add benchmark docs * fix typos * improve README * Update docs/source/benchmarks.rst Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * fix naming and docs Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 28 Jun, 2020 1 commit
-
-
Sam Shleifer authored
-
- 26 Jun, 2020 4 commits
-
-
Sam Shleifer authored
-
Thomas Wolf authored
* remove references to old API in docstring - update data processors * style * fix tests - better type checking error messages * better type checking * include awesome fix by @LysandreJik for #5310 * updated doc and examples
-
Sam Shleifer authored
-
Funtowicz Morgan authored
* Add new parameter `pad_to_multiple_of` on tokenizers. * unittest for pad_to_multiple_of * Add .name when logging enum. * Fix missing .items() on dict in tests. * Add special check + warning if the tokenizer doesn't have proper pad_token. * Use the correct logger format specifier. * Ensure tokenizer with no pad_token do not modify the underlying padding strategy. * Skip test if tokenizer doesn't have pad_token * Fix RobertaTokenizer on empty input * Format. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * fix and updating to simpler API Co-authored-by:
Thomas Wolf <thomwolf@users.noreply.github.com>
-
- 25 Jun, 2020 3 commits
-
-
Lysandre Debut authored
* Refactor code samples * Test docstrings * Style * Tokenization examples * Run rest of tests * First step to testing source docs * Style and BART comment * Test the remainder of the code samples * Style * let to const * Formatting fixes * Ready for merge * Fix fixture + Style * Fix last tests * Update docs/source/quicktour.rst Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Addressing @sgugger's comments + Fix MobileBERT in TF Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Thomas Wolf authored
* avoid recursion in id checks for fast tokenizers * better typings and fix #5232 * align slow and fast tokenizers behaviors for Roberta and GPT2 * style and quality * fix tests - improve typings
-
Thomas Wolf authored
[Tokenization] Fix #5181 - make #5155 more explicit - move back the default logging level in tests to WARNING (#5252) * fix-5181 Padding to max sequence length while truncation to another length was wrong on slow tokenizers * clean up and fix #5155 * fix XLM test * Fix tests for Transfo-XL * logging only above WARNING in tests * switch slow tokenizers tests in @slow * fix Marian truncation tokenization test * style and quality * make the test a lot faster by limiting the sequence length used in tests
-
- 24 Jun, 2020 4 commits
-
-
Thomas Wolf authored
* update tests for fast tokenizers + fix small bug in saving/loading * better tests on serialization * fixing serialization * comment cleanup
-
Lysandre Debut authored
* Cleaning TensorFlow models Update all classes style * Don't average loss
-
Patrick von Platen authored
* fix use cache * add bart use cache * fix bart * finish bart
-
Patrick von Platen authored
* add benchmark for all kinds of models * improved import * delete bogus files * make style
-
- 23 Jun, 2020 4 commits
-
-
Sam Shleifer authored
-
Thomas Wolf authored
* Add return lengths * make pad a bit more flexible so it can be used as collate_fn * check all kwargs sent to encoding method are known * fixing kwargs in encodings * New AddedToken class in python This class lets you specify specific tokenization behaviors for some special tokens. Used in particular for GPT2 and Roberta, to control how white spaces are stripped around special tokens. * style and quality * switched to huggingface tokenizers library for AddedTokens * up to tokenizer 0.8.0-rc3 - update API to use AddedToken state * style and quality * do not raise an error on additional or unused kwargs for tokenize() but only a warning * transfo-xl pretrained model requires torch * Update src/transformers/tokenization_utils.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 22 Jun, 2020 4 commits
-
-
Thomas Wolf authored
* fix #5081 and improve backward compatibility (slightly) * add nlp to setup.cfg - style and quality * align default to previous default * remove test that doesn't generalize
-
Joseph Liu authored
* Configure all models to use output_hidden_states as argument passed to forward() * Pass all tests * Remove cast_bool_to_primitive in TF Flaubert model * correct tf xlnet * add pytorch test * add tf test * Fix broken tests * Refactor output_hidden_states for mobilebert * Reset and remerge to master Co-authored-by:
Joseph Liu <joseph.liu@coinflex.com> Co-authored-by:
patrickvonplaten <patrick.v.platen@gmail.com>
-
RafaelWO authored
* Fixed resize_token_embeddings for transfo_xl model * Fixed resize_token_embeddings for transfo_xl. Added custom methods to TransfoXLPreTrainedModel for resizing layers of the AdaptiveEmbedding. * Updated docstring * Fixed resizing cutoffs; added check for new size of embedding layer. * Added test for resize_token_embeddings * Fixed code quality * Fixed unchanged cutoffs in model.config * Added feature to move added tokens in tokenizer. * Fixed code quality * Fixed docstring, renamed sym to token. Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
-
Patrick von Platen authored
* finish benchmark * fix isort * fix setup cfg * retab * fix time measuring of tf graph mode * fix tf cuda * clean code * better error message
-
- 19 Jun, 2020 2 commits
-
-
Vasily Shamporov authored
* Add MobileBert * Quality + Conversion script * style * Update src/transformers/modeling_mobilebert.py * Links to S3 * Style * TFMobileBert Slight fixes to the pytorch MobileBert Style * MobileBertForMaskedLM (PT + TF) * MobileBertForNextSentencePrediction (PT + TF) * MobileFor{MultipleChoice, TokenClassification} (PT + TF) ss * Tests + Auto * Doc * Tests * Addressing @sgugger's comments * Addressing @patrickvonplaten's comments * Style * Style * Integration test * style * Model card Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
Sam Shleifer authored
-
- 18 Jun, 2020 3 commits
-
-
Sylvain Gugger authored
-
Deniz authored
* resize token embeddings * add tokens * add t5 token method * typo * debugging input * debug * trying to set embedding tokens properly * set embeddings for generation head too * debugging * enable generation * add base method * return logits in the main call * reverting to generation * revert back * set embeddings for the bert main layer * description * fix conflicts * logging * set base model as self * refactor * tf_bert add method * v0 * finalize * final * black * add tests * revert back the emb call * comments * add the second test * add vocab size config * add tf models * add tf models. add common tests * remove model specific embedding tests * stylish * remove files * stylez * Update src/transformers/modeling_tf_transfo_xl.py change the error. Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * adding unchanged weight test Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
Suraj Patil authored
* add ElectraForMultipleChoice * add test_for_multiple_choice * add ElectraForMultipleChoice in auto model * add ElectraForMultipleChoice in all_model_classes * add SequenceSummary related parameters * get rid of pooler, use SequenceSummary instead * add electra multiple choice test Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
- 17 Jun, 2020 1 commit
-
-
Sylvain Gugger authored
* Make default_data_collator more flexible * Accept tensors for all features * Document code * Refactor * Formatting
-