- 09 Oct, 2020 3 commits
-
-
Stas Bekman authored
-
Funtowicz Morgan authored
* Reintroduce clean_text call which was removed by mistake in #4723 Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Added unittest for clean_text parameter on Bert tokenizer. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Better unittest name. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Adapt unittest to use untrained tokenizer. Signed-off-by:
Morgan Funtowicz <funtowiczmo@gmail.com> * Code quality + update test Co-authored-by:
Lysandre <lysandre.debut@reseau.eseo.fr>
-
guhur authored
The same type of errors as in https://github.com/huggingface/transformers/pull/4300
-
- 08 Oct, 2020 3 commits
-
-
Lysandre Debut authored
* Fix RobertaForCausalLM docs * Apply review suggestion Co-authored-by:
sgugger <sylvain.gugger@gmail.com> Co-authored-by:
sgugger <sylvain.gugger@gmail.com>
-
Thomas Wolf authored
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) * [WIP] SP tokenizers * fixing tests for T5 * WIP tokenizers * serialization * update T5 * WIP T5 tokenization * slow to fast conversion script * Refactoring to move tokenizer implementations inside transformers * Adding gpt - refactoring - quality * WIP adding several tokenizers to the fast world * WIP Roberta - moving implementations * update to dev4, switch file loading to in-memory loading * Updating and fixing * advancing on the tokenizers - updating do_lower_case * style and quality * moving forward with tokenizers conversion and tests * MBart, T5 * dumping the fast version of transformer XL * Adding to autotokenizers + style/quality * update init and space_between_special_tokens * style and quality * bump up tokenizers version * add protobuf * fix pickle Bert JP with Mecab * fix newly added tokenizers * style and quality * fix bert japanese * fix funnel * limit tokenizer warning to one occurrence * clean up file * fix new tokenizers * fast tokenizers deep tests * WIP adding all the special fast tests on the new fast tokenizers * quick fix * adding more fast tokenizers in the fast tests * all tokenizers in fast version tested * Adding BertGenerationFast * bump up setup.py for CI * remove BertGenerationFast (too early) * bump up tokenizers version * Clean old docstrings * Typo * Update following Lysandre comments Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
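For context, a minimal sketch of what these fast tokenizers enable, assuming AutoTokenizer's use_fast flag; the checkpoint name is illustrative:

```python
from transformers import AutoTokenizer

# After this change, SentencePiece-based models such as T5 get a
# Rust-backed "fast" tokenizer; use_fast=True selects it.
tokenizer = AutoTokenizer.from_pretrained("t5-small", use_fast=True)

encoding = tokenizer("Translate English to German: Hello!", return_tensors="pt")
print(type(tokenizer).__name__)   # e.g. T5TokenizerFast
print(encoding.input_ids.shape)   # torch.Size([1, sequence_length])
```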
-
Piero Molino authored
Replaced torch.load with pickle.load for loading the pretrained vocab of the TransformerXL tokenizer (#6935) * Replaced torch.load with pickle.load for loading the pretrained vocab of TransformerXL * Replaced torch.save with pickle.dump when saving the vocabulary * updating transformer-xl * uploaded on S3 - compatibility * fix tests * style * Address review comments Co-authored-by:
Thomas Wolf <thomwolf@users.noreply.github.com> Co-authored-by:
Lysandre <lysandre.debut@reseau.eseo.fr>
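The intent of the swap, sketched with a toy vocabulary (file names and the vocab structure here are assumptions, not the tokenizer's real format):

```python
import pickle

# Toy stand-in; the real TransformerXL vocab object is richer.
vocab = {"idx2sym": ["hello", "world"], "sym2idx": {"hello": 0, "world": 1}}

# Before: torch.save / torch.load tied vocabulary files to PyTorch.
# After: plain pickle keeps (de)serialization framework-agnostic.
with open("vocab.pkl", "wb") as f:
    pickle.dump(vocab, f)

with open("vocab.pkl", "rb") as f:
    loaded = pickle.load(f)

assert loaded["idx2sym"] == ["hello", "world"]
```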
-
- 07 Oct, 2020 3 commits
-
-
Sam Shleifer authored
Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Sylvain Gugger authored
* Initial callback proposal * Finish various callbacks * Post-rebase conflicts * Fix tests * Don't use something that's not set * Documentation * Remove unwanted print. * Document all models can work * Add tests + small fixes * Update docs/source/internal/trainer_utils.rst Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Address review comments * Fix TF tests * Real fix this time * This one should work * Fix typo * Really fix typo Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
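A minimal custom callback against the API this commit introduces; the hook body is illustrative:

```python
from transformers import TrainerCallback

class PrintLossCallback(TrainerCallback):
    """Print the loss every time the Trainer logs metrics."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and "loss" in logs:
            print(f"step {state.global_step}: loss = {logs['loss']:.4f}")

# Usage sketch: pass it to the Trainer at construction time, e.g.
# Trainer(model=model, args=training_args, callbacks=[PrintLossCallback()])
```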
-
Lysandre Debut authored
-
- 06 Oct, 2020 8 commits
-
-
Gabriele Picco authored
* Fix UnboundLocalError when PaddingStrategy is MAX_LENGTH * Fix UnboundLocalError for TruncationStrategy
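For context, the failing code path corresponds to calls like the following (model name and lengths are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# padding="max_length" maps to PaddingStrategy.MAX_LENGTH internally;
# truncation maps to a TruncationStrategy the same way.
batch = tokenizer(
    ["a short sentence", "another one"],
    padding="max_length",
    truncation=True,
    max_length=32,
)
print(len(batch["input_ids"][0]))  # 32
```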
-
Philipp authored
Resolves: #7613
-
Lysandre authored
-
Lysandre Debut authored
* Add GPT2ForSequenceClassification based on DialogRPT * Better documentation * Code quality
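A usage sketch for the new head, assuming the base gpt2 checkpoint rather than a DialogRPT one:

```python
import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)

# GPT-2 has no pad token by default; the classification head pools the
# last non-pad token, so one must be set for padded batches.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```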
-
Sam Shleifer authored
-
George Mihaila authored
-
Siddharth Jain authored
* Fixing top_k and min_length assertions, and a typo fix * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
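For context, the arguments these assertions validate (checkpoint and values are illustrative):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The weather today is", return_tensors="pt")

# generate() checks sampling arguments such as top_k and min_length up
# front; this commit fixed those assertions and their error messages.
output = model.generate(
    inputs.input_ids, do_sample=True, top_k=50, min_length=10, max_length=30
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```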
-
Lysandre Debut authored
* Configuration * Modeling * Tokenization * Obliterate the trailing spaces * From underlines to long underlines
-
- 05 Oct, 2020 12 commits
-
-
Sylvain Gugger authored
-
Julien Plu authored
* First try * Fix TF utils * Handle authorized unexpected keys when loading weights * Add several more authorized unexpected keys * Apply style * Fix test * Address Patrick's comments. * Update src/transformers/modeling_tf_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/modeling_tf_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Apply style * Make return_dict the default behavior and display a warning message * Revert * Replace wrong keyword * Revert code * Add forgot key * Fix bug in loading PT models from a TF one. * Fix sort * Add a test for custom load weights in BERT * Apply style * Remove unused import Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
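A generic sketch of the filtering idea (the attribute name follows the commit's wording; the pattern list and keys are made up for illustration):

```python
import re

# Patterns of checkpoint keys a model declares as safe to leave unused,
# so loading them does not trigger an "unexpected keys" warning.
authorized_unexpected_keys = [r"mlm___cls", r"nsp___cls"]  # illustrative

checkpoint_keys = ["encoder/layer_0/kernel", "mlm___cls/predictions/bias"]
ignored = [
    key for key in checkpoint_keys
    if any(re.search(pattern, key) for pattern in authorized_unexpected_keys)
]
print(ignored)  # ['mlm___cls/predictions/bias'] -- skipped silently
```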
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
* PoC on RAG * Format class name/obj name * Better name in message * PoC on one TF model * Add PyTorch and TF dummy objects + script * Treat scikit-learn * Bad copy pastes * Typo
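The dummy-object pattern sketched generically (the class body and message are hypothetical, not the generated file's contents):

```python
# A dummy stands in for a class whose backend isn't installed, so
# `from transformers import RagModel` succeeds but any use fails loudly.
class RagModel:  # illustrative dummy, normally auto-generated by a script
    def __init__(self, *args, **kwargs):
        raise ImportError(
            "RagModel requires PyTorch; install it with `pip install torch`."
        )
```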
-
Malte Pietsch authored
* fix squad tokenization for roberta & co * change to pure type based check * sort imports
-
Sylvain Gugger authored
-
Cola authored
* Add `power` argument for TF PolynomialDecay * Create default optimizer with power * Add argument to training args * Clean code format * Fix black warning * Fix code format
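The training-args flag forwards to the underlying Keras schedule, which already exposes power; a sketch with illustrative values:

```python
import tensorflow as tf

# power=1.0 is the previous (linear) behavior; power=2.0 decays quadratically.
schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=5e-5,
    decay_steps=10_000,
    end_learning_rate=0.0,
    power=2.0,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```
-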
Lysandre Debut authored
-
Forrest Iandola authored
* configuration_squeezebert.py; thin wrapper around bert tokenizer; fix typos; WIP sb model code; WIP modeling_squeezebert.py (next step is to get the multi-layer-output interface working); set up squeezebert to use BertModelOutput when returning results; squeezebert documentation; formatting; allow head mask that is an array of [None, ..., None]; docs; docs cont'd; path to vocab; docs and pointers to cloud files (WIP); line length and indentation; squeezebert model cards; formatting of model cards; untrack modeling_squeezebert_scratchpad.py; update aws paths to vocab and config files; get rid of stub of NSP code, and advise users to pretrain with mlm only; fix rebase issues; redo rebase of modeling_auto.py; fix issues with code formatting; more code format auto-fixes; move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert; tests for squeezebert modeling and tokenization; fix typo; move squeezebert before bert in modeling_auto.py to fix inheritance problem; disable test_head_masking, since squeezebert doesn't yet implement head masking; fix issues exposed by test_modeling_squeezebert.py and test_tokenization_squeezebert.py; auto-generated code style improvement; fix an issue inherited from modeling_xxx.py (SqueezeBertForMaskedLM.forward() called self.cls(), which does not exist; the intent was to call self.lm_head()); update copyright; resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask; docs; add integration test; rename squeezebert-mnli --> squeezebert/squeezebert-mnli; autogenerated formatting tweaks; integrate feedback from patrickvonplaten and sgugger on programming style and documentation strings * tiny change to order of imports
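A quick-start sketch using the checkpoint named in the commit's integration test:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("squeezebert/squeezebert-mnli")
model = AutoModelForSequenceClassification.from_pretrained(
    "squeezebert/squeezebert-mnli"
)

inputs = tokenizer(
    "A soccer game with multiple males playing.",
    "Some men are playing a sport.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(-1))  # predicted MNLI label id
```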
-
Sylvain Gugger authored
* Cleanup documentation for BART, Marian, MBART and Pegasus * Cleanup documentation for BART, Marian, MBART and Pegasus
-
Alexandr authored
* LayoutLM: add exception handling for bbox values To replicate the unhandled error: - In `test_modelling_layoutlm.py` set `range_bbox=1025`, i.e. greater than 1024 - Run `pytest tests/test_modeling_layoutlm.py` The requirement for bbox values to be within the range 0-1000 is documented, but when it is violated the error message does not make the issue clear. * Update src/transformers/modeling_layoutlm.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
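A sketch of the documented constraint the new check enforces (the helper below is illustrative, not the exact code added):

```python
import torch

def validate_bbox(bbox: torch.Tensor) -> None:
    # LayoutLM expects bounding-box coordinates normalized to [0, 1000].
    if bbox.min() < 0 or bbox.max() > 1000:
        raise ValueError(
            f"bbox values must lie in [0, 1000], got range "
            f"[{bbox.min().item()}, {bbox.max().item()}]"
        )

validate_bbox(torch.tensor([[0, 10, 500, 1000]]))    # passes
# validate_bbox(torch.tensor([[0, 10, 500, 1025]]))  # raises ValueError
```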
-
- 04 Oct, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 01 Oct, 2020 9 commits
-
-
Sylvain Gugger authored
* Fix seq2seq example test * Fix bad copy-paste * Also save the state
-
Sylvain Gugger authored
* Trainer should not modify its TrainingArguments * Trainer should not modify its TrainingArguments * Trainer should not modify its TrainingArguments * Add test of resumed training * Fixes * Non multiGPU test * Clean Trainer state * Add more to the state * Documentation * One last test * Make resume training test more complete * Unwanted changes
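The principle, sketched with stand-in classes (names are hypothetical, not the Trainer's real fields):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Args:    # stand-in for TrainingArguments: configuration, immutable
    max_steps: int = -1

@dataclass
class State:   # stand-in for the new TrainerState: mutable, checkpointable
    global_step: int = 0
    max_steps: int = 0

args = Args()
# Derive the effective value into the state instead of overwriting args,
# so a resumed run sees the same arguments the user originally passed in.
state = State(max_steps=args.max_steps if args.max_steps > 0 else 1000)
assert args.max_steps == -1 and state.max_steps == 1000
```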
-
Patrick von Platen authored
-
Patrick von Platen authored
* clean T5 * fix t5 tests * fix index typo * fix tf common test * fix examples * change positional ordering for Bart and FSMT * add signature test * clean docs and add tests * add docs to encoder decoder * clean docs * correct two doc strings * remove sig test for TF Electra & Funnel * fix tf t5 slow tests * fix input_ids to inputs in tf * Update src/transformers/modeling_bart.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/modeling_bart.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * implement lysandre results * make style * fix encoder decoder typo * fix tf slow tests * fix slow tests * renaming * remove unused input Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Kai Fricke authored
-
Kai Fricke authored
-
Lysandre Debut authored
-
Sam Shleifer authored
* Clean clamp * boom boom * Take some other changes * boom boom * boom boom * boom boom * one chg * fix test * Use finfo * style
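The finfo idiom this lands on, instead of a hard-coded large negative constant (tensors are illustrative):

```python
import torch

scores = torch.randn(2, 4)
mask = torch.tensor([[False, False, True, True],
                     [False, True, True, True]])

# torch.finfo gives the most negative value representable in the dtype;
# a literal like -1e9 is not even representable under float16.
masked = scores.masked_fill(mask, torch.finfo(scores.dtype).min)
print(masked.softmax(dim=-1))  # masked positions get ~0 probability
```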
-
Sam Shleifer authored
* reset model.config * Update src/transformers/trainer.py * use lower case tensor * Just tensor change
-
- 30 Sep, 2020 1 commit
-
-
Sylvain Gugger authored
* Small QOL improvements to TrainingArguments * With the self.
-