- 12 Oct, 2020 11 commits
-
-
Sam Shleifer authored
-
Alex Combessie authored
-
Lysandre Debut authored
-
Julien Plu authored
* Fix test * fix generic text classification * fix test * Fix tests
-
sgugger authored
-
Jonathan Chang authored
Fix a bug that happens when subclassing Trainer and overriding evaluate() without calling prediction_loop()
-
Kelvin authored
Splitting large files into smaller ones can often prevent the tokenizer from running out of memory in environments like Colab that have no swap memory
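A minimal sketch of the workaround described above. The function name, chunk size, and file naming are illustrative, not from the commit — the idea is simply to cut one large text file into numbered pieces so each can be tokenized separately.

```python
from pathlib import Path


def split_file(path, lines_per_chunk=100_000):
    """Write consecutive slices of `path` to numbered chunk files.

    Returns the list of chunk paths, so each chunk can be fed to a
    tokenizer one at a time instead of loading the whole file at once.
    """
    path = Path(path)
    chunk_paths = []
    with path.open() as src:
        buf, idx = [], 0
        for line in src:
            buf.append(line)
            if len(buf) >= lines_per_chunk:
                out = path.with_suffix(f".chunk{idx}.txt")
                out.write_text("".join(buf))
                chunk_paths.append(out)
                buf, idx = [], idx + 1
        if buf:  # write the remainder, if any
            out = path.with_suffix(f".chunk{idx}.txt")
            out.write_text("".join(buf))
            chunk_paths.append(out)
    return chunk_paths
```

Each chunk file can then be passed to the tokenizer in its own call, keeping peak memory bounded by the chunk size rather than the full corpus.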
-
AndreaSottana authored
Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural.
-
fteufel authored
Added is_torch_tpu_available() to the condition for saving a model as an XLA model. The "xla_device" property of the config can also be True on a non-XLA device, when loading a checkpoint that was trained on XLA before. Resolves #7695
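A minimal sketch of the guard logic this commit describes — not the actual transformers code. `Config` and `should_save_as_xla` are stand-ins here; in the library, `is_torch_tpu_available()` checks for a real TPU runtime, while `config.xla_device` is merely a flag persisted in the checkpoint.

```python
class Config:
    """Stand-in for a model config carrying the persisted xla_device flag."""

    def __init__(self, xla_device):
        self.xla_device = xla_device


def should_save_as_xla(config, tpu_available):
    # Before the fix, only `config.xla_device` was checked, so a checkpoint
    # trained on TPU would take the XLA save path even on a CPU/GPU machine.
    # The fix also requires an actual TPU runtime to be present.
    return bool(getattr(config, "xla_device", False)) and tpu_available


# A checkpoint trained on TPU, now loaded on an ordinary machine:
print(should_save_as_xla(Config(xla_device=True), tpu_available=False))  # False
```

With both conditions required, loading a TPU-trained checkpoint on a regular machine no longer routes saving through the XLA path.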
-
Sylvain Gugger authored
-
Berowne authored
Replace 'men_len' with 'mem_len' to match the parameter name
-
- 11 Oct, 2020 3 commits
-
-
Miguel Victor authored
-
Sam Shleifer authored
-
Alexandr Maslov authored
-
- 10 Oct, 2020 2 commits
-
-
Andrew Kane authored
-
Sylvain Gugger authored
-
- 09 Oct, 2020 14 commits
-
-
Sylvain Gugger authored
-
Doug Blank authored
* Import integration libraries first * isort and black happiness * flake8 happiness * Add a test * Black reformat * Ignore import order in tests * A heavy-handed method of disabling comet for tests * Remove comet_ml tests * Run black on setup.py
-
sgugger authored
-
Sylvain Gugger authored
-
Sam Shleifer authored
-
Stas Bekman authored
-
sgugger authored
-
Julien Plu authored
* Fix test * Fix cardinality issue * Fix test
-
Joe Davison authored
-
Funtowicz Morgan authored
* Reintroduce clean_text call which was removed by mistake in #4723 * Added unittest for clean_text parameter on Bert tokenizer. * Better unittest name. * Adapt unittest to use untrained tokenizer. * Code quality + update test
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
-
Noah Trenaman authored
-
guhur authored
The same type of errors as in https://github.com/huggingface/transformers/pull/4300
-
Sam Shleifer authored
-
- 08 Oct, 2020 8 commits
-
-
Sam Shleifer authored
-
Suraj Patil authored
-
Lysandre Debut authored
* Fix RobertaForCausalLM docs * Apply review suggestion
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
-
Thomas Wolf authored
* pin torch-hub test * add protobuf dep
-
Thomas Wolf authored
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
* [WIP] SP tokenizers
* fixing tests for T5
* WIP tokenizers
* serialization
* update T5
* WIP T5 tokenization
* slow to fast conversion script
* Refactoring to move tokenizer implementations inside transformers
* Adding gpt - refactoring - quality
* WIP adding several tokenizers to the fast world
* WIP Roberta - moving implementations
* update to dev4 switch file loading to in-memory loading
* Updating and fixing
* advancing on the tokenizers - updating do_lower_case
* style and quality
* moving forward with tokenizers conversion and tests
* MBart, T5
* dumping the fast version of transformer XL
* Adding to autotokenizers + style/quality
* update init and space_between_special_tokens
* style and quality
* bump up tokenizers version
* add protobuf
* fix pickle Bert JP with Mecab
* fix newly added tokenizers
* style and quality
* fix bert japanese
* fix funnel
* limit tokenizer warning to one occurrence
* clean up file
* fix new tokenizers
* fast tokenizers deep tests
* WIP adding all the special fast tests on the new fast tokenizers
* quick fix
* adding more fast tokenizers in the fast tests
* all tokenizers in fast version tested
* Adding BertGenerationFast
* bump up setup.py for CI
* remove BertGenerationFast (too early)
* bump up tokenizers version
* Clean old docstrings
* Typo
* Update following Lysandre comments
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
-
Piero Molino authored
Replaced torch.load with pickle.load for loading the pretrained vocab of the TransformerXL tokenizer (#6935)
* Replaced torch.load with pickle.load when loading the pretrained vocab of TransformerXL
* Replaced torch.save with pickle.dump when saving the vocabulary
* updating transformer-xl
* uploaded on S3 - compatibility
* fix tests
* style
* Address review comments
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
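A hedged sketch of the swap this commit describes: persisting a vocabulary with the standard-library pickle module instead of torch.save/torch.load, which removes the torch dependency from the vocab file format. The vocab contents below are illustrative, not the actual TransformerXL vocabulary structure.

```python
import os
import pickle
import tempfile

# Illustrative stand-in for a tokenizer vocabulary object.
vocab = {"idx2sym": ["<eos>", "the", "of"], "counter": {"the": 10, "of": 7}}

path = os.path.join(tempfile.mkdtemp(), "vocab.pkl")

# Save the vocabulary (previously: torch.save(vocab, path)).
with open(path, "wb") as f:
    pickle.dump(vocab, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back (previously: torch.load(path)).
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Because the vocab is plain Python data rather than tensors, pickle round-trips it exactly, and the saved file can be read without torch installed.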
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 07 Oct, 2020 2 commits
-
-
Sam Shleifer authored
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Blaise Cruz authored
-