- 09 Oct, 2020 8 commits
-
sgugger authored
-
Julien Plu authored
* Fix test
* Fix cardinality issue
* Fix test
-
Joe Davison authored
-
Funtowicz Morgan authored
* Reintroduce clean_text call which was removed by mistake in #4723
* Added unittest for clean_text parameter on Bert tokenizer.
* Better unittest name.
* Adapt unittest to use untrained tokenizer.
* Code quality + update test

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
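The `clean_text` step reintroduced above normalizes raw input before BERT tokenization. As a rough stdlib sketch of that behavior (not the library's actual implementation), it drops control characters and the replacement character, and folds whitespace into plain spaces:

```python
import unicodedata

def clean_text(text: str) -> str:
    """Sketch of BERT-style text cleaning: normalize whitespace to single
    spaces and drop NUL, U+FFFD, and control/format characters."""
    cleaned = []
    for ch in text:
        if ch in " \t\n\r" or unicodedata.category(ch) == "Zs":
            cleaned.append(" ")  # fold tabs/newlines into plain spaces
        elif ord(ch) in (0, 0xFFFD) or unicodedata.category(ch).startswith("C"):
            continue  # skip NUL, replacement char, control/format chars
        else:
            cleaned.append(ch)
    return "".join(cleaned)
```

For example, `clean_text("a\tb\x00c")` yields `"a bc"`: the tab becomes a space and the NUL byte disappears.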
-
Noah Trenaman authored
-
guhur authored
The same type of errors as in https://github.com/huggingface/transformers/pull/4300
-
Sam Shleifer authored
- 08 Oct, 2020 8 commits
-
Sam Shleifer authored
-
Suraj Patil authored
-
Lysandre Debut authored
* Fix RobertaForCausalLM docs
* Apply review suggestion

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
-
Thomas Wolf authored
* pin torch-hub test
* add protobuf dep
-
Thomas Wolf authored
Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)

* [WIP] SP tokenizers
* fixing tests for T5
* WIP tokenizers
* serialization
* update T5
* WIP T5 tokenization
* slow to fast conversion script
* Refactoring to move tokenizer implementations inside transformers
* Adding gpt - refactoring - quality
* WIP adding several tokenizers to the fast world
* WIP Roberta - moving implementations
* update to dev4, switch file loading to in-memory loading
* Updating and fixing
* advancing on the tokenizers - updating do_lower_case
* style and quality
* moving forward with tokenizers conversion and tests
* MBart, T5
* dumping the fast version of Transformer-XL
* Adding to autotokenizers + style/quality
* update init and space_between_special_tokens
* style and quality
* bump up tokenizers version
* add protobuf
* fix pickle Bert JP with Mecab
* fix newly added tokenizers
* style and quality
* fix bert japanese
* fix funnel
* limit tokenizer warning to one occurrence
* clean up file
* fix new tokenizers
* fast tokenizers deep tests
* WIP adding all the special fast tests on the new fast tokenizers
* quick fix
* adding more fast tokenizers in the fast tests
* all tokenizers in fast version tested
* Adding BertGenerationFast
* bump up setup.py for CI
* remove BertGenerationFast (too early)
* bump up tokenizers version
* Clean old docstrings
* Typo
* Update following Lysandre comments

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
-
Piero Molino authored
Replaced torch.load with pickle.load when loading the pretrained vocab of the TransformerXL tokenizer (#6935)

* Replaced torch.load with pickle.load when loading the pretrained vocab of TransformerXL
* Replaced torch.save with pickle.dump when saving the vocabulary
* updating transformer-xl
* uploaded on S3 - compatibility
* fix tests
* style
* Address review comments

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
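The point of swapping `torch.save`/`torch.load` for pickle here is that a tokenizer vocabulary is a plain Python object, so serializing it should not require torch at all. A minimal sketch of the pattern (function and file names hypothetical, not the actual transformers code):

```python
import pickle

def save_vocab(vocab: dict, path: str) -> None:
    # Plain pickle replaces torch.save for a pure-Python vocab dict,
    # so reading it back has no torch dependency.
    with open(path, "wb") as f:
        pickle.dump(vocab, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_vocab(path: str) -> dict:
    # Counterpart to save_vocab; replaces torch.load.
    with open(path, "rb") as f:
        return pickle.load(f)
```

The trade-off noted in the commit is compatibility: previously saved torch checkpoints of the vocab had to be re-serialized (hence the "uploaded on S3 - compatibility" step).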
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 07 Oct, 2020 13 commits
-
Sam Shleifer authored
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Blaise Cruz authored
-
Bobby Donchev authored
* Create README.md
* Update README.md
* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Keshan authored
* [Model card] SinhalaBERTo model. This is the model card for the keshan/SinhalaBERTo model.
* Update model_cards/keshan/SinhalaBERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Amine Abdaoui authored
Co-authored-by: Amin <amin.geotrend@gmail.com>
-
Abed khooli authored
-
dartrevan authored
-
Ilias Chalkidis authored
Minor changes: add arXiv link + layout improvement + fix typos
-
Abhilash Majumder authored
-
Julien Chaumond authored
by @nikkon3
-
Sam Shleifer authored
-
Sylvain Gugger authored
* Initial callback proposal
* Finish various callbacks
* Post-rebase conflicts
* Fix tests
* Don't use something that's not set
* Documentation
* Remove unwanted print.
* Document all models can work
* Add tests + small fixes
* Update docs/source/internal/trainer_utils.rst
* Address review comments
* Fix TF tests
* Real fix this time
* This one should work
* Fix typo
* Really fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Lysandre Debut authored
-
- 06 Oct, 2020 11 commits
-
Gabriele Picco authored
* Fix UnboundLocalError when PaddingStrategy is MAX_LENGTH
* Fix UnboundLocalError for TruncationStrategy
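An `UnboundLocalError` of the kind fixed here typically arises when a local variable is assigned only in some branches of a strategy check and a padding mode like MAX_LENGTH falls through unmatched. A generic sketch of the bug pattern and its fix (names hypothetical, not the actual transformers code):

```python
def pad_length_buggy(strategy, max_length=None):
    # Bug: `length` is only bound when a branch matches. With
    # strategy == "max_length" and max_length left as None, no branch
    # runs and the `return` raises UnboundLocalError.
    if strategy == "longest":
        length = 0
    elif strategy == "max_length" and max_length is not None:
        length = max_length
    return length

def pad_length_fixed(strategy, max_length=None):
    # Fix: bind a default first so the name exists on every path.
    length = 0
    if strategy == "max_length" and max_length is not None:
        length = max_length
    return length
```

The same failure mode applies to a truncation-strategy dispatch, which is why the commit fixes both in one go.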
-
Philipp authored
Resolves: #7613
-
Lysandre authored
-
Lysandre Debut authored
* Add GPT2ForSequenceClassification based on DialogRPT
* Better documentation
* Code quality
-
Sam Shleifer authored
-
Sam Shleifer authored
-
Ahmed Elnaggar authored
It should be T5-3B not T5-3M.
-
Adrien David-Sivelle authored
- Use cuda:10.2 image instead of 10.1 (to address version mismatch warning with pytorch)
- Use devel version, which is built on the runtime image and includes headers and development tools (the build was otherwise failing to build apex)
-
George Mihaila authored
-
cedspam authored
-
Ilias Chalkidis authored
* Create README.md: model description for all LEGAL-BERT models, published as part of "LEGAL-BERT: The Muppets straight out of Law School" (Chalkidis et al., 2020, Findings of EMNLP 2020)
* Update model_cards/nlpaueb/legal-bert-base-uncased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-