- 15 Nov, 2020 1 commit
-
-
Thomas Wolf authored
[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073)
* Fixing roberta for slow-fast tests
* WIP getting equivalence on pipelines
* slow-to-fast equivalence - working on question-answering pipeline
* optional FAISS tests
* Pipeline Q&A
* Move pipeline tests to their own test job again
* update tokenizer to add sequence id methods
* update to tokenizers 0.9.4
* set sentencepiece as optional
* clean up squad
* clean up pipelines to use sequence_ids
* style/quality
* wording
* Switch to use_fast = True by default
* update tests for use_fast at True by default
* fix rag tokenizer test
* removing protobuf from required dependencies
* fix NER test for use_fast = True by default
* fixing example tests (Q&A examples use slow tokenizers for now)
* protobuf in main deps extras["sentencepiece"] and example deps
* fix protobug install test
* try to fix seq2seq by switching to slow tokenizers for now
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
- 12 Nov, 2020 1 commit
-
-
Julien Plu authored
-
- 26 Aug, 2020 1 commit
-
-
Lysandre authored
-
- 07 Jul, 2020 1 commit
-
-
Suraj Patil authored
* add SquadDataset
* add DataCollatorForQuestionAnswering
* update __init__
* add run_squad with trainer
* add DataCollatorForQuestionAnswering in __init__
* pass data_collator to trainer
* doc tweak
* Update run_squad_trainer.py
* Update __init__.py
* Update __init__.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 19 May, 2020 1 commit
-
-
Julien Chaumond authored
* Distributed eval: SequentialDistributedSampler + gather all results
* For consistency only write to disk from world_master. Close https://github.com/huggingface/transformers/issues/4272
* Working distributed eval
* Hook into scripts
* Fix #3721 again
* TPU.mesh_reduce: stay in tensor space. Thanks @jysohn23
* Just a small comment
* whitespace
* torch.hub: pip install packaging
* Add test scenarii
-
- 14 May, 2020 1 commit
-
-
Julien Chaumond authored
see context in https://github.com/huggingface/transformers/pull/4223
-
- 08 May, 2020 1 commit
-
-
Julien Chaumond authored
* [TPU] Doc, fix xla_spawn.py, only preprocess dataset once
* Update examples/README.md
* [xla_spawn] Add `_mp_fn` to other Trainer scripts
* [TPU] Fix: eval dataloader was None
-
- 07 May, 2020 1 commit
-
-
Julien Chaumond authored
* Created using Colaboratory
* [examples] reorganize files
* remove run_tpu_glue.py as superseded by TPU support in Trainer
* Bugfix: int, not tuple
* move files around
-
- 24 Apr, 2020 1 commit
-
-
Julien Chaumond authored
Close #3921
-
- 22 Apr, 2020 1 commit
-
-
Julien Chaumond authored
* doc
* [tests] Add sample files for a regression task
* [HUGE] Trainer
* Feedback from @sshleifer
* Feedback from @thomwolf + logging tweak
* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
* [glue] Use default max_seq_length of 128 like before
* [glue] move DataTrainingArguments around
* [ner] Change interface of InputExample, and align run_{tf,pl}
* Re-align the pl scripts a little bit
* ner
* [ner] Add integration test
* Fix language_modeling with API tweak
* [ci] Tweak loss target
* Don't break console output
* amp.initialize: model must be on right device before
* [multiple-choice] update for Trainer
* Re-align to 827d6d6e
-
- 20 Apr, 2020 1 commit
-
-
Andrey Kulagin authored
-
- 06 Apr, 2020 1 commit
-
-
Ethan Perez authored
* Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py
`convert_examples_to_features` sets `pad_token=0` by default, which is correct for BERT but incorrect for RoBERTa (`pad_token=1`) and XLNet (`pad_token=5`). I think the other arguments to `convert_examples_to_features` are correct, but it would help if someone more familiar with this part of the codebase checked.
* Simplifying change to match recent commits
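The pad-token issue described in this commit message can be illustrated with a minimal sketch. The helper `pad_input_ids` and the `PAD_TOKEN_IDS` table are hypothetical names made up for this example and are not taken from `run_multiple_choice.py`; only the three pad ids come from the message above. Right-padding is assumed for simplicity.

```python
from typing import List

# Pad token ids as quoted in the commit message; the table itself is illustrative.
PAD_TOKEN_IDS = {"bert": 0, "roberta": 1, "xlnet": 5}


def pad_input_ids(input_ids: List[int], max_length: int, pad_token: int) -> List[int]:
    """Right-pad input_ids to max_length using the given pad token id (hypothetical helper)."""
    padding_length = max_length - len(input_ids)
    if padding_length < 0:
        raise ValueError("input_ids is longer than max_length")
    return input_ids + [pad_token] * padding_length


# A hard-coded default of 0 only matches BERT; passing the model's own id avoids the bug.
bert_padded = pad_input_ids([7, 8], 4, PAD_TOKEN_IDS["bert"])
roberta_padded = pad_input_ids([7, 8], 4, PAD_TOKEN_IDS["roberta"])
xlnet_padded = pad_input_ids([7, 8], 4, PAD_TOKEN_IDS["xlnet"])
```

In practice the id would presumably be read from the loaded tokenizer rather than from a hand-written table, so that the padding always matches the model's vocabulary.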
-
- 01 Apr, 2020 1 commit
-
-
Julien Chaumond authored
* Start cleaning examples
* Fixup
-
- 02 Mar, 2020 1 commit
-
-
Victor SANH authored
* fix n_gpu count when no_cuda flag is activated
* someone was left behind
-
- 28 Jan, 2020 1 commit
-
-
Lysandre authored
-
- 06 Jan, 2020 2 commits
-
-
alberduris authored
-
alberduris authored
-
- 23 Dec, 2019 1 commit
-
-
Aymeric Augustin authored
-
- 22 Dec, 2019 6 commits
-
-
Aymeric Augustin authored
-
Aymeric Augustin authored
-
Aymeric Augustin authored
-
Aymeric Augustin authored
-
Aymeric Augustin authored
-
Aymeric Augustin authored
This is the result of: $ isort --recursive examples templates transformers utils hubconf.py setup.py
-
- 21 Dec, 2019 1 commit
-
-
Aymeric Augustin authored
This is the result of: $ black --line-length 119 examples templates transformers utils hubconf.py setup.py There's a lot of fairly long lines in the project. As a consequence, I'm picking the longest widely accepted line length, 119 characters. This is also Thomas' preference, because it allows for explicit variable names, to make the code easier to understand.
-
- 03 Dec, 2019 1 commit
-
-
VictorSanh authored
-
- 14 Nov, 2019 1 commit
-
-
Rémi Louf authored
-
- 12 Nov, 2019 1 commit
-
-
ronakice authored
-
- 04 Nov, 2019 1 commit
-
-
thomwolf authored
-
- 08 Oct, 2019 1 commit
-
-
Bilal Khan authored
-
- 04 Oct, 2019 1 commit
-
-
Julien Chaumond authored
-
- 03 Oct, 2019 1 commit
-
-
Brian Ma authored
-
- 30 Sep, 2019 1 commit
-
-
Julien Chaumond authored
-
- 26 Sep, 2019 1 commit
-
-
thomwolf authored
-
- 18 Sep, 2019 2 commits