- 01 Jul, 2020 1 commit
-
-
Sam Shleifer authored
-
- 15 Jun, 2020 1 commit
-
-
Anthony MOI authored
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline
* failing pretokenized test
* Fix is_pretokenized in python
* add pretokenized tests
* style and quality
* better tests for batched pretokenized inputs
* tokenizers clean up - new padding_strategy - split the files
* [HUGE] refactoring tokenizers - padding - truncation - tests
* style and quality
* bump up required tokenizers version to 0.8.0-rc1
* switched padding/truncation API - simpler better backward compat
* updating tests for custom tokenizers
* style and quality - tests on pad
* fix QA pipeline
* fix backward compatibility for max_length only
* style and quality
* Various cleanups - add verbose
* fix tests
* update docstrings
* Fix tests
* Docs reformatted
* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
-
- 02 Jun, 2020 1 commit
-
-
Julien Chaumond authored
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
-
- 20 May, 2020 1 commit
-
-
Julien Chaumond authored
-
- 19 May, 2020 1 commit
-
-
Sam Shleifer authored
-
- 03 Apr, 2020 1 commit
-
-
Yohei Tamura authored
* BertJapaneseTokenizer accept options for mecab
* black
* fix mecab_option to Option[str]
-
- 15 Jan, 2020 1 commit
-
-
Julien Chaumond authored
-
- 06 Jan, 2020 2 commits
-
-
alberduris authored
-
alberduris authored
-
- 22 Dec, 2019 8 commits
-
-
Aymeric Augustin authored
On Python 3, `open is io.open`.
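The identity stated in this commit message can be checked directly. The snippet below is a minimal sketch, not code from the repository: the file name `vocab.txt` and the variable names are illustrative assumptions.

```python
import io
import os
import tempfile

# On Python 3 the built-in open() and io.open() are the same object,
# so calls to io.open() can be replaced by open() without changing behavior.
same_object = open is io.open

# Hypothetical round trip: write with the built-in, read back with io.open.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vocab.txt")  # illustrative file name
    with open(path, "w", encoding="utf-8") as writer:
        writer.write("hello\n")
    with io.open(path, "r", encoding="utf-8") as reader:
        round_trip = reader.read()
```

Because both names refer to one function, keeping `io.open` only matters for code that must still run on Python 2.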
-
Aymeric Augustin authored
-
Aymeric Augustin authored
This is the same change as for (TF)CommonTestCases for modeling.
-
Aymeric Augustin authored
-
Aymeric Augustin authored
-
Aymeric Augustin authored
-
Aymeric Augustin authored
This change is mostly autogenerated with:
$ python -m autoflake --in-place --recursive examples templates transformers utils hubconf.py setup.py
I made minor changes in the generated diff.
-
Aymeric Augustin authored
This is the result of:
$ isort --recursive examples templates transformers utils hubconf.py setup.py
-
- 21 Dec, 2019 1 commit
-
-
Aymeric Augustin authored
This is the result of:
$ black --line-length 119 examples templates transformers utils hubconf.py setup.py
There are a lot of fairly long lines in the project. As a consequence, I'm picking the longest widely accepted line length, 119 characters. This is also Thomas' preference, because it allows for explicit variable names, which make the code easier to understand.
-
- 11 Dec, 2019 2 commits
-
-
Julien Chaumond authored
-
Masatoshi Suzuki authored
-