- 26 Oct, 2020 1 commit
-
-
Sylvain Gugger authored
* Important files * Styling them all * Revert "Styling them all" This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e. * Syling them for realsies * Fix syntax error * Fix benchmark_utils * More fixes * Fix modeling auto and script * Remove new line * Fixes * More fixes * Fix more files * Style * Add FSMT * More fixes * More fixes * More fixes * More fixes * Fixes * More fixes * More fixes * Last fixes * Make sphinx happy
-
- 18 Oct, 2020 1 commit
-
-
Thomas Wolf authored
* splitting fast and slow tokenizers [WIP] * [WIP] splitting sentencepiece and tokenizers dependencies * update dummy objects * add name_or_path to models and tokenizers * prefix added to file names * prefix * styling + quality * spliting all the tokenizer files - sorting sentencepiece based ones * update tokenizer version up to 0.9.0 * remove hard dependency on sentencepiece
馃帀 * and removed hard dependency on tokenizers馃帀 * update conversion script * update missing models * fixing tests * move test_tokenization_fast to main tokenization tests - fix bugs * bump up tokenizers * fix bert_generation * update ad fix several tokenizers * keep sentencepiece in deps for now * fix funnel and deberta tests * fix fsmt * fix marian tests * fix layoutlm * fix squeezebert and gpt2 * fix T5 tokenization * fix xlnet tests * style * fix mbart * bump up tokenizers to 0.9.2 * fix model tests * fix tf models * fix seq2seq examples * fix tests without sentencepiece * fix slow => fast conversion without sentencepiece * update auto and bert generation tests * fix mbart tests * fix auto and common test without tokenizers * fix tests without tokenizers * clean up tests lighten up when tokenizers + sentencepiece are both off * style quality and tests fixing * add sentencepiece to doc/examples reqs * leave sentencepiece on for now * style quality split hebert and fix pegasus * WIP Herbert fast * add sample_text_no_unicode and fix hebert tokenization * skip FSMT example test for now * fix style * fix fsmt in example tests * update following Lysandre and Sylvain's comments * Update src/transformers/testing_utils.py Co-authored-by:Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/testing_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/tokenization_utils_base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 23 Sep, 2020 1 commit
-
-
Sylvain Gugger authored
* Clean up model documentation * Formatting * Preparation work * Long lines * Main work on rst files * Cleanup all config files * Syntax fix * Clean all tokenizers * Work on first models * Models beginning * FaluBERT * All PyTorch models * All models * Long lines again * Fixes * More fixes * Update docs/source/model_doc/bert.rst Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Update docs/source/model_doc/electra.rst Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Last fixes Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 26 Aug, 2020 2 commits
-
-
Lysandre authored
-
Lysandre Debut authored
* Logging * Style * hf_logging > utils.logging * Address @thomwolf's comments * Update test * Update src/transformers/benchmark/benchmark_utils.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Revert bad change Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 18 Apr, 2020 1 commit
-
-
Thomas Wolf authored
* First pass on utility classes and python tokenizers * finishing cleanup pass * style and quality * Fix tests * Updating following @mfuntowicz comment * style and quality * Fix Roberta * fix batch_size/seq_length inBatchEncoding * add alignement methods + tests * Fix OpenAI and Transfo-XL tokenizers * adding trim_offsets=True default for GPT2 et RoBERTa * style and quality * fix tests * add_prefix_space in roberta * bump up tokenizers to rc7 * style * unfortunately tensorfow does like these - removing shape/seq_len for now * Update src/transformers/tokenization_utils.py Co-Authored-By:
Stefan Schweter <stefan@schweter.it> * Adding doc and docstrings * making flake8 happy Co-authored-by:
Stefan Schweter <stefan@schweter.it>
-
- 25 Feb, 2020 1 commit
-
-
Lysandre Debut authored
* All Tokenizers BertTokenizer + few fixes RobertaTokenizer OpenAIGPTTokenizer + Fixes GPT2Tokenizer + fixes TransfoXLTokenizer Correct rst for TransformerXL XLMTokenizer + fixes XLNet Tokenizer + Style DistilBERT + Fix XLNet RST CTRLTokenizer CamemBERT Tokenizer FlaubertTokenizer XLMRobertaTokenizer cleanup * cleanup
-
- 20 Feb, 2020 1 commit
-
-
Joe Davison authored
-
- 15 Jan, 2020 1 commit
-
-
Julien Chaumond authored
-
- 06 Jan, 2020 2 commits
-
-
alberduris authored
-
alberduris authored
-
- 22 Dec, 2019 7 commits
-
-
Aymeric Augustin authored
On Python 3, `open is io.open`.
-
Aymeric Augustin authored
-
Aymeric Augustin authored
-
Aymeric Augustin authored
This prevents transformers from being importable simply because the CWD is the root of the git repository, while not being importable from other directories. That led to inconsistent behavior, especially in examples. Once you fetch this commit, in your dev environment, you must run: $ pip uninstall transformers $ pip install -e . -
Aymeric Augustin authored
-
Aymeric Augustin authored
-
Aymeric Augustin authored
This is the result of: $ isort --recursive examples templates transformers utils hubconf.py setup.py
-
- 21 Dec, 2019 1 commit
-
-
Aymeric Augustin authored
This is the result of: $ black --line-length 119 examples templates transformers utils hubconf.py setup.py There's a lot of fairly long lines in the project. As a consequence, I'm picking the longest widely accepted line length, 119 characters. This is also Thomas' preference, because it allows for explicit variable names, to make the code easier to understand.
-
- 06 Dec, 2019 1 commit
-
-
Aymeric Augustin authored
* Switch to plain unittest for skipping slow tests. Add a RUN_SLOW environment variable for running them. * Switch to plain unittest for PyTorch dependency. * Switch to plain unittest for TensorFlow dependency. * Avoid leaking open files in the test suite. This prevents spurious warnings when running tests. * Fix unicode warning on Python 2 when running tests. The warning was: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal * Support running PyTorch tests on a GPU. Reverts 27e015bd. * Tests no longer require pytest. * Make tests pass on cuda
-
- 05 Dec, 2019 1 commit
-
-
thomwolf authored
-
- 22 Oct, 2019 2 commits
-
-
Julien Chaumond authored
see also https://github.com/salesforce/ctrl/pull/50
-
Lysandre authored
-
- 10 Oct, 2019 2 commits
- 09 Oct, 2019 1 commit
-
-
LysandreJik authored
-
- 08 Oct, 2019 2 commits
- 04 Oct, 2019 1 commit
-
-
keskarnitish authored
adding conversion script adding first draft of modeling & tokenization adding placeholder for test files bunch of changes registering the tokenizer/model/etc tests change link; something is very VERY wrong here weird end-of-word thingy going on i think the tokenization works now ; wrote the unit tests overall structure works;load w next the monster is alive! works after some cleanup as well adding emacs autosave to gitignore currently only supporting the 48 layer one; seems to infer fine on my macbook cleanup fixing some documentation fixing some documentation tests passing? now works on CUDA also adding greedy? adding greedy sampling works well
-
- 03 Oct, 2019 1 commit
-
-
VictorSanh authored
-
- 26 Sep, 2019 3 commits
-
-
LysandreJik authored
-
thomwolf authored
-
thomwolf authored
-
- 30 Aug, 2019 3 commits
- 23 Aug, 2019 1 commit
-
-
thomwolf authored
-
- 21 Aug, 2019 2 commits
- 04 Aug, 2019 1 commit
-
-
thomwolf authored
-