1. 12 May, 2020 2 commits
  2. 11 May, 2020 1 commit
  3. 07 May, 2020 2 commits
  4. 05 May, 2020 1 commit
    • Pytorch 1.5.0 (#3973) · 79b1c696
      Lysandre Debut authored
      * Standard deviation can no longer be set to 0 (see the sketch after this commit)
      
      * Remove torch pinned version
      
      * 9th instead of 10th, silly me
      79b1c696
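      A minimal sketch of the first item above, assuming torch >= 1.5.0 (the tensor shape and the tiny epsilon are illustrative, not taken from the PR):
      ```
      import torch

      weight = torch.empty(4, 8)

      # On torch >= 1.5.0 this raises a RuntimeError (std must be > 0):
      # weight.normal_(mean=0.0, std=0.0)

      # One possible workaround: a vanishingly small but strictly positive std.
      weight.normal_(mean=0.0, std=1e-10)
      ```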
  5. 01 May, 2020 1 commit
  6. 27 Apr, 2020 1 commit
  7. 22 Apr, 2020 1 commit
  8. 21 Apr, 2020 1 commit
  9. 18 Apr, 2020 1 commit
    • Cleanup fast tokenizers integration (#3706) · 827d6d6e
      Thomas Wolf authored
      
      
      * First pass on utility classes and python tokenizers
      
      * finishing cleanup pass
      
      * style and quality
      
      * Fix tests
      
      * Updating following @mfuntowicz comment
      
      * style and quality
      
      * Fix Roberta
      
      * fix batch_size/seq_length in BatchEncoding
      
      * add alignment methods + tests (see the sketch after this commit message)
      
      * Fix OpenAI and Transfo-XL tokenizers
      
      * adding trim_offsets=True default for GPT2 and RoBERTa
      
      * style and quality
      
      * fix tests
      
      * add_prefix_space in roberta
      
      * bump up tokenizers to rc7
      
      * style
      
      * unfortunately tensorflow doesn't like these - removing shape/seq_len for now
      
      * Update src/transformers/tokenization_utils.py
      Co-Authored-By: Stefan Schweter <stefan@schweter.it>
      
      * Adding doc and docstrings
      
      * making flake8 happy
      Co-authored-by: Stefan Schweter <stefan@schweter.it>
      827d6d6e
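      The "alignment methods" referenced above are the character/token mapping helpers that BatchEncoding gained on the fast-tokenizer path. A minimal sketch, assuming a transformers release from this period (model name illustrative):
      ```
      from transformers import BertTokenizerFast

      tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
      encoding = tokenizer.encode_plus("Hello world")

      # char_to_token: index of the token covering the character at offset 6 ("w").
      token_index = encoding.char_to_token(6)

      # token_to_chars: the (start, end) character span covered by that token.
      char_span = encoding.token_to_chars(token_index)

      print(token_index, char_span)
      ```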
  10. 10 Apr, 2020 1 commit
  11. 06 Apr, 2020 4 commits
    • Tokenizers v3.0.0 (#3185) · 96ab75b8
      Funtowicz Morgan authored
      
      
      * Renamed num_added_tokens to num_special_tokens_to_add
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Cherry-Pick: Partially fix space-only input without special tokens added to the output #3091
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Make fast tokenizers unittests work on Windows.
      
      * Entirely refactored unit tests for fast tokenizers.
      
      * Remove ABC class for CommonFastTokenizerTest
      
      * Added embeded_special_tokens tests from allenai @dirkgr
      
      * Make embeded_special_tokens tests from allenai more generic
      
      * Uniformize vocab_size as a property for both Fast and normal tokenizers
      
      * Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin)
      
      * Ensure providing None input raises the same ValueError as the Python tokenizer + tests.
      
      * Fix invalid input for assert_padding when testing batch_encode_plus
      
      * Move add_special_tokens from the constructor to a parameter of the tokenize/encode/[batch_]encode_plus methods.
      
      * Ensure tokenize() correctly forwards add_special_tokens to Rust.
      
      * Adding None checking on top of encode / encode_batch for TransfoXLTokenizerFast.
      Avoid stripping None values.
      
      * unittests ensure tokenize() also throws a ValueError if provided None
      
      * Added add_special_tokens unittest for all supported models.
      
      * Style
      
      * Make sure TransfoXL tests run only if PyTorch is provided.
      
      * Split up tokenizers tests for each model type.
      
      * Fix invalid unittest with new tokenizers API.
      
      * Filter out Roberta openai detector models from unittests.
      
      * Introduce BatchEncoding on fast tokenizers path.
      
      This new structure exposes all the mappings retrieved from Rust.
      It also keeps the current behavior with model forward.
      
      * Introduce BatchEncoding on slow tokenizers path.
      
      Backward compatibility.
      
      * Improve error message on BatchEncoding for slow path
      
      * Make add_prefix_space True by default on Roberta fast to match Python in majority of cases.
      
      * Style and format.
      
      * Added typing on all methods for PretrainedTokenizerFast
      
      * Style and format
      
      * Added path for feeding pretokenized (List[str]) input to PretrainedTokenizerFast.
      
      * Style and format
      
      * encode_plus now supports pretokenized inputs (see the pretokenized-input sketch after this commit message).
      
      * Remove user warning about add_special_tokens when working on pretokenized inputs.
      
      * Always go through the post processor.
      
      * Added support for pretokenized input pairs on encode_plus
      
      * Added is_pretokenized flag on encode_plus for clarity and improved error message on input TypeError.
      
      * Added pretokenized inputs support on batch_encode_plus
      
      * Update BatchEncoding method names to match Encoding.
      
      * Bump setup.py tokenizers dependency to 0.7.0rc1
      
      * Remove unused parameters in BertTokenizerFast
      
      * Make sure Roberta returns token_type_ids for unittests.
      
      * Added missing typings
      
      * Update add_tokens prototype to match tokenizers side and allow AddedToken (see the add_tokens sketch after this commit message)
      
      * Bumping tokenizers to 0.7.0rc2
      
      * Added documentation for BatchEncoding
      
      * Added (unused) is_pretokenized parameter on PreTrainedTokenizer encode_plus/batch_encode_plus methods.
      
      * Added higher-level typing for tokenize / encode_plus / batch_encode_plus.
      
      * Fix unittests failing because add_special_tokens was defined as a constructor parameter on Rust Tokenizers.
      
      * Fix text-classification pipeline using the wrong tokenizer
      
      * Make pipelines work with BatchEncoding
      
      * Turn off add_special_tokens on tokenize by default.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Remove add_prefix_space from tokenize call in unittest.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Style and quality
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Correct message for batch_encode_plus None input exception.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Fix invalid list comprehension for offset_mapping overriding content every iteration.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * TransfoXL uses Strip normalizer.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Bump tokenizers dependency to 0.7.0rc3
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Support AddedTokens for special_tokens and use left stripping on mask for Roberta.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * SpecialTokensMixin can use slots for faster access to underlying attributes.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Remove update_special_tokens from fast tokenizers.
      
      * Ensure TransfoXL unittests are run only when torch is available.
      
      * Style.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Style
      
      * Style 🙏🙏
      
      * Remove slots on SpecialTokensMixin, need deep dive into pickle protocol.
      
      * Remove Roberta warning on __init__.
      
      * Move documentation to Google style.
      Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
      96ab75b8
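      A minimal sketch of the pretokenized-input path on encode_plus/batch_encode_plus described above, assuming a release from this period (model name and inputs are illustrative):
      ```
      from transformers import BertTokenizerFast

      tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

      # Words already split upstream, e.g. in a token-classification dataset.
      words = ["Hello", "world", "!"]
      encoded = tokenizer.encode_plus(words, is_pretokenized=True)

      # Batched variant over several pretokenized sequences.
      batch = tokenizer.batch_encode_plus([words, ["Another", "example"]], is_pretokenized=True)
      print(encoded["input_ids"], batch["input_ids"])
      ```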
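      And a sketch of the updated add_tokens prototype accepting AddedToken objects from the tokenizers library (the token strings are made up):
      ```
      from tokenizers import AddedToken
      from transformers import RobertaTokenizerFast

      tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

      # add_tokens now takes AddedToken alongside plain strings; lstrip=True
      # mirrors the left-stripping behaviour used for Roberta's mask token.
      num_added = tokenizer.add_tokens([
          "new_tok",
          AddedToken("<custom>", lstrip=True),
      ])
      print(num_added)  # number of entries actually added to the vocabulary
      ```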
    • Re-pin isort · ea6dba27
      LysandreJik authored
      ea6dba27
    • unpin isort for pypi · 11c3257a
      LysandreJik authored
      11c3257a
    • Release: v2.8.0 · 36bffc81
      LysandreJik authored
      36bffc81
  12. 30 Mar, 2020 3 commits
  13. 26 Mar, 2020 1 commit
  14. 25 Mar, 2020 1 commit
    • Experiment w/ dataclasses (including Py36) (#3423) · 83272a38
      Julien Chaumond authored
      * [ci] Also run test_examples in py37
      
      (will revert at the end of the experiment)
      
      * InputExample: use immutable dataclass (see the sketch after this commit)
      
      * [deps] Install dataclasses for Py<3.7
      
      * [skip ci] Revert "[ci] Also run test_examples in py37"
      
      This reverts commit d29afd9959786b77759b0b8fa4e6b4335b952015.
      83272a38
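      A minimal sketch of the immutable-dataclass idea above; the field list follows the usual InputExample convention but is illustrative here:
      ```
      from dataclasses import dataclass
      from typing import Optional

      # frozen=True makes instances read-only after construction
      # (on Python 3.6 this needs the `dataclasses` backport).
      @dataclass(frozen=True)
      class InputExample:
          guid: str
          text_a: str
          text_b: Optional[str] = None
          label: Optional[str] = None

      example = InputExample(guid="train-1", text_a="Hello world")
      # example.label = "positive"  # raises dataclasses.FrozenInstanceError
      ```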
  15. 24 Mar, 2020 2 commits
  16. 23 Mar, 2020 3 commits
  17. 20 Mar, 2020 1 commit
    • Handle pinned version of isort · 115abd21
      Bram Vanroy authored
      The CONTRIBUTING file pins to a specific version of isort, so we might as well install that in `dev`. This makes it easier for contributors: they don't have to manually install the specific commit. A sketch of the resulting `dev` extra follows this commit.
      115abd21
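      A hypothetical sketch of the resulting setup.py dev extra; the version spec and neighbouring entries are placeholders, not the actual pin:
      ```
      # hypothetical excerpt from setup.py; "X.Y.Z" stands in for the real pin,
      # which should stay in sync with the CONTRIBUTING file.
      extras_require = {
          "dev": [
              "isort==X.Y.Z",
              "black",
              "flake8",
          ],
      }
      ```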
  18. 17 Mar, 2020 1 commit
  19. 04 Mar, 2020 1 commit
  20. 24 Feb, 2020 2 commits
  21. 19 Feb, 2020 3 commits
  22. 10 Feb, 2020 1 commit
  23. 31 Jan, 2020 3 commits
  24. 28 Jan, 2020 1 commit
  25. 17 Jan, 2020 1 commit
    • Add lower bound to tqdm for tqdm.auto · 23c6998b
      Brendan Roof authored
      - It appears that `tqdm` only introduced `tqdm.auto` in 4.27 (see the sketch after this commit).
      - See https://github.com/tqdm/tqdm/releases/tag/v4.27.0.
      - Without the lower bound I received the following stack trace in an environment where I already had tqdm installed:
      ```
        File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/__init__.py", line 20, in <module>
          from .file_utils import (TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, PYTORCH_PRETRAINED_BERT_CACHE,
        File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/file_utils.py", line 24, in <module>
          from tqdm.auto import tqdm
      ModuleNotFoundError: No module named 'tqdm.auto'
      ```
      23c6998b
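      A sketch of the corresponding constraint; the surrounding setup.py context is assumed:
      ```
      # Bound tqdm from below so that `from tqdm.auto import tqdm` in
      # file_utils.py is guaranteed to resolve (tqdm.auto exists from 4.27 on).
      install_requires = [
          "tqdm >= 4.27",
      ]
      ```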