1. 23 Oct, 2020 1 commit
  2. 18 Oct, 2020 1 commit
    • Thomas Wolf's avatar
      [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a
      Thomas Wolf authored
      * splitting fast and slow tokenizers [WIP]
      
      * [WIP] splitting sentencepiece and tokenizers dependencies
      
      * update dummy objects
      
      * add name_or_path to models and tokenizers
      
      * prefix added to file names
      
      * prefix
      
      * styling + quality
      
      * spliting all the tokenizer files - sorting sentencepiece based ones
      
      * update tokenizer version up to 0.9.0
      
      * remove hard dependency on sentencepiece 馃帀
      
      * and removed hard dependency on tokenizers 馃帀
      
      
      
      * update conversion script
      
      * update missing models
      
      * fixing tests
      
      * move test_tokenization_fast to main tokenization tests - fix bugs
      
      * bump up tokenizers
      
      * fix bert_generation
      
      * update ad fix several tokenizers
      
      * keep sentencepiece in deps for now
      
      * fix funnel and deberta tests
      
      * fix fsmt
      
      * fix marian tests
      
      * fix layoutlm
      
      * fix squeezebert and gpt2
      
      * fix T5 tokenization
      
      * fix xlnet tests
      
      * style
      
      * fix mbart
      
      * bump up tokenizers to 0.9.2
      
      * fix model tests
      
      * fix tf models
      
      * fix seq2seq examples
      
      * fix tests without sentencepiece
      
      * fix slow => fast  conversion without sentencepiece
      
      * update auto and bert generation tests
      
      * fix mbart tests
      
      * fix auto and common test without tokenizers
      
      * fix tests without tokenizers
      
      * clean up tests lighten up when tokenizers + sentencepiece are both off
      
      * style quality and tests fixing
      
      * add sentencepiece to doc/examples reqs
      
      * leave sentencepiece on for now
      
      * style quality split hebert and fix pegasus
      
      * WIP Herbert fast
      
      * add sample_text_no_unicode and fix hebert tokenization
      
      * skip FSMT example test for now
      
      * fix style
      
      * fix fsmt in example tests
      
      * update following Lysandre and Sylvain's comments
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      ba8c4d0a
  3. 12 Aug, 2020 1 commit
  4. 01 Jul, 2020 1 commit
  5. 16 Jun, 2020 1 commit
    • Funtowicz Morgan's avatar
      Ability to pickle/unpickle BatchEncoding pickle (reimport) (#5039) · 9e033649
      Funtowicz Morgan authored
      * Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer.
      
      * Added __get_state__() & __set_state__() to be pickable.
      
      * Correct tokens() return type from List[int] to List[str]
      
      * Added unittest for BatchEncoding pickle/unpickle
      
      * Added unittest for BatchEncoding is_fast
      
      * More careful checking on BatchEncoding unpickle tests.
      
      * Formatting.
      
      * is_fast should assertTrue on Rust tokenizers.
      
      * Ensure tensorflow has correct way of checking array_equal
      
      * More formatting.
      9e033649
  6. 04 Jun, 2020 1 commit
    • Funtowicz Morgan's avatar
      Introduce a new tensor type for return_tensors on tokenizer for NumPy (#4585) · 5bf9afbf
      Funtowicz Morgan authored
      * Refactor tensor creation in tokenizers.
      
      * Make sure to convert string to TensorType
      
      * Refactor convert_to_tensors_
      
      * Introduce numpy tensor creation
      
      * Format
      
      * Add unittest for TensorType creation from str
      
      * sorting imports
      
      * Added unittests for numpy tensor conversion.
      
      * Do not use in-place version for squeeze as numpy doesn't provide such feature.
      
      * Added extra parameter prepend_batch_axis: bool on prepare_for_model.
      
      * Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.
      
      * style.
      
      * numpy tests require_torch for now while flax not merged.
      
      * Hopefully will make flake8 happy.
      
      * One more time 馃幎
      5bf9afbf
  7. 06 Jan, 2020 2 commits
  8. 22 Dec, 2019 7 commits
  9. 21 Dec, 2019 1 commit
    • Aymeric Augustin's avatar
      Reformat source code with black. · fa84ae26
      Aymeric Augustin authored
      This is the result of:
      
          $ black --line-length 119 examples templates transformers utils hubconf.py setup.py
      
      There's a lot of fairly long lines in the project. As a consequence, I'm
      picking the longest widely accepted line length, 119 characters.
      
      This is also Thomas' preference, because it allows for explicit variable
      names, to make the code easier to understand.
      fa84ae26
  10. 06 Dec, 2019 1 commit
    • Aymeric Augustin's avatar
      Remove dependency on pytest for running tests (#2055) · 35401fe5
      Aymeric Augustin authored
      * Switch to plain unittest for skipping slow tests.
      
      Add a RUN_SLOW environment variable for running them.
      
      * Switch to plain unittest for PyTorch dependency.
      
      * Switch to plain unittest for TensorFlow dependency.
      
      * Avoid leaking open files in the test suite.
      
      This prevents spurious warnings when running tests.
      
      * Fix unicode warning on Python 2 when running tests.
      
      The warning was:
      
          UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
      
      * Support running PyTorch tests on a GPU.
      
      Reverts 27e015bd.
      
      * Tests no longer require pytest.
      
      * Make tests pass on cuda
      35401fe5
  11. 04 Nov, 2019 1 commit
  12. 26 Sep, 2019 1 commit
  13. 09 Jul, 2019 1 commit
  14. 05 Jul, 2019 1 commit