1. 23 Oct, 2020 1 commit
  2. 22 Oct, 2020 7 commits
  3. 21 Oct, 2020 5 commits
  4. 20 Oct, 2020 1 commit
  5. 19 Oct, 2020 7 commits
  6. 18 Oct, 2020 1 commit
    • [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a
      Thomas Wolf authored
      * splitting fast and slow tokenizers [WIP]
      
      * [WIP] splitting sentencepiece and tokenizers dependencies
      
      * update dummy objects
      
      * add name_or_path to models and tokenizers
      
      * prefix added to file names
      
      * prefix
      
      * styling + quality
      
      * splitting all the tokenizer files - sorting sentencepiece based ones
      
      * update tokenizer version up to 0.9.0
      
      * remove hard dependency on sentencepiece 🎉
      
      * and removed hard dependency on tokenizers 🎉
      * update conversion script
      
      * update missing models
      
      * fixing tests
      
      * move test_tokenization_fast to main tokenization tests - fix bugs
      
      * bump up tokenizers
      
      * fix bert_generation
      
      * update and fix several tokenizers
      
      * keep sentencepiece in deps for now
      
      * fix funnel and deberta tests
      
      * fix fsmt
      
      * fix marian tests
      
      * fix layoutlm
      
      * fix squeezebert and gpt2
      
      * fix T5 tokenization
      
      * fix xlnet tests
      
      * style
      
      * fix mbart
      
      * bump up tokenizers to 0.9.2
      
      * fix model tests
      
      * fix tf models
      
      * fix seq2seq examples
      
      * fix tests without sentencepiece
      
      * fix slow => fast conversion without sentencepiece
      
      * update auto and bert generation tests
      
      * fix mbart tests
      
      * fix auto and common test without tokenizers
      
      * fix tests without tokenizers
      
      * clean up and lighten tests when tokenizers + sentencepiece are both off
      
      * style quality and tests fixing
      
      * add sentencepiece to doc/examples reqs
      
      * leave sentencepiece on for now
      
      * style, quality, split Herbert and fix Pegasus
      
      * WIP Herbert fast
      
      * add sample_text_no_unicode and fix Herbert tokenization
      
      * skip FSMT example test for now
      
      * fix style
      
      * fix fsmt in example tests
      
      * update following Lysandre and Sylvain's comments
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      ba8c4d0a
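The optional-dependency work in this commit rests on two pieces: availability checks done via import probing, and "dummy objects" exported when a backend is missing so imports still succeed but usage fails with a helpful message. A minimal sketch of that pattern, with illustrative names (the library's real internals differ):

```python
import importlib.util


def is_sentencepiece_available() -> bool:
    """True when the sentencepiece package can be imported."""
    return importlib.util.find_spec("sentencepiece") is not None


def is_tokenizers_available() -> bool:
    """True when the Rust-backed tokenizers package can be imported."""
    return importlib.util.find_spec("tokenizers") is not None


class DummySentencePieceTokenizer:
    """Stand-in exported when sentencepiece is missing: it can be imported
    and referenced, but raises a clear error on instantiation."""

    def __init__(self, *args, **kwargs):
        raise ImportError(
            "This tokenizer requires the sentencepiece library. "
            "Install it with `pip install sentencepiece`."
        )
```

The package `__init__` would then export either the real tokenizer class or the dummy, depending on the availability check, which is what lets `pip install transformers` work without either backend.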
  7. 16 Oct, 2020 4 commits
    • fix/hide warnings (#7837) · d8ca57d2
      Stas Bekman authored
      d8ca57d2
    • [cleanup] assign todos, faster bart-cnn test (#7835) · 96e47d92
      Sam Shleifer authored
      * 2 beam output
      
      * unassign/remove TODOs
      
      * remove one more
      96e47d92
    • Herbert polish model (#7798) · 7b13bd01
      rmroczkowski authored
      * HerBERT transformer model for Polish language understanding.
      
      * HerbertTokenizerFast generated with HerbertConverter
      
      * Herbert base and large model cards
      
      * Herbert model cards with tags
      
      * Herbert tensorflow models
      
      * Herbert model tests based on Bert test suite
      
      * src/transformers/tokenization_herbert.py edited online with Bitbucket
      
      * src/transformers/tokenization_herbert.py edited online with Bitbucket
      
      * docs/source/model_doc/herbert.rst edited online with Bitbucket
      
      * Herbert tokenizer tests and bug fixes
      
      * src/transformers/configuration_herbert.py edited online with Bitbucket
      
      * Copyrights and tests for TFHerbertModel
      
      * model_cards/allegro/herbert-base-cased/README.md edited online with Bitbucket
      
      * model_cards/allegro/herbert-large-cased/README.md edited online with Bitbucket
      
      * Bug fixes after testing
      
      * Reformat modified_only_fixup
      
      * Proper order of configuration
      
      * Herbert proper documentation formatting
      
      * Formatting with make modified_only_fixup
      
      * Dummies fixed
      
      * Adding missing models to documentation
      
      * Removing HerBERT model as it is a simple extension of BERT
      
      * Update model_cards/allegro/herbert-base-cased/README.md
      Co-authored-by: Julien Chaumond <chaumond@gmail.com>
      
      * Update model_cards/allegro/herbert-large-cased/README.md
      Co-authored-by: Julien Chaumond <chaumond@gmail.com>
      
      * HerbertTokenizer deprecated configuration removed
      Co-authored-by: Julien Chaumond <chaumond@gmail.com>
      7b13bd01
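Adding a model like HerBERT includes wiring it into the "auto" classes mentioned in the commit above, so that a model-type string resolves to the right tokenizer class. A toy sketch of that registration pattern, with illustrative names rather than transformers' actual internal tables:

```python
class HerbertTokenizer:
    """Stand-in for the tokenizer class being registered."""


# Map from model-type string to tokenizer class; the real library keeps
# one entry per supported architecture.
TOKENIZER_MAPPING = {
    "herbert": HerbertTokenizer,
}


def auto_tokenizer_class(model_type: str):
    """Resolve a model-type string to its registered tokenizer class."""
    try:
        return TOKENIZER_MAPPING[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model type: {model_type!r}")
```

With the real library, the equivalent end-user call would be `AutoTokenizer.from_pretrained("allegro/herbert-base-cased")`, which does this lookup from the checkpoint's config.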
    • Fix DeBERTa integration tests (#7729) · 52c9e842
      Lysandre Debut authored
      52c9e842
  8. 15 Oct, 2020 1 commit
  9. 14 Oct, 2020 2 commits
  10. 13 Oct, 2020 4 commits
  11. 10 Oct, 2020 1 commit
  12. 09 Oct, 2020 2 commits
  13. 08 Oct, 2020 2 commits
    • Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) · 9aeacb58
      Thomas Wolf authored
      
      * [WIP] SP tokenizers
      
      * fixing tests for T5
      
      * WIP tokenizers
      
      * serialization
      
      * update T5
      
      * WIP T5 tokenization
      
      * slow to fast conversion script
      
      * Refactoring to move tokenizer implementations inside transformers
      
      * Adding gpt - refactoring - quality
      
      * WIP adding several tokenizers to the fast world
      
      * WIP Roberta - moving implementations
      
      * update to dev4 switch file loading to in-memory loading
      
      * Updating and fixing
      
      * advancing on the tokenizers - updating do_lower_case
      
      * style and quality
      
      * moving forward with tokenizers conversion and tests
      
      * MBart, T5
      
      * dumping the fast version of transformer XL
      
      * Adding to autotokenizers + style/quality
      
      * update init and space_between_special_tokens
      
      * style and quality
      
      * bump up tokenizers version
      
      * add protobuf
      
      * fix pickle Bert JP with Mecab
      
      * fix newly added tokenizers
      
      * style and quality
      
      * fix bert japanese
      
      * fix funnel
      
      * limit tokenizer warning to one occurrence
      
      * clean up file
      
      * fix new tokenizers
      
      * fast tokenizers deep tests
      
      * WIP adding all the special fast tests on the new fast tokenizers
      
      * quick fix
      
      * adding more fast tokenizers in the fast tests
      
      * all tokenizers in fast version tested
      
      * Adding BertGenerationFast
      
      * bump up setup.py for CI
      
      * remove BertGenerationFast (too early)
      
      * bump up tokenizers version
      
      * Clean old docstrings
      
      * Typo
      
      * Update following Lysandre comments
      Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
      9aeacb58
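The slow-to-fast conversion script this commit mentions essentially re-expresses a Python tokenizer's state (vocabulary, special tokens) as the serialized form a Rust-backed tokenizer loads. A hypothetical sketch of that serialization step; the field names below are illustrative, not the tokenizers library's exact JSON schema:

```python
import json


def convert_slow_vocab(vocab, unk_token="[UNK]"):
    """Serialize a slow tokenizer's token->id vocab into a JSON blob that a
    fast (Rust-backed) tokenizer could be instantiated from."""
    payload = {
        "model": {
            "type": "WordLevel",
            "vocab": vocab,
            "unk_token": unk_token,
        }
    }
    # sort_keys gives a deterministic file, convenient for diffing checkpoints
    return json.dumps(payload, sort_keys=True)


blob = convert_slow_vocab({"[UNK]": 0, "hello": 1, "world": 2})
```

The "update to dev4 switch file loading to in-memory loading" bullet above corresponds to passing such a blob directly instead of round-tripping through temporary files.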
    • Sam Shleifer authored
      e3e65173
  14. 07 Oct, 2020 2 commits