1. 17 Jun, 2020 1 commit
  2. 18 Apr, 2020 1 commit
    • Cleanup fast tokenizers integration (#3706) · 827d6d6e
      Thomas Wolf authored
      
      
      * First pass on utility classes and python tokenizers
      
      * finishing cleanup pass
      
      * style and quality
      
      * Fix tests
      
      * Updating following @mfuntowicz comment
      
      * style and quality
      
      * Fix Roberta
      
      * fix batch_size/seq_length in BatchEncoding
      
      * add alignment methods + tests
      
      * Fix OpenAI and Transfo-XL tokenizers
      
      * adding trim_offsets=True default for GPT2 and RoBERTa
      
      * style and quality
      
      * fix tests
      
      * add_prefix_space in roberta
      
      * bump up tokenizers to rc7
      
      * style
      
      * unfortunately TensorFlow doesn't like these - removing shape/seq_len for now
      
      * Update src/transformers/tokenization_utils.py
      Co-Authored-By: Stefan Schweter <stefan@schweter.it>
      
      * Adding doc and docstrings
      
      * making flake8 happy
      Co-authored-by: Stefan Schweter <stefan@schweter.it>
  3. 16 Apr, 2020 1 commit
  4. 25 Feb, 2020 1 commit
    • Documentation (#2989) · bb7c4685
      Lysandre Debut authored
      * All Tokenizers
      
      BertTokenizer + few fixes
      RobertaTokenizer
      OpenAIGPTTokenizer + Fixes
      GPT2Tokenizer + fixes
      TransfoXLTokenizer
      Correct rst for TransformerXL
      XLMTokenizer + fixes
      XLNet Tokenizer + Style
      DistilBERT + Fix XLNet RST
      CTRLTokenizer
      CamemBERT Tokenizer
      FlaubertTokenizer
      XLMRobertaTokenizer
      cleanup
      
      * cleanup
  5. 23 Jan, 2020 3 commits
  6. 06 Jan, 2020 2 commits
  7. 26 Sep, 2019 3 commits
  8. 14 Jul, 2019 1 commit
  9. 10 Jul, 2019 1 commit
  10. 09 Jul, 2019 1 commit
  11. 05 Jul, 2019 2 commits