1. 08 Oct, 2020 1 commit
    • Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141) · 9aeacb58
      Thomas Wolf authored
      
      * [WIP] SP tokenizers
      
      * fixing tests for T5
      
      * WIP tokenizers
      
      * serialization
      
      * update T5
      
      * WIP T5 tokenization
      
      * slow to fast conversion script
      
      * Refactoring to move tokenizer implementations inside transformers
      
      * Adding gpt - refactoring - quality
      
      * WIP adding several tokenizers to the fast world
      
      * WIP Roberta - moving implementations
      
      * update to dev4; switch file loading to in-memory loading
      
      * Updating and fixing
      
      * advancing on the tokenizers - updating do_lower_case
      
      * style and quality
      
      * moving forward with tokenizers conversion and tests
      
      * MBart, T5
      
      * dropping the fast version of Transformer-XL
      
      * Adding to autotokenizers + style/quality
      
      * update init and space_between_special_tokens
      
      * style and quality
      
      * bump up tokenizers version
      
      * add protobuf
      
      * fix pickling Bert JP with MeCab
      
      * fix newly added tokenizers
      
      * style and quality
      
      * fix bert japanese
      
      * fix funnel
      
      * limit tokenizer warning to one occurrence
      
      * clean up file
      
      * fix new tokenizers
      
      * fast tokenizers deep tests
      
      * WIP adding all the special fast tests on the new fast tokenizers
      
      * quick fix
      
      * adding more fast tokenizers in the fast tests
      
      * all tokenizers in fast version tested
      
      * Adding BertGenerationFast
      
      * bump up setup.py for CI
      
      * remove BertGenerationFast (too early)
      
      * bump up tokenizers version
      
      * Clean old docstrings
      
      * Typo
      
      * Update following Lysandre's comments
      
      Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
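      
      A minimal usage sketch for the fast tokenizers added above, assuming the t5-small checkpoint: the use_fast flag selects the Rust-backed implementation produced by the slow-to-fast conversion script, and falls back to the slow tokenizer when no fast version exists.
      
          from transformers import AutoTokenizer
          
          # use_fast=True requests the Rust-backed ("fast") tokenizer; for T5 this is the
          # SentencePiece model converted by the new slow-to-fast conversion script.
          tokenizer = AutoTokenizer.from_pretrained("t5-small", use_fast=True)
          
          encoding = tokenizer("translate English to German: Hello, world!")
          print(type(tokenizer).__name__)  # T5TokenizerFast when a fast version is available
          print(encoding.input_ids)
      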
  2. 11 Sep, 2020 1 commit
  3. 10 Sep, 2020 1 commit
    • Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. (#6594) · 7fd1febf
      Patrick von Platen authored
      * add conversion script
      
      * improve conversion script
      
      * make style
      
      * add tryout files
      
      * fix
      
      * update
      
      * add causal bert
      
      * better names
      
      * add tokenizer file as well
      
      * finish causal_bert
      
      * fix small bugs
      
      * improve generate
      
      * change naming
      
      * renaming
      
      * renaming
      
      * renaming
      
      * remove leftover files
      
      * clean files
      
      * add fix tokenizer
      
      * finalize
      
      * correct slow test
      
      * update docs
      
      * small fixes
      
      * fix link
      
      * adapt check repo
      
      * apply Sam's and Sylvain's recommendations
      
      * fix import
      
      * implement Lysandre's recommendations
      
      * fix logger warn
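      
      A minimal sketch of the encoder/decoder pair this commit introduces (the "causal bert" later renamed BertGeneration), wired into an EncoderDecoderModel; the bert-large-uncased checkpoint and the [CLS]/[SEP] token ids 101/102 are illustrative assumptions.
      
          from transformers import (
              BertGenerationDecoder,
              BertGenerationEncoder,
              BertTokenizer,
              EncoderDecoderModel,
          )
          
          # Reuse a pretrained BERT checkpoint as the encoder and as a causal (is_decoder=True) decoder.
          encoder = BertGenerationEncoder.from_pretrained(
              "bert-large-uncased", bos_token_id=101, eos_token_id=102
          )
          decoder = BertGenerationDecoder.from_pretrained(
              "bert-large-uncased", add_cross_attention=True, is_decoder=True,
              bos_token_id=101, eos_token_id=102
          )
          model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
          
          tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
          inputs = tokenizer("This is a long article to summarize", add_special_tokens=False, return_tensors="pt")
          targets = tokenizer("This is a short summary", return_tensors="pt")
          
          # With labels given, the first output is the seq2seq LM loss.
          outputs = model(
              input_ids=inputs.input_ids,
              decoder_input_ids=targets.input_ids,
              labels=targets.input_ids,
          )
          loss = outputs[0]
      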
  4. 30 Aug, 2020 1 commit
  5. 28 Aug, 2020 2 commits
    • 3cac867f
      Sam Shleifer authored
    • prepare_seq2seq_batch makes labels; decoder_input_ids are made later. (#6654) · 9336086a
      Sam Shleifer authored
      * broken test
      
      * batch parity
      
      * tests pass
      
      * boom boom
      
      * boom boom
      
      * split out bart tokenizer tests
      
      * fix tests
      
      * boom boom
      
      * Fixed dataset bug
      
      * Fix marian
      
      * Undo extra
      
      * Get marian working
      
      * Fix t5 tok tests
      
      * Test passing
      
      * Cleanup
      
      * better assert msg
      
      * require torch
      
      * Fix mbart tests
      
      * undo extra decoder_attn_mask change
      
      * Fix import
      
      * pegasus tokenizer can ignore src_lang kwargs
      
      * unused kwarg test cov
      
      * boom boom
      
      * add todo for pegasus issue
      
      * cover one word translation edge case
      
      * Cleanup
      
      * doc
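      
      A sketch of the behavior this commit settles on, using the Helsinki-NLP/opus-mt-en-de Marian checkpoint as an example: prepare_seq2seq_batch (the API of that era, since deprecated) returns input_ids, attention_mask and labels, while decoder_input_ids are created later, inside the model, by shifting the labels.
      
          from transformers import MarianMTModel, MarianTokenizer
          
          model_name = "Helsinki-NLP/opus-mt-en-de"  # example checkpoint
          tokenizer = MarianTokenizer.from_pretrained(model_name)
          model = MarianMTModel.from_pretrained(model_name)
          
          # The batch holds input_ids, attention_mask and labels; no decoder_input_ids here,
          # since they are built from the labels when the model computes the loss.
          batch = tokenizer.prepare_seq2seq_batch(
              src_texts=["I love reading."],
              tgt_texts=["Ich lese gern."],
              return_tensors="pt",
          )
          loss = model(**batch)[0]  # with labels present, the first output is the LM loss
      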
  6. 26 Aug, 2020 1 commit
  7. 25 Aug, 2020 1 commit
  8. 17 Aug, 2020 1 commit
  9. 19 May, 2020 1 commit
  10. 15 Jan, 2020 1 commit
  11. 06 Jan, 2020 2 commits
  12. 22 Dec, 2019 7 commits
  13. 21 Dec, 2019 1 commit
    • Reformat source code with black. · fa84ae26
      Aymeric Augustin authored
      This is the result of:
      
          $ black --line-length 119 examples templates transformers utils hubconf.py setup.py
      
      There are a lot of fairly long lines in the project. As a consequence, I'm
      picking the longest widely accepted line length, 119 characters.
      
      This is also Thomas' preference, because it allows for explicit variable
      names, which makes the code easier to understand.
  14. 10 Dec, 2019 1 commit
  15. 07 Nov, 2019 1 commit
  16. 06 Nov, 2019 1 commit
  17. 04 Nov, 2019 1 commit
  18. 22 Oct, 2019 1 commit
  19. 04 Oct, 2019 1 commit
  20. 26 Sep, 2019 1 commit
  21. 19 Sep, 2019 1 commit
  22. 30 Aug, 2019 1 commit
  23. 12 Aug, 2019 1 commit
  24. 05 Aug, 2019 1 commit
  25. 15 Jul, 2019 1 commit
  26. 09 Jul, 2019 2 commits
  27. 05 Jul, 2019 3 commits
  28. 02 Jul, 2019 1 commit
  29. 29 Jun, 2019 1 commit