1. 22 Oct, 2020 1 commit
  2. 21 Oct, 2020 1 commit
  3. 16 Oct, 2020 1 commit
  4. 04 Oct, 2020 1 commit
  5. 01 Oct, 2020 2 commits
  6. 30 Sep, 2020 1 commit
  7. 27 Sep, 2020 1 commit
  8. 24 Sep, 2020 1 commit
  9. 21 Sep, 2020 1 commit
  10. 17 Sep, 2020 1 commit
  11. 16 Sep, 2020 2 commits
  12. 14 Sep, 2020 2 commits
  13. 13 Sep, 2020 1 commit
  14. 10 Sep, 2020 1 commit
  15. 04 Sep, 2020 1 commit
  16. 28 Aug, 2020 1 commit
    • prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654) · 9336086a
      Sam Shleifer authored
      * broken test
      
      * batch parity
      
      * tests pass
      
      * boom boom
      
      * boom boom
      
      * split out bart tokenizer tests
      
      * fix tests
      
      * boom boom
      
      * Fixed dataset bug
      
      * Fix marian
      
      * Undo extra
      
      * Get marian working
      
      * Fix t5 tok tests
      
      * Test passing
      
      * Cleanup
      
      * better assert msg
      
      * require torch
      
      * Fix mbart tests
      
      * undo extra decoder_attn_mask change
      
      * Fix import
      
      * pegasus tokenizer can ignore src_lang kwargs
      
      * unused kwarg test cov
      
      * boom boom
      
      * add todo for pegasus issue
      
      * cover one word translation edge case
      
      * Cleanup
      
      * doc
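      A minimal sketch of the workflow this commit (#6654) touches, assuming a transformers release from this era: prepare_seq2seq_batch tokenizes source and target text together, while decoder_input_ids are built later (e.g. by shifting the target ids inside the model or training loop) rather than by the tokenizer. The checkpoint name below is illustrative, the exact output keys are version-dependent, and prepare_seq2seq_batch was deprecated in later releases in favor of calling the tokenizer directly.

      from transformers import MarianTokenizer, MarianMTModel

      tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
      model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

      src_texts = ["I am a small frog."]
      tgt_texts = ["Ich bin ein kleiner Frosch."]

      # Tokenize source and target in one call; after this change the tokenizer
      # no longer builds decoder_input_ids itself (output keys are
      # version-dependent, e.g. input_ids / attention_mask / labels).
      batch = tokenizer.prepare_seq2seq_batch(
          src_texts, tgt_texts=tgt_texts, return_tensors="pt"
      )
      print(sorted(batch.keys()))

      # Inference only needs the source side of the batch.
      generated = model.generate(
          input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]
      )
      print(tokenizer.batch_decode(generated, skip_special_tokens=True))

      For training, the tokenized target ids can be shifted right to produce decoder_input_ids, which is what "made later" refers to in the commit title.
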
  17. 26 Aug, 2020 1 commit
  18. 25 Aug, 2020 1 commit
  19. 13 Aug, 2020 1 commit
  20. 11 Aug, 2020 1 commit
  21. 08 Aug, 2020 1 commit
  22. 06 Aug, 2020 1 commit
  23. 28 Jul, 2020 2 commits
  24. 21 Jul, 2020 1 commit
  25. 18 Jul, 2020 1 commit
  26. 17 Jul, 2020 1 commit
  27. 15 Jul, 2020 2 commits
  28. 07 Jul, 2020 1 commit
  29. 26 Jun, 2020 2 commits
  30. 25 Jun, 2020 1 commit
  31. 23 Jun, 2020 1 commit
  32. 19 Jun, 2020 1 commit
  33. 17 Jun, 2020 1 commit
  34. 15 Jun, 2020 1 commit
    • [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510) · 36434220
      Anthony MOI authored
      * Use tokenizers pre-tokenized pipeline
      
      * failing pretokenized test
      
      * Fix is_pretokenized in python
      
      * add pretokenized tests
      
      * style and quality
      
      * better tests for batched pretokenized inputs
      
      * tokenizers clean up - new padding_strategy - split the files
      
      * [HUGE] refactoring tokenizers - padding - truncation - tests
      
      * style and quality
      
      * bump up required tokenizers version to 0.8.0-rc1
      
      * switched padding/truncation API - simpler better backward compat
      
      * updating tests for custom tokenizers
      
      * style and quality - tests on pad
      
      * fix QA pipeline
      
      * fix backward compatibility for max_length only
      
      * style and quality
      
      * Various cleans up - add verbose
      
      * fix tests
      
      * update docstrings
      
      * Fix tests
      
      * Docs reformatted
      
      * __call__ method documented
      Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
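      A hedged sketch of the API surface this refactor (#4510) introduced: padding and truncation strategies passed directly to the tokenizer's __call__, plus support for pre-tokenized (already word-split) inputs. The checkpoint name and flag values are illustrative, not taken from the commit.

      from transformers import AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

      # Unified padding/truncation API on __call__.
      batch = tokenizer(
          ["A short sentence.", "A noticeably longer sentence that may get truncated."],
          padding="longest",   # or True / "max_length" / False
          truncation=True,     # or "only_first" / "only_second" / "longest_first"
          max_length=16,
          return_tensors="pt",
      )
      print(batch["input_ids"].shape)

      # Pre-tokenized pipeline: inputs already split into words. The flag was
      # named is_pretokenized when this PR landed; later releases renamed it
      # to is_split_into_words.
      words = [["Hugging", "Face", "tokenizers"], ["padding", "and", "truncation"]]
      pretok = tokenizer(words, is_split_into_words=True, padding=True, return_tensors="pt")
      print(pretok["input_ids"].shape)

      The same arguments are accepted by both the slow (Python) and fast (Rust-backed) tokenizers, which is the backward-compatible switch the "switched padding/truncation API" bullet refers to.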