1. 22 Jul, 2020 1 commit
  2. 28 Jun, 2020 1 commit
  3. 26 Jun, 2020 1 commit
  4. 15 Jun, 2020 1 commit
    • [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized... · 36434220
      Anthony MOI authored
      
      [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
      
      * Use tokenizers pre-tokenized pipeline
      
      * failing pretokenized test
      
      * Fix is_pretokenized in python
      
      * add pretokenized tests
      
      * style and quality
      
      * better tests for batched pretokenized inputs
      
      * tokenizers clean up - new padding_strategy - split the files
      
      * [HUGE] refactoring tokenizers - padding - truncation - tests
      
      * style and quality
      
      * bump up required tokenizers version to 0.8.0-rc1
      
      * switched padding/truncation API - simpler better backward compat
      
      * updating tests for custom tokenizers
      
      * style and quality - tests on pad
      
      * fix QA pipeline
      
      * fix backward compatibility for max_length only
      
      * style and quality
      
      * Various cleanups - add verbose
      
      * fix tests
      
      * update docstrings
      
      * Fix tests
      
      * Docs reformatted
      
      * __call__ method documented
      Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
  5. 09 Jun, 2020 1 commit
    • [All models] Extend config.output_attentions with output_attentions function arguments (#4538) · 6e603cb7
      Bharat Raghunathan authored
      
      
      * DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``
      
      * DOC: Apply Black Formatting
      
      * Fix errors where output_attentions was undefined
      
      * Remove output_attentions in classes per review
      
      * Fix regressions on tests having `output_attention`
      
      * Fix further regressions in tests relating to `output_attentions`
      
      Ensure proper propagation of `output_attentions` as a function parameter
      to all model subclasses
      
      * Fix more regressions in `test_output_attentions`
      
      * Fix issues with BertEncoder
      
      * Rename related variables to `output_attentions`
      
      * fix pytorch tests
      
      * fix bert and gpt2 tf
      
      * Fix most TF tests for `test_output_attentions`
      
      * Fix linter errors and more TF tests
      
      * fix conflicts
      
      * DOC: Apply Black Formatting
      
      * Fix errors where output_attentions was undefined
      
      * Remove output_attentions in classes per review
      
      * Fix regressions on tests having `output_attention`
      
      * fix conflicts
      
      * fix conflicts
      
      * fix conflicts
      
      * fix conflicts
      
      * fix pytorch tests
      
      * fix conflicts
      
      * fix conflicts
      
      * Fix linter errors and more TF tests
      
      * fix tf tests
      
      * make style
      
      * fix isort
      
      * improve output_attentions
      
      * improve tensorflow
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
  6. 02 Jun, 2020 1 commit
    • Kill model archive maps (#4636) · d4c2cb40
      Julien Chaumond authored
      * Kill model archive maps
      
      * Fixup
      
      * Also kill model_archive_map for MaskedBertPreTrainedModel
      
      * Unhook config_archive_map
      
      * Tokenizers: align with model id changes
      
      * make style && make quality
      
      * Fix CI
  7. 29 Apr, 2020 1 commit
    • CDN urls (#4030) · 455c6390
      Julien Chaumond authored
      * [file_utils] use_cdn + documentation
      
      * Move to cdn. urls for weights
      
      * [urls] Hotfix for bert-base-japanese
  8. 18 Apr, 2020 1 commit
    • Cleanup fast tokenizers integration (#3706) · 827d6d6e
      Thomas Wolf authored
      
      
      * First pass on utility classes and python tokenizers
      
      * finishing cleanup pass
      
      * style and quality
      
      * Fix tests
      
      * Updating following @mfuntowicz comment
      
      * style and quality
      
      * Fix Roberta
      
      * fix batch_size/seq_length in BatchEncoding
      
      * add alignment methods + tests
      
      * Fix OpenAI and Transfo-XL tokenizers
      
      * adding trim_offsets=True default for GPT2 and RoBERTa
      
      * style and quality
      
      * fix tests
      
      * add_prefix_space in roberta
      
      * bump up tokenizers to rc7
      
      * style
      
      * unfortunately tensorflow doesn't like these - removing shape/seq_len for now
      
      * Update src/transformers/tokenization_utils.py
      Co-Authored-By: Stefan Schweter <stefan@schweter.it>
      
      * Adding doc and docstrings
      
      * making flake8 happy
      Co-authored-by: Stefan Schweter <stefan@schweter.it>
  9. 16 Apr, 2020 1 commit
  10. 08 Apr, 2020 1 commit
  11. 04 Apr, 2020 1 commit
  12. 24 Mar, 2020 1 commit
  13. 02 Mar, 2020 1 commit
  14. 07 Feb, 2020 1 commit
  15. 29 Jan, 2020 4 commits
  16. 15 Jan, 2020 1 commit
  17. 13 Jan, 2020 1 commit
  18. 07 Jan, 2020 1 commit
  19. 06 Jan, 2020 2 commits
  20. 05 Jan, 2020 1 commit
  21. 28 Dec, 2019 1 commit
  22. 23 Dec, 2019 1 commit
  23. 22 Dec, 2019 14 commits