1. 17 Nov, 2020 1 commit
    • Reorganize repo (#8580) · c89bdfbe
      Sylvain Gugger authored
      * Put models in subfolders
      
      * Styling
      
      * Fix imports in tests
      
      * More fixes in test imports
      
      * Sneaky hidden imports
      
      * Fix imports in doc files
      
      * More sneaky imports
      
      * Finish fixing tests
      
      * Fix examples
      
      * Fix path for copies
      
      * More fixes for examples
      
      * Fix dummy files
      
      * More fixes for example
      
      * More model import fixes
      
      * Is this why you're unhappy, GitHub?
      
      * Fix imports in convert command
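      The reorganization moves each model's files into its own subpackage under transformers/models/ while keeping the top-level API intact. A minimal sketch of what that means for imports, assuming a release that includes this commit (the bert-specific module paths below are illustrative):

          # Public, top-level imports are unchanged by the reorganization.
          from transformers import BertModel, BertTokenizer

          # Internally, each model now lives in its own subfolder, so fully
          # qualified imports go through transformers.models.<model_name>.
          from transformers.models.bert.modeling_bert import BertModel as DirectBertModel

          # Both names refer to the same class object.
          assert BertModel is DirectBertModel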
  2. 18 Sep, 2020 1 commit
    • Add new pre-trained models BERTweet and PhoBERT (#6129) · af2322c7
      Dat Quoc Nguyen authored
      * Add BERTweet and PhoBERT models
      
      * Update modeling_auto.py
      
      Re-add `bart` to LM_MAPPING
      
      * Update tokenization_auto.py
      
      Re-add `from .configuration_mobilebert import MobileBertConfig`
      not sure why it's replaced by `from transformers.configuration_mobilebert import MobileBertConfig`
      
      * Add BERTweet and PhoBERT to pretrained_models.rst
      
      * Update tokenization_auto.py
      
      Remove BertweetTokenizer and PhobertTokenizer from tokenization_auto.py (they are currently not supported by AutoTokenizer).
      
      * Update BertweetTokenizer - without nltk
      
      * Update model card for BERTweet
      
      * PhoBERT - with Auto mode - without import fastBPE
      
      * PhoBERT - with Auto mode - without import fastBPE
      
      * BERTweet - with Auto mode - without import fastBPE
      
      * Add PhoBERT and BERTweet to TF modeling auto
      
      * Improve Docstrings for PhobertTokenizer and BertweetTokenizer
      
      * Update PhoBERT and BERTweet model cards
      
      * Fixed a merge conflict in tokenization_auto
      
      * Used black to reformat BERTweet- and PhoBERT-related files
      
      * Used isort to reformat BERTweet- and PhoBERT-related files
      
      * Reformatted BERTweet- and PhoBERT-related files based on flake8
      
      * Updated test files
      
      * Updated test files
      
      * Updated tf test files
      
      * Updated tf test files
      
      * Updated tf test files
      
      * Updated tf test files
      
      * Update commits from huggingface
      
      * Delete unnecessary files
      
      * Add tokenizers to auto and init files
      
      * Add test files for tokenizers
      
      * Revised model cards
      
      * Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files
      
      * Revised test files
      
      * Update order of Phobert and Bertweet tokenizers in auto tokenization file
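      Since the final commits wire both tokenizers into the Auto classes, they can be loaded without importing BertweetTokenizer or PhobertTokenizer directly. A hedged sketch, assuming the vinai/bertweet-base and vinai/phobert-base checkpoint names on the model hub (the commit itself does not name them):

          from transformers import AutoModel, AutoTokenizer

          # AutoTokenizer resolves to BertweetTokenizer / PhobertTokenizer via the
          # mappings added in tokenization_auto.py.
          bertweet_tok = AutoTokenizer.from_pretrained("vinai/bertweet-base")
          phobert_tok = AutoTokenizer.from_pretrained("vinai/phobert-base")

          model = AutoModel.from_pretrained("vinai/bertweet-base")
          enc = bertweet_tok("SC has first two presumptive cases of coronavirus", return_tensors="pt")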
  3. 15 Jun, 2020 1 commit
    • [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510) · 36434220
      Anthony MOI authored
      
      * Use tokenizers pre-tokenized pipeline
      
      * failing pretokenized test
      
      * Fix is_pretokenized in python
      
      * add pretokenized tests
      
      * style and quality
      
      * better tests for batched pretokenized inputs
      
      * tokenizers clean up - new padding_strategy - split the files
      
      * [HUGE] refactoring tokenizers - padding - truncation - tests
      
      * style and quality
      
      * bump up required tokenizers version to 0.8.0-rc1
      
      * switched padding/truncation API - simpler, better backward compat
      
      * updating tests for custom tokenizers
      
      * style and quality - tests on pad
      
      * fix QA pipeline
      
      * fix backward compatibility for max_length only
      
      * style and quality
      
      * Various clean-ups - add verbose
      
      * fix tests
      
      * update docstrings
      
      * Fix tests
      
      * Docs reformatted
      
      * __call__ method documented
      Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
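      The refactor unifies padding, truncation, and pre-tokenized inputs behind a single documented __call__ method. A minimal sketch of that API, assuming tokenizers >= 0.8.0-rc1 and a transformers release containing #4510 (the pre-tokenized flag was named is_pretokenized at the time and was later renamed is_split_into_words):

          from transformers import BertTokenizerFast

          tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

          # Padding and truncation are explicit, keyword-driven strategies.
          batch = tokenizer(
              ["a short sentence", "a much longer sentence that may need truncating"],
              padding="longest",   # or True / "max_length" / False
              truncation=True,     # or "longest_first" / "only_first" / "only_second"
              max_length=16,
          )

          # Already-split (pre-tokenized) inputs go through the same call.
          pretok = tokenizer(
              [["Hello", "world", "!"]],
              is_split_into_words=True,  # was is_pretokenized=True when this commit landed
              padding=True,
          )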
  4. 15 Jan, 2020 1 commit
  5. 06 Jan, 2020 2 commits
  6. 22 Dec, 2019 8 commits
  7. 21 Dec, 2019 1 commit
    • Reformat source code with black. · fa84ae26
      Aymeric Augustin authored
      This is the result of:
      
          $ black --line-length 119 examples templates transformers utils hubconf.py setup.py
      
      There are a lot of fairly long lines in the project. As a consequence, I'm
      picking the longest widely accepted line length, 119 characters.
      
      This is also Thomas' preference, because it allows for explicit variable
      names, which makes the code easier to understand.
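      The same formatting can be reproduced programmatically; a small sketch using black's Python API (assuming black is installed), equivalent to the CLI invocation above:

          import black

          # Mirror the CLI flag --line-length 119.
          mode = black.Mode(line_length=119)
          src = "x = some_function(argument_one,argument_two,argument_three)\n"
          print(black.format_str(src, mode=mode))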
  8. 08 Oct, 2019 1 commit
  9. 04 Oct, 2019 1 commit
    • Adding CTRL (squashed commit) · dbed1c5d
      keskarnitish authored
      adding conversion script
      
      adding first draft of modeling & tokenization
      
      adding placeholder for test files
      
      bunch of changes
      
      registering the tokenizer/model/etc
      
      tests
      
      change link; something is very VERY wrong here
      
      weird end-of-word thingy going on
      
      I think the tokenization works now; wrote the unit tests
      
      overall structure works; load weights next
      
      the monster is alive!
      
      works after some cleanup as well
      
      adding emacs autosave to gitignore
      
      currently only supporting the 48-layer one; seems to infer fine on my MacBook
      
      cleanup
      
      fixing some documentation
      
      fixing some documentation
      
      tests passing?
      
      now works on CUDA also
      
      adding greedy?
      
      adding greedy sampling
      
      works well
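      The squashed commit ends with greedy sampling working; a hedged sketch of greedy decoding with the resulting CTRL classes, assuming the Salesforce/ctrl checkpoint name on the model hub (only the 48-layer model is supported at this point, per the commit):

          from transformers import CTRLLMHeadModel, CTRLTokenizer

          tokenizer = CTRLTokenizer.from_pretrained("Salesforce/ctrl")
          model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

          # CTRL prompts start with a control code such as "Wikipedia" or "Links".
          input_ids = tokenizer("Wikipedia The history of machine translation", return_tensors="pt").input_ids

          # Greedy decoding: take the most likely token at every step (no sampling).
          output = model.generate(input_ids, max_length=50, do_sample=False)
          print(tokenizer.decode(output[0]))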
  10. 26 Sep, 2019 2 commits
  11. 30 Aug, 2019 5 commits
  12. 05 Aug, 2019 1 commit
  13. 15 Jul, 2019 1 commit
  14. 09 Jul, 2019 2 commits
  15. 05 Jul, 2019 3 commits
  16. 02 Jul, 2019 1 commit
  17. 17 Apr, 2019 4 commits
  18. 16 Apr, 2019 1 commit
  19. 15 Apr, 2019 2 commits
  20. 11 Feb, 2019 1 commit