1. 30 Nov, 2020 1 commit
  2. 29 Nov, 2020 1 commit
    • [CI] implement job skipping for doc-only PRs (#8826) · c239dcda
      Stas Bekman authored
      * implement job skipping for doc-only PRs
      
      * silent grep is crucial
      
      * wip
      
      * wip
      
      * wip
      
      * wip
      
      * wip
      
      * wip
      
      * wip
      
      * wip
      
      * let's add doc
      
      * let's add code
      
      * revert test commits
      
      * restore
      
      * Better name
      
      * Better name
      
      * Better name
      
      * some more testing
      
      * some more testing
      
      * some more testing
      
      * finish testing
  3. 23 Nov, 2020 1 commit
    • Improve bert-japanese tokenizer handling (#8659) · 0cc5ab13
      Julien Chaumond authored
      
      
      * Make ci fail
      
      * Try to make tests actually run?
      
      * CI finally failing?
      
      * Fix CI
      
      * Revert "Fix CI"
      
      This reverts commit ca7923be7334d4e571b023478ebdd6b33dfd0ebb.
      
      * Ooops wrong one
      
      * one more try
      
      * Ok ok let's move this elsewhere
      
      * Alternative to globals() (#8667)
      
      * Alternative to globals()
      
      * Error is raised later so return None
      
      * Sentencepiece not installed make some tokenizers None
      
      * Apply Lysandre wisdom
      
      * Slightly clearer comment?
      
      cc @sgugger
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  4. 19 Nov, 2020 1 commit
  5. 11 Nov, 2020 2 commits
  6. 04 Nov, 2020 2 commits
  7. 03 Nov, 2020 2 commits
  8. 29 Oct, 2020 1 commit
    • Add a template for examples and apply it for mlm and plm examples (#8153) · 69117628
      Sylvain Gugger authored
      * Add a template for example scripts and apply it to mlm
      
      * Formatting
      
      * Fix test
      
      * Add plm script
      
      * Add a template for example scripts and apply it to mlm
      
      * Formatting
      
      * Fix test
      
      * Add plm script
      
      * Add a template for example scripts and apply it to mlm
      
      * Formatting
      
      * Fix test
      
      * Add plm script
      
      * Styling
  9. 28 Oct, 2020 1 commit
  10. 27 Oct, 2020 2 commits
  11. 26 Oct, 2020 1 commit
    • Doc styling (#8067) · 08f534d2
      Sylvain Gugger authored
      * Important files
      
      * Styling them all
      
      * Revert "Styling them all"
      
      This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.
      
      * Styling them for realsies
      
      * Fix syntax error
      
      * Fix benchmark_utils
      
      * More fixes
      
      * Fix modeling auto and script
      
      * Remove new line
      
      * Fixes
      
      * More fixes
      
      * Fix more files
      
      * Style
      
      * Add FSMT
      
      * More fixes
      
      * More fixes
      
      * More fixes
      
      * More fixes
      
      * Fixes
      
      * More fixes
      
      * More fixes
      
      * Last fixes
      
      * Make sphinx happy
  12. 23 Oct, 2020 1 commit
  13. 20 Oct, 2020 1 commit
  14. 19 Oct, 2020 2 commits
  15. 18 Oct, 2020 1 commit
    • [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a
      Thomas Wolf authored
      * splitting fast and slow tokenizers [WIP]
      
      * [WIP] splitting sentencepiece and tokenizers dependencies
      
      * update dummy objects
      
      * add name_or_path to models and tokenizers
      
      * prefix added to file names
      
      * prefix
      
      * styling + quality
      
      * splitting all the tokenizer files - sorting sentencepiece based ones
      
      * update tokenizer version up to 0.9.0
      
      * remove hard dependency on sentencepiece 🎉
      
      * and removed hard dependency on tokenizers 🎉
      
      
      
      * update conversion script
      
      * update missing models
      
      * fixing tests
      
      * move test_tokenization_fast to main tokenization tests - fix bugs
      
      * bump up tokenizers
      
      * fix bert_generation
      
      * update and fix several tokenizers
      
      * keep sentencepiece in deps for now
      
      * fix funnel and deberta tests
      
      * fix fsmt
      
      * fix marian tests
      
      * fix layoutlm
      
      * fix squeezebert and gpt2
      
      * fix T5 tokenization
      
      * fix xlnet tests
      
      * style
      
      * fix mbart
      
      * bump up tokenizers to 0.9.2
      
      * fix model tests
      
      * fix tf models
      
      * fix seq2seq examples
      
      * fix tests without sentencepiece
      
      * fix slow => fast  conversion without sentencepiece
      
      * update auto and bert generation tests
      
      * fix mbart tests
      
      * fix auto and common test without tokenizers
      
      * fix tests without tokenizers
      
      * clean up tests lighten up when tokenizers + sentencepiece are both off
      
      * style quality and tests fixing
      
      * add sentencepiece to doc/examples reqs
      
      * leave sentencepiece on for now
      
      * style quality split herbert and fix pegasus
      
      * WIP Herbert fast
      
      * add sample_text_no_unicode and fix herbert tokenization
      
      * skip FSMT example test for now
      
      * fix style
      
      * fix fsmt in example tests
      
      * update following Lysandre and Sylvain's comments
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
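
The optional-dependency idea in this commit can be illustrated with a minimal sketch. It is not the code from the PR: the helper names `is_available` and `optional_attr` are hypothetical, and the sketch only assumes that a backend counts as "installed" when Python can find a top-level module of that name. When the backend is missing, the lookup yields `None` instead of raising at import time.

```python
import importlib
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def optional_attr(module_name: str, attr_name: str):
    """Return `module_name.attr_name`, or None when the module is not installed.

    Hypothetical helper: callers check for None and raise a clear error
    only when the optional feature is actually requested.
    """
    if not is_available(module_name):
        return None
    module = importlib.import_module(module_name)
    return getattr(module, attr_name, None)


# Illustration with a stdlib module and a module name that should not exist.
dumps = optional_attr("json", "dumps")
missing = optional_attr("hypothetical_missing_backend", "Tokenizer")
```

Deferring the error to the point of use keeps `import` of the main package cheap and lets users install only the backends they need.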
  16. 05 Oct, 2020 1 commit
  17. 23 Sep, 2020 1 commit
    • [code quality] fix confused flake8 (#7309) · df536438
      Stas Bekman authored
      * fix confused flake
      
      We run `black --target-version py35 ...`, but flake8 doesn't know that, so under py38 flake8 currently fails, suggesting that black should have reformatted 63 files. Indeed, if I run:
      
      ```
      black --line-length 119 --target-version py38 examples templates tests src utils
      ```
      it reformats 63 files.
      
      The only solution I found is to create a black config file as explained at https://github.com/psf/black#configuration-format, which is what this PR adds.
      
      Now flake8 knows that py35 is the standard and no longer gets confused regardless of the user's python version.
      
      * adjust the other files that will now rely on black's config file
  18. 22 Sep, 2020 1 commit
  19. 17 Sep, 2020 1 commit
    • remove deprecated flag (#7171) · 79111b77
      Stas Bekman authored
      ```
      /home/circleci/.local/lib/python3.6/site-packages/isort/main.py:915: UserWarning: W0501: The following deprecated CLI flags were used and ignored: --recursive!
        "W0501: The following deprecated CLI flags were used and ignored: "
      ```
  20. 10 Sep, 2020 1 commit
  21. 01 Sep, 2020 1 commit
  22. 25 Aug, 2020 1 commit
  23. 24 Aug, 2020 1 commit
  24. 17 Aug, 2020 1 commit
  25. 12 Aug, 2020 2 commits
  26. 11 Aug, 2020 2 commits
  27. 10 Aug, 2020 1 commit
  28. 07 Aug, 2020 2 commits
  29. 04 Aug, 2020 2 commits
  30. 31 Jul, 2020 1 commit
    • Replace mecab-python3 with fugashi for Japanese tokenization (#6086) · cf3cf304
      Paul O'Leary McCann authored
      
      
      * Replace mecab-python3 with fugashi
      
      This replaces mecab-python3 with fugashi for Japanese tokenization. I am
      the maintainer of both projects.
      
      Both projects are MeCab wrappers, so the underlying C++ code is the
      same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
      use of the MeCab API it's easier to use.
      
      This code ensures the use of a version of ipadic installed via pip,
      which should make versioning and tracking down issues easier.
      
      fugashi has wheels for Windows, OSX, and Linux, which will help with
      issues with installing old versions of mecab-python3 on Windows.
      Compared to mecab-python3, because fugashi doesn't use SWIG, it doesn't
      require a C++ runtime to be installed on Windows.
      
      In adding this change I removed some code dealing with `cursor`,
      `token_start`, and `token_end` variables. These variables didn't seem to
      be used for anything; it is unclear to me why they were there.
      
      I ran the tests and they passed, though I couldn't figure out how to run
      the slow tests (`--runslow` gave an error) and didn't try testing with
      Tensorflow.
      
      * Style fix
      
      * Remove unused variable
      
      Forgot to delete this...
      
      * Adapt doc with install instructions
      
      * Fix typo
      Co-authored-by: sgugger <sylvain.gugger@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
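
As a rough illustration of the change described in this commit, here is a minimal sketch of word segmentation through fugashi with the pip-installed ipadic dictionary. It assumes the `fugashi` and `ipadic` packages are installed; the function names `build_ipadic_tagger` and `tokenize` are hypothetical and are not taken from the PR.

```python
from typing import Callable, Iterable, List


def build_ipadic_tagger():
    """Create a fugashi tagger that uses the ipadic dictionary from pip.

    The imports are done lazily so that this module still loads when the
    optional packages are not installed.
    """
    import fugashi
    import ipadic

    # ipadic.MECAB_ARGS points MeCab at the dictionary bundled with the package.
    return fugashi.GenericTagger(ipadic.MECAB_ARGS)


def tokenize(text: str, tagger: Callable[[str], Iterable]) -> List[str]:
    """Return the surface form of every node the tagger yields for `text`."""
    return [word.surface for word in tagger(text)]
```

With both packages installed, `tokenize("...", build_ipadic_tagger())` returns the list of surface strings. Passing the tagger in as an argument keeps the tokenization step testable without MeCab.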
  31. 26 Jul, 2020 1 commit
    • add a summary report flag for run_examples on CI (#6035) · daa5dd12
      Stas Bekman authored
      Currently, it's hard to tell which example tests were run on CI and which weren't. Adding the `-rA` flag to `pytest` now includes a summary like:
      
      ```
      ==================================================================== short test summary info =====================================================================
      PASSED examples/test_examples.py::ExamplesTests::test_generation
      PASSED examples/test_examples.py::ExamplesTests::test_run_glue
      PASSED examples/test_examples.py::ExamplesTests::test_run_language_modeling
      PASSED examples/test_examples.py::ExamplesTests::test_run_squad
      FAILED examples/test_examples.py::ExamplesTests::test_run_pl_glue - AttributeError: 'Namespace' object has no attribute 'gpus'
      ============================================================ 1 failed, 4 passed, 8 warnings in 42.96s ============================================================
      ```
      which makes it easier to validate whether some example is being covered by CI or not.
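
A summary in this shape is also easy to check mechanically. The following is a minimal sketch, not part of the PR, that groups test IDs by outcome; the function name `parse_short_summary` is hypothetical, and it only assumes the `PASSED`/`FAILED`/`ERROR` line layout shown in the output above.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches lines such as "PASSED path::Class::test" or "FAILED path::test - message".
SUMMARY_LINE = re.compile(r"^(PASSED|FAILED|ERROR)\s+(\S+)")


def parse_short_summary(report: str) -> Dict[str, List[str]]:
    """Group test IDs by outcome from a `pytest -rA` short test summary."""
    results = defaultdict(list)
    for line in report.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            outcome, test_id = match.groups()
            results[outcome].append(test_id)
    return dict(results)


sample = """\
PASSED examples/test_examples.py::ExamplesTests::test_run_glue
PASSED examples/test_examples.py::ExamplesTests::test_run_squad
FAILED examples/test_examples.py::ExamplesTests::test_run_pl_glue - AttributeError: 'Namespace' object has no attribute 'gpus'
"""
summary = parse_short_summary(sample)
```

Here `summary["PASSED"]` holds the two passing test IDs and `summary["FAILED"]` holds the failing one, so a CI step could compare the result against the list of examples it expects to be covered.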