1. 21 Aug, 2020 4 commits
  2. 20 Aug, 2020 2 commits
      add intro to nlp lib & dataset links to custom datasets tutorial (#6583) · 039d8d65
      Joe Davison authored
      * add intro to nlp lib + links
      
      * unique links...
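
For readers unfamiliar with the nlp library that the new intro describes, here is a minimal, hedged sketch of its basic entry point (the library has since been renamed datasets; the dataset name below is an illustrative assumption, not something taken from the tutorial):

    # Load a dataset with the nlp library (later renamed to datasets).
    # "imdb" is only an example dataset name, not one cited in this PR.
    import nlp

    dataset = nlp.load_dataset("imdb", split="train")
    print(dataset[0])  # one example, as a dict of column -> value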
      039d8d65
      Docs copy button misses ... prefixed code (#6518) · cabfdfaf
      Romain Rigaux authored
      Tested in a local build of the docs.
      
      e.g. Just above https://huggingface.co/transformers/task_summary.html#causal-language-modeling
      
      Copy will copy the full code, e.g.
      
      for token in top_5_tokens:
           print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
      
      Instead of currently only:
      
      for token in top_5_tokens:
      
      
The docs snippet in question, with prompts and output:

>>> for token in top_5_tokens:
...     print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.
      
      Docs for the option fix:
      https://sphinx-copybutton.readthedocs.io/en/latest/
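
For reference, a minimal sketch of the sphinx-copybutton settings that strip both the ">>> " and "... " prompts before copying; the option names come from the sphinx-copybutton documentation linked above, and the exact regex value here is an illustration rather than a quote from the PR diff:

    # conf.py (Sphinx configuration)
    extensions = ["sphinx_copybutton"]

    # Treat both doctest prompts (">>> " and "... ") as text to strip on copy.
    copybutton_prompt_text = r">>> |\.\.\. "
    copybutton_prompt_is_regexp = True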
      cabfdfaf
  3. 19 Aug, 2020 1 commit
  4. 18 Aug, 2020 4 commits
  5. 17 Aug, 2020 9 commits
  6. 14 Aug, 2020 3 commits
  7. 12 Aug, 2020 3 commits
  8. 11 Aug, 2020 2 commits
  9. 10 Aug, 2020 4 commits
  10. 07 Aug, 2020 1 commit
  11. 05 Aug, 2020 1 commit
      Tf model outputs (#6247) · c67d1a02
      Sylvain Gugger authored
      * TF outputs and test on BERT
      
      * Albert to DistilBert
      
      * All remaining TF models except T5
      
      * Documentation
      
      * One file forgotten
      
      * Add new models and fix issues
      
      * Quality improvements
      
      * Add T5
      
      * A bit of cleanup
      
      * Fix for slow tests
      
      * Style
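
As a hedged illustration of what structured TF outputs mean for users (the model, checkpoint, and field names below follow the present-day transformers API and are assumptions about this PR, not code taken from it):

    # TF model calls return output objects with named fields instead of
    # plain tuples, mirroring the PyTorch ModelOutput classes.
    from transformers import BertTokenizer, TFBertModel

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = TFBertModel.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Hello world", return_tensors="tf")
    outputs = model(inputs)

    # Named access rather than positional indexing into a tuple.
    print(outputs.last_hidden_state.shape)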
      c67d1a02
  12. 04 Aug, 2020 1 commit
  13. 03 Aug, 2020 2 commits
  14. 01 Aug, 2020 1 commit
  15. 31 Jul, 2020 2 commits
      Harmonize both Trainers API (#6157) · 86caab1e
      Sylvain Gugger authored
      * Harmonize both Trainers API
      
      * Fix test
      
* main_process -> process_zero
      86caab1e
      Replace mecab-python3 with fugashi for Japanese tokenization (#6086) · cf3cf304
      Paul O'Leary McCann authored
      
      
      * Replace mecab-python3 with fugashi
      
      This replaces mecab-python3 with fugashi for Japanese tokenization. I am
      the maintainer of both projects.
      
      Both projects are MeCab wrappers, so the underlying C++ code is the
      same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
      use of the MeCab API it's easier to use.
      
This code ensures the use of a version of ipadic installed via pip,
      which should make versioning and tracking down issues easier.
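
As a hedged sketch of what tokenization with fugashi plus the pip-installed ipadic looks like (the sample sentence is arbitrary, and using GenericTagger with ipadic.MECAB_ARGS is an assumption of this example, not code quoted from the PR):

    # Tokenize Japanese text with fugashi, pointing MeCab at the
    # pip-installed IPAdic dictionary.
    import fugashi
    import ipadic

    tagger = fugashi.GenericTagger(ipadic.MECAB_ARGS)

    text = "吾輩は猫である。"
    print([word.surface for word in tagger(text)])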
      
fugashi has wheels for Windows, OSX, and Linux, which will help with the
issues people have had installing old versions of mecab-python3 on Windows.
Because fugashi doesn't use SWIG, unlike mecab-python3 it also doesn't
require a C++ runtime to be installed on Windows.
      
In adding this change I removed some code dealing with the `cursor`,
`token_start`, and `token_end` variables. These variables didn't seem to
be used for anything; it is unclear to me why they were there.

I ran the tests and they passed, though I couldn't figure out how to run
the slow tests (`--runslow` gave an error) and didn't try testing with
TensorFlow.
      
      * Style fix
      
      * Remove unused variable
      
      Forgot to delete this...
      
      * Adapt doc with install instructions
      
      * Fix typo

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      cf3cf304