1. 10 Nov, 2020 1 commit
  2. 29 Oct, 2020 1 commit
  3. 26 Oct, 2020 1 commit
    • Doc styling (#8067) · 08f534d2
      Sylvain Gugger authored
      * Important files
      
      * Styling them all
      
      * Revert "Styling them all"
      
      This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.
      
      * Styling them for realsies
      
      * Fix syntax error
      
      * Fix benchmark_utils
      
      * More fixes
      
      * Fix modeling auto and script
      
      * Remove new line
      
      * Fixes
      
      * More fixes
      
      * Fix more files
      
      * Style
      
      * Add FSMT
      
      * More fixes
      
      * More fixes
      
      * More fixes
      
      * More fixes
      
      * Fixes
      
      * More fixes
      
      * More fixes
      
      * Last fixes
      
      * Make sphinx happy
  4. 18 Oct, 2020 1 commit
    • [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a
      Thomas Wolf authored
      * splitting fast and slow tokenizers [WIP]
      
      * [WIP] splitting sentencepiece and tokenizers dependencies
      
      * update dummy objects
      
      * add name_or_path to models and tokenizers
      
      * prefix added to file names
      
      * prefix
      
      * styling + quality
      
      * splitting all the tokenizer files - sorting sentencepiece based ones
      
      * update tokenizer version up to 0.9.0
      
      * remove hard dependency on sentencepiece 🎉
      
      * and removed hard dependency on tokenizers 🎉
      
      
      
      * update conversion script
      
      * update missing models
      
      * fixing tests
      
      * move test_tokenization_fast to main tokenization tests - fix bugs
      
      * bump up tokenizers
      
      * fix bert_generation
      
      * update and fix several tokenizers
      
      * keep sentencepiece in deps for now
      
      * fix funnel and deberta tests
      
      * fix fsmt
      
      * fix marian tests
      
      * fix layoutlm
      
      * fix squeezebert and gpt2
      
      * fix T5 tokenization
      
      * fix xlnet tests
      
      * style
      
      * fix mbart
      
      * bump up tokenizers to 0.9.2
      
      * fix model tests
      
      * fix tf models
      
      * fix seq2seq examples
      
      * fix tests without sentencepiece
      
      * fix slow => fast  conversion without sentencepiece
      
      * update auto and bert generation tests
      
      * fix mbart tests
      
      * fix auto and common test without tokenizers
      
      * fix tests without tokenizers
      
      * clean up tests lighten up when tokenizers + sentencepiece are both off
      
      * style quality and tests fixing
      
      * add sentencepiece to doc/examples reqs
      
      * leave sentencepiece on for now
      
      * style quality split hebert and fix pegasus
      
      * WIP Herbert fast
      
      * add sample_text_no_unicode and fix hebert tokenization
      
      * skip FSMT example test for now
      
      * fix style
      
      * fix fsmt in example tests
      
      * update following Lysandre and Sylvain's comments
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/testing_utils.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  5. 08 Oct, 2020 1 commit
    • Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove... · 9aeacb58
      Thomas Wolf authored
      
      Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
      
      * [WIP] SP tokenizers
      
      * fixing tests for T5
      
      * WIP tokenizers
      
      * serialization
      
      * update T5
      
      * WIP T5 tokenization
      
      * slow to fast conversion script
      
      * Refactoring to move tokenizer implementations inside transformers
      
      * Adding gpt - refactoring - quality
      
      * WIP adding several tokenizers to the fast world
      
      * WIP Roberta - moving implementations
      
      * update to dev4 switch file loading to in-memory loading
      
      * Updating and fixing
      
      * advancing on the tokenizers - updating do_lower_case
      
      * style and quality
      
      * moving forward with tokenizers conversion and tests
      
      * MBart, T5
      
      * dumping the fast version of transformer XL
      
      * Adding to autotokenizers + style/quality
      
      * update init and space_between_special_tokens
      
      * style and quality
      
      * bump up tokenizers version
      
      * add protobuf
      
      * fix pickle Bert JP with Mecab
      
      * fix newly added tokenizers
      
      * style and quality
      
      * fix bert japanese
      
      * fix funnel
      
      * limit tokenizer warning to one occurrence
      
      * clean up file
      
      * fix new tokenizers
      
      * fast tokenizers deep tests
      
      * WIP adding all the special fast tests on the new fast tokenizers
      
      * quick fix
      
      * adding more fast tokenizers in the fast tests
      
      * all tokenizers in fast version tested
      
      * Adding BertGenerationFast
      
      * bump up setup.py for CI
      
      * remove BertGenerationFast (too early)
      
      * bump up tokenizers version
      
      * Clean old docstrings
      
      * Typo
      
      * Update following Lysandre comments
      Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
  6. 05 Oct, 2020 1 commit
  7. 23 Sep, 2020 1 commit
  8. 22 Sep, 2020 1 commit
  9. 16 Sep, 2020 1 commit
  10. 15 Sep, 2020 1 commit
  11. 04 Sep, 2020 1 commit
  12. 28 Aug, 2020 1 commit
    • prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654) · 9336086a
      Sam Shleifer authored
      * broken test
      
      * batch parity
      
      * tests pass
      
      * boom boom
      
      * boom boom
      
      * split out bart tokenizer tests
      
      * fix tests
      
      * boom boom
      
      * Fixed dataset bug
      
      * Fix marian
      
      * Undo extra
      
      * Get marian working
      
      * Fix t5 tok tests
      
      * Test passing
      
      * Cleanup
      
      * better assert msg
      
      * require torch
      
      * Fix mbart tests
      
      * undo extra decoder_attn_mask change
      
      * Fix import
      
      * pegasus tokenizer can ignore src_lang kwargs
      
      * unused kwarg test cov
      
      * boom boom
      
      * add todo for pegasus issue
      
      * cover one word translation edge case
      
      * Cleanup
      
      * doc
  13. 26 Aug, 2020 1 commit
  14. 14 Aug, 2020 1 commit
    • MBartForConditionalGeneration (#6441) · 680f1337
      Suraj Patil authored
      * add MBartForConditionalGeneration
      
      * style
      
      * rebase and fixes
      
      * add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS
      
      * fix docs
      
      * don't ignore mbart
      
      * doc
      
      * fix mbart fairseq link
      
      * put mbart before bart
      
      * apply doc suggestions
  15. 11 Aug, 2020 1 commit
  16. 28 Jul, 2020 2 commits
  17. 23 Jul, 2020 1 commit
  18. 18 Jul, 2020 1 commit
  19. 07 Jul, 2020 2 commits
  20. 28 Jun, 2020 1 commit
  21. 26 Jun, 2020 1 commit
  22. 25 Jun, 2020 2 commits
  23. 16 Jun, 2020 1 commit
    • Eli5 examples (#4968) · 49c52025
      Yacine Jernite authored
      
      
      * add eli5 examples
      
      * add dense query script
      
      * query_di
      
      * merging
      
      * merging
      
      * add_utils
      
      * adds nearest neighbor wikipedia
      
      * batch queries
      
      * training_retriever
      
      * new notebooks
      
      * moved retriever training script
      
      * finished wiki40b
      
      * max_len_fix
      
      * train_s2s
      
      * retriever_batch_checkpointing
      
      * cleanup
      
      * merge
      
      * dim_fix
      
      * fix_indexer
      
      * fix_wiki40b_snippets
      
      * fix_embed_for_r
      
      * fp32 index
      
      * fix_sparse_q
      
      * joint_training
      
      * remove obsolete datasets
      
      * add_passage_nn_results
      
      * add_passage_nn_results
      
      * add_batch_nn
      
      * add_batch_nn
      
      * add_data_scripts
      
      * notebook
      
      * notebook
      
      * notebook
      
      * fix_multi_gpu
      
      * add_app
      
      * full_caching
      
      * full_caching
      
      * notebook
      
      * sparse_done
      
      * images
      
      * notebook
      
      * add_image_gif
      
      * with_Gif
      
      * add_contr_image
      
      * notebook
      
      * notebook
      
      * notebook
      
      * train_functions
      
      * notebook
      
      * min_retrieval_length
      
      * pandas_option
      
      * notebook
      
      * min_retrieval_length
      
      * notebook
      
      * notebook
      
      * eval_Retriever
      
      * notebook
      
      * images
      
      * notebook
      
      * add_example
      
      * add_example
      
      * notebook
      
      * fireworks
      
      * notebook
      
      * notebook
      
      * joe's notebook comments
      
      * app_update
      
      * notebook
      
      * notebook_link
      
      * captions
      
      * notebook
      
      * assing RetriBert model
      
      * add RetriBert to Auto
      
      * change AutoLMHead to AutoSeq2Seq
      
      * notebook downloads from hf models
      
      * style_black
      
      * style_black
      
      * app_update
      
      * app_update
      
      * fix_app_update
      
      * style
      
      * style
      
      * isort
      
      * Delete WikiELI5training.ipynb
      
      * Delete evaluate_eli5.py
      
      * Delete WikiELI5explore.ipynb
      
      * Delete ExploreWikiELI5Support.html
      
      * Delete explainlikeimfive.py
      
      * Delete wiki_snippets.py
      
      * children before parent
      
      * children before parent
      
      * style_black
      
      * style_black_only
      
      * isort
      
      * isort_new
      
      * Update src/transformers/modeling_retribert.py
      Co-authored-by: Julien Chaumond <chaumond@gmail.com>
      
      * typo fixes
      
      * app_without_asset
      
      * cleanup
      
      * Delete ELI5animation.gif
      
      * Delete ELI5contrastive.svg
      
      * Delete ELI5wiki_index.svg
      
      * Delete choco_bis.svg
      
      * Delete fireworks.gif
      
      * Delete huggingface_logo.jpg
      
      * Delete huggingface_logo.svg
      
      * Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb
      
      * Delete eli5_app.py
      
      * Delete eli5_utils.py
      
      * readme
      
      * Update README.md
      
      * unused imports
      
      * moved_info
      
      * default_beam
      
      * ftuned model
      
      * disclaimer
      
      * Update src/transformers/modeling_retribert.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * black
      
      * add_doc
      
      * names
      
      * isort_Examples
      
      * isort_Examples
      
      * Add doc to index
      Co-authored-by: Julien Chaumond <chaumond@gmail.com>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
  24. 15 Jun, 2020 1 commit
  25. 14 Jun, 2020 1 commit
  26. 11 Jun, 2020 1 commit
  27. 02 Jun, 2020 1 commit
  28. 18 May, 2020 1 commit
  29. 10 Apr, 2020 1 commit
  30. 29 Mar, 2020 1 commit
  31. 02 Mar, 2020 1 commit
    • Bart-CNN (#3059) · b54ef78d
      Sam Shleifer authored
      `generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
  32. 20 Feb, 2020 1 commit
    • New BartModel (#2745) · 53ce3854
      Sam Shleifer authored
      * Results same as fairseq
      * Wrote a ton of tests
      * Struggled with api signatures
      * added some docs
      