"sgl-router/src/routers/vscode:/vscode.git/clone" did not exist on "7a06ef984d262cd9bd38d4ef83382ab5c6e73aa8"
  1. 17 Apr, 2020 2 commits
  2. 16 Apr, 2020 2 commits
  3. 14 Apr, 2020 1 commit
  4. 13 Apr, 2020 1 commit
  5. 10 Apr, 2020 2 commits
  6. 09 Apr, 2020 2 commits
  7. 08 Apr, 2020 1 commit
  8. 07 Apr, 2020 2 commits
  9. 06 Apr, 2020 2 commits
    • Tokenizers v3.0.0 (#3185) · 96ab75b8
      Funtowicz Morgan authored
      * Renamed num_added_tokens to num_special_tokens_to_add
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Cherry-Pick: Partially fix space only input without special tokens added to the output #3091
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Make fast tokenizers unittests work on Windows.
      
      * Entirely refactored unittest for tokenizers fast.
      
      * Remove ABC class for CommonFastTokenizerTest
      
      * Added embeded_special_tokens tests from allenai @dirkgr
      
      * Make embeded_special_tokens tests from allenai more generic
      
      * Uniformize vocab_size as a property for both Fast and normal tokenizers
      
      * Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin)
      
      * Ensure providing None input raises the same ValueError as the Python tokenizer + tests.
      
      * Fix invalid input for assert_padding when testing batch_encode_plus
      
      * Move add_special_tokens from constructor to tokenize/encode/[batch_]encode_plus methods parameter.
      
      * Ensure tokenize() correctly forwards add_special_tokens to Rust.
      
      * Adding None checking on top of encode / encode_batch for TransfoXLTokenizerFast.
      Avoid stripping None values.
      
      * unittests ensure tokenize() also throws a ValueError if provided None
      
      * Added add_special_tokens unittest for all supported models.
      
      * Style
      
      * Make sure TransfoXL tests run only if PyTorch is available.
      
      * Split up tokenizers tests for each model type.
      
      * Fix invalid unittest with new tokenizers API.
      
      * Filter out Roberta openai detector models from unittests.
      
      * Introduce BatchEncoding on fast tokenizers path.
      
      This new structure exposes all the mappings retrieved from Rust.
      It also keeps the current behavior with model forward.
      
      * Introduce BatchEncoding on slow tokenizers path.
      
      Backward compatibility.
      
      * Improve error message on BatchEncoding for slow path
      
      * Make add_prefix_space True by default on Roberta fast to match Python in the majority of cases.
      
      * Style and format.
      
      * Added typing on all methods for PretrainedTokenizerFast
      
      * Style and format
      
      * Added path for feeding pretokenized (List[str]) input to PretrainedTokenizerFast.
      
      * Style and format
      
      * encode_plus now supports pretokenized inputs.
      
      * Remove user warning about add_special_tokens when working on pretokenized inputs.
      
      * Always go through the post processor.
      
      * Added support for pretokenized input pairs on encode_plus
      
      * Added is_pretokenized flag on encode_plus for clarity and improved error message on input TypeError.
      
      * Added pretokenized inputs support on batch_encode_plus
      
      * Update BatchEncoding methods name to match Encoding.
      
      * Bump setup.py tokenizers dependency to 0.7.0rc1
      
      * Remove unused parameters in BertTokenizerFast
      
      * Make sure Roberta returns token_type_ids for unittests.
      
      * Added missing typings
      
      * Update add_tokens prototype to match tokenizers side and allow AddedToken
      
      * Bumping tokenizers to 0.7.0rc2
      
      * Added documentation for BatchEncoding
      
      * Added (unused) is_pretokenized parameter on PreTrainedTokenizer encode_plus/batch_encode_plus methods.
      
      * Added higher-level typing for tokenize / encode_plus / batch_encode_plus.
      
      * Fix unittests failing because add_special_tokens was defined as a constructor parameter on Rust Tokenizers.
      
      * Fix text-classification pipeline using the wrong tokenizer
      
      * Make pipelines work with BatchEncoding
      
      * Turn off add_special_tokens on tokenize by default.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Remove add_prefix_space from tokenize call in unittest.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Style and quality
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Correct message for batch_encode_plus None input exception.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Fix invalid list comprehension for offset_mapping overriding content every iteration.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * TransfoXL uses Strip normalizer.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Bump tokenizers dependency to 0.7.0rc3
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Support AddedTokens for special_tokens and use left stripping on mask for Roberta.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * SpecialTokensMixin can use slots for faster access to underlying attributes.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Remove update_special_tokens from fast tokenizers.
      
      * Ensure TransfoXL unittests are run only when torch is available.
      
      * Style.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Style
      
      * Style 🙏🙏
      
      * Remove slots on SpecialTokensMixin; needs a deeper dive into the pickle protocol.
      
      * Remove Roberta warning on __init__.
      
      * Move documentation to Google style.
      Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
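The input checks this commit describes (None rejected with a ValueError, a clearer TypeError on invalid input, and an is_pretokenized flag that selects between a raw string and a List[str] of words) can be sketched in plain Python. This is a hypothetical illustration, not the transformers implementation: the function name validate_input and the whitespace split that stands in for real tokenization are assumptions.

```python
from typing import List, Optional, Union


def validate_input(
    text: Optional[Union[str, List[str]]], is_pretokenized: bool = False
) -> List[str]:
    """Hypothetical input check modelled on the behaviour listed in the commit."""
    # None is rejected up front, for both raw and pretokenized input.
    if text is None:
        raise ValueError("Input cannot be None")
    if is_pretokenized:
        # Pretokenized input must already be a list of word strings.
        if not (isinstance(text, list) and all(isinstance(w, str) for w in text)):
            raise TypeError("is_pretokenized=True expects a List[str] of words")
        return list(text)
    if not isinstance(text, str):
        raise TypeError("is_pretokenized=False expects a str")
    # Assumption: a naive whitespace split stands in for the real pre-tokenizer.
    return text.split()


print(validate_input("hello world"))                       # ['hello', 'world']
print(validate_input(["hello", "world"], is_pretokenized=True))  # ['hello', 'world']
```

Keeping the type check tied to the flag, rather than guessing from the value, mirrors the "for clarity" rationale given for adding is_pretokenized.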
    • [Generate, Test] Split generate test function into beam search, no beam search (#3601) · 2ee41056
      Patrick von Platen authored
      * split beam search and no beam search test
      
      * fix test
      
      * clean generate tests
  10. 03 Apr, 2020 2 commits
    • ELECTRA (#3257) · d5d7d886
      Lysandre Debut authored
      * Electra wip
      
      * helpers
      
      * Electra wip
      
      * Electra v1
      
      * ELECTRA may be saved/loaded
      
      * Generator & Discriminator
      
      * Embedding size instead of halving the hidden size
      
      * ELECTRA Tokenizer
      
      * Revert BERT helpers
      
      * ELECTRA Conversion script
      
      * Archive maps
      
      * PyTorch tests
      
      * Start fixing tests
      
      * Tests pass
      
      * Same configuration for both models
      
      * Compatible with base + large
      
      * Simplification + weight tying
      
      * Archives
      
      * Auto + Renaming to standard names
      
      * ELECTRA is uncased
      
      * Tests
      
      * Slight API changes
      
      * Update tests
      
      * wip
      
      * ElectraForTokenClassification
      
      * temp
      
      * Simpler arch + tests
      
      Removed ElectraForPreTraining which will be in a script
      
      * Conversion script
      
      * Auto model
      
      * Update links to S3
      
      * Split ElectraForPreTraining and ElectraForTokenClassification
      
      * Actually test PreTraining model
      
      * Remove num_labels from configuration
      
      * wip
      
      * wip
      
      * From discriminator and generator to electra
      
      * Slight API changes
      
      * Better naming
      
      * TensorFlow ELECTRA tests
      
      * Accurate conversion script
      
      * Added to conversion script
      
      * Fast ELECTRA tokenizer
      
      * Style
      
      * Add ELECTRA to README
      
      * Modeling Pytorch Doc + Real style
      
      * TF Docs
      
      * Docs
      
      * Correct links
      
      * Correct model initialized
      
      * random fixes
      
      * style
      
      * Addressing Patrick's and Sam's comments
      
      * Correct links in docs
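The generator and discriminator added here follow ELECTRA's replaced token detection objective: for every position, the discriminator predicts whether the generator replaced the original token. A minimal sketch of how the per-token targets could be derived, assuming token IDs given as plain integer lists; the function name is hypothetical and this is not the code from the commit. As in the ELECTRA paper, a position where the generator happens to sample the original token counts as original.

```python
from typing import List


def replaced_token_labels(original: List[int], corrupted: List[int]) -> List[int]:
    """Hypothetical helper: discriminator targets for replaced token detection.

    1 marks a position whose token differs from the original (replaced),
    0 marks an original token, including cases where the generator
    sampled the same token it was asked to fill in.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


# Made-up token IDs: only position 1 was changed by the generator.
print(replaced_token_labels([5, 8, 13, 21], [5, 9, 13, 21]))  # → [0, 1, 0, 0]
```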
    • BertJapaneseTokenizer accept options for mecab (#3566) · 8594dd80
      Yohei Tamura authored
      * BertJapaneseTokenizer accept options for mecab
      
      * black
      
      * fix mecab_option to Optional[str]
  11. 01 Apr, 2020 2 commits
  12. 31 Mar, 2020 1 commit
  13. 30 Mar, 2020 2 commits
  14. 29 Mar, 2020 1 commit
  15. 27 Mar, 2020 1 commit
  16. 26 Mar, 2020 4 commits
    • [Bart/Memory] don't create lm_head (#3323) · 39371ee4
      Sam Shleifer authored
      * delete lm_head, skips weight tying
      * Fixed s3
    • Add missing token classification for XLM (#3277) · 1a6c546c
      sakares saengkaew authored
      * Add the missing token classification for XLM
      
      * fix styling
      
      * Add XLMForTokenClassification to AutoModelForTokenClassification class
      
      * Fix docstring typo for non-existing class
      
      * Add the missing token classification for XLM
      
      * fix styling
      
      * fix styling
      
      * Add XLMForTokenClassification to AutoModelForTokenClassification class
      
      * Fix docstring typo for non-existing class
      
      * Add missing description for AlbertForTokenClassification
      
      * fix styling
      
      * Add missing docstring for ALBERT
      
      * Slow tests should be slow
      Co-authored-by: Sakares Saengkaew <s.sakares@gmail.com>
      Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
    • Adds translation pipeline (#3419) · 022e8fab
      Patrick von Platen authored
      * fix merge conflicts
      
      * add t5 summarization example
      
      * change parameters for t5 summarization
      
      * make style
      
      * add first code snippet for translation
      
      * only add prefixes
      
      * add prefix patterns
      
      * make style
      
      * renaming
      
      * fix conflicts
      
      * remove unused patterns
      
      * solve conflicts
      
      * fix merge conflicts
      
      * remove translation example
      
      * remove summarization example
      
      * make sure tensors are in numpy for float comparison
      
      * re-add t5 config
      
      * fix t5 import config typo
      
      * make style
      
      * remove unused numpy statements
      
      * update docstring
      
      * import translation pipeline
    • Add t5 to pipeline(task='summarization') (#3413) · 9c683ef0
      Patrick von Platen authored
      * solve conflicts
      
      * move warnings below
      
      * incorporate changes
      
      * add pad_to_max_length to pipelines
      
      * add bug fix for T5 beam search
      
      * add prefix patterns
      
      * make style
      
      * fix conflicts
      
      * adapt pipelines for task specific parameters
      
      * improve docstring
      
      * remove unused patterns
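T5 is a text-to-text model that picks its task from a text prefix on the input. The translation and summarization pipeline commits handle this through "prefix patterns" and task-specific parameters. A minimal sketch of prefixing inputs by task; the TASK_PREFIXES mapping and add_task_prefix are hypothetical and not the pipeline code, while the two prefix strings follow the convention used in the T5 paper.

```python
# Hypothetical task-to-prefix table; the prefix strings follow the T5 paper.
TASK_PREFIXES = {
    "summarization": "summarize: ",
    "translation_en_to_de": "translate English to German: ",
}


def add_task_prefix(text: str, task: str) -> str:
    """Prepend the T5-style task prefix for `task` to `text`."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return TASK_PREFIXES[task] + text


print(add_task_prefix("The house is wonderful.", "translation_en_to_de"))
# → translate English to German: The house is wonderful.
```

Keeping the prefixes in a table keyed by task, rather than hard-coding them per pipeline, lets one T5 checkpoint serve several tasks.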
  17. 24 Mar, 2020 1 commit
  18. 20 Mar, 2020 1 commit
  19. 19 Mar, 2020 2 commits
    • Support T5 Generation (#3228) · bbf26c4e
      Patrick von Platen authored
      * fix conflicts
      
      * update bart max length test
      
      * correct spelling mistakes
      
      * implemented model specific encode function
      
      * fix merge conflicts
      
      * better naming
      
      * save intermediate state -> need to rethink structure a bit
      
      * leave tf problem as it is for now
      
      * current version
      
      * add layers.pop
      
      * remove ipdb
      
      * make style
      
      * clean return cut decoding
      
      * remove ipdbs
      
      * Fix restoring layers in the decoders that don't exist.
      
      * push good intermediate solution for now
      
      * fix conflicts
      
      * always good to refuse to merge conflicts when rebasing
      
      * fix small bug
      
      * improve function calls
      
      * remove unused file
      
      * add correct scope behavior for t5_generate
      Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
    • Sam Shleifer
  20. 18 Mar, 2020 2 commits
  21. 17 Mar, 2020 3 commits
  22. 16 Mar, 2020 1 commit
  23. 13 Mar, 2020 1 commit
  24. 12 Mar, 2020 1 commit