"comfy/vscode:/vscode.git/clone" did not exist on "29ccf9f471e3b2ad4f4a08ba9f34698d357f8547"
  1. 15 Jun, 2020 1 commit
    • [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized... · 36434220
      Anthony MOI authored
      
      [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
      
      * Use tokenizers pre-tokenized pipeline
      
      * failing pretokenized test
      
      * Fix is_pretokenized in python
      
      * add pretokenized tests
      
      * style and quality
      
      * better tests for batched pretokenized inputs
      
      * tokenizers clean up - new padding_strategy - split the files
      
      * [HUGE] refactoring tokenizers - padding - truncation - tests
      
      * style and quality
      
      * bump up required tokenizers version to 0.8.0-rc1
      
      * switched padding/truncation API - simpler better backward compat
      
      * updating tests for custom tokenizers
      
      * style and quality - tests on pad
      
      * fix QA pipeline
      
      * fix backward compatibility for max_length only
      
      * style and quality
      
      * Various cleans up - add verbose
      
      * fix tests
      
      * update docstrings
      
      * Fix tests
      
      * Docs reformatted
      
      * __call__ method documented
      Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
      36434220
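The new padding strategy this refactor introduces (`padding=True`, i.e. pad to the longest sequence in the batch, with a matching attention mask) can be sketched in plain Python. This is a toy illustration, not the library's implementation: `pad_batch` and the pad token id `0` are invented here.

```python
# Toy sketch of the "pad to longest in batch" strategy behind padding=True.
def pad_batch(batch_ids, pad_id=0):
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)             # pad to longest
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[5, 6], [7, 8, 9]])
```

The real API also offers `padding="max_length"` (pad to a fixed length) and `truncation` strategies; the sketch covers only the batch-longest case.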
  2. 09 Jun, 2020 1 commit
    • [Benchmark] add tpu and torchscript for benchmark (#4850) · 2cfb947f
      Patrick von Platen authored
      
      
      * add tpu and torchscript for benchmark
      
      * fix name in tests
      
      * "fix email"
      
      * make style
      
      * better log message for tpu
      
      * add more print and info for tpu
      
      * allow possibility to print tpu metrics
      
      * correct cpu usage
      
      * fix test for non-install
      
      * remove bogus file
      
      * include psutil in testing
      
      * run a couple of times before tracing in torchscript
      
      * do not allow tpu memory tracing for now
      
      * make style
      
      * add torchscript to env
      
      * better name for torch tpu
      Co-authored-by: Patrick von Platen <patrick@huggingface.co>
      2cfb947f
  3. 02 Jun, 2020 2 commits
  4. 26 May, 2020 1 commit
    • Make transformers-cli cross-platform (#4131) · 8cc6807e
      Bram Vanroy authored
      * make transformers-cli cross-platform
      
      Using "scripts" is a useful option in setup.py particularly when you want to get access to non-python scripts. However, in this case we want to have an entry point into some of our own Python scripts. To do this in a concise, cross-platfom way, we can use entry_points.console_scripts. This change is necessary to provide the CLI on different platforms, which "scripts" does not ensure. Usage remains the same, but the "transformers-cli" script has to be moved (be part of the library) and renamed (underscore + extension)
      
      * make style & quality
      8cc6807e
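The `entry_points.console_scripts` mechanism described above takes specs of the form `name=package.module:function`, from which setuptools generates a platform-appropriate launcher at install time (a `.exe` shim on Windows). A minimal sketch, assuming the module path that results from moving the script into the library as the commit describes:

```python
# console_scripts spec as it would appear in setup.py's entry_points
# (the module path is an assumption based on the commit description).
entry_points = {
    "console_scripts": [
        "transformers-cli=transformers.commands.transformers_cli:main",
    ]
}

# The spec format is name=package.module:function
name, target = entry_points["console_scripts"][0].split("=")
module, func = target.split(":")
```

Parsing the spec this way mirrors what setuptools does when it writes the launcher.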
  5. 22 May, 2020 3 commits
  6. 14 May, 2020 3 commits
    • Conversion script to export transformers models to ONNX IR. (#4253) · db0076a9
      Funtowicz Morgan authored
      * Added generic ONNX conversion script for PyTorch model.
      
      * WIP initial TF support.
      
      * TensorFlow/Keras ONNX export working.
      
      * Print framework version info
      
      * Add possibility to check the model is correctly loading on ONNX runtime.
      
      * Remove quantization option.
      
      * Specify ONNX opset version when exporting.
      
      * Formatting.
      
      * Remove unused imports.
      
      * Make functions more generally reusable from other part of the code.
      
      * isort happy.
      
      * flake happy
      
      * Export only feature-extraction for now
      
      * Correctly check inputs order / filter before export.
      
      * Removed task variable
      
      * Fix invalid args call in load_graph_from_args.
      
      * Fix invalid args call in convert.
      
      * Fix invalid args call in infer_shapes.
      
      * Raise exception and catch in caller function instead of exit.
      
      * Add 04-onnx-export.ipynb notebook
      
      * More WIP on the notebook
      
      * Remove unused imports
      
      * Simplify & remove unused constants.
      
      * Export with constant_folding in PyTorch
      
      * Let's try to put function args in the right order this time ...
      
      * Disable external_data_format temporarily
      
      * ONNX notebook draft ready.
      
      * Updated notebooks charts + wording
      
      * Correct error while exporting last chart in notebook.
      
      * Addressing @LysandreJik comment.
      
      * Set ONNX opset to 11 as default value.
      
      * Set opset param mandatory
      
      * Added ONNX export unittests
      
      * Quality.
      
      * flake8 happy
      
      * Add keras2onnx dependency on extras["tf"]
      
      * Pin keras2onnx on github master to v1.6.5
      
      * Second attempt.
      
      * Third attempt.
      
      * Use the right repo URL this time ...
      
      * Do the same for onnxconverter-common
      
      * Added keras2onnx and onnxconverter-common at 1.7.0 to support TF 2.2
      
      * Correct commit hash.
      
      * Addressing PR review: Optimization are enabled by default.
      
      * Addressing PR review: small changes in the notebook
      
      * setup.py comment about keras2onnx versioning.
      db0076a9
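The "correctly check inputs order / filter before export" step above exists because ONNX export feeds inputs positionally, so the tokenizer's keyword outputs must be reordered to match the model's `forward()` signature and unknown keys dropped. A simplified stand-in for that logic using `inspect` (`order_inputs` and `toy_forward` are invented for illustration):

```python
import inspect

# Reorder a dict of tokenizer outputs to match forward()'s parameter
# order, dropping any key the signature does not declare.
def order_inputs(forward, tokens):
    sig = inspect.signature(forward)
    return [(name, tokens[name]) for name in sig.parameters if name in tokens]

# Toy model signature standing in for a real forward() method.
def toy_forward(input_ids, attention_mask=None, token_type_ids=None):
    pass

ordered = order_inputs(
    toy_forward, {"attention_mask": [1, 1], "input_ids": [5, 6], "extra": 0}
)
```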
    • Fix: unpin flake8 and fix cs errors (#4367) · 448c4672
      Julien Chaumond authored
      * Fix: unpin flake8 and fix cs errors
      
      * Ok we still need to quote those
      448c4672
    • [ci skip] Pin isort · 015f7812
      Julien Chaumond authored
      015f7812
  7. 13 May, 2020 1 commit
  8. 12 May, 2020 2 commits
  9. 11 May, 2020 1 commit
  10. 07 May, 2020 2 commits
  11. 05 May, 2020 1 commit
    • Pytorch 1.5.0 (#3973) · 79b1c696
      Lysandre Debut authored
      * Standard deviation can no longer be set to 0
      
      * Remove torch pinned version
      
      * 9th instead of 10th, silly me
      79b1c696
  12. 01 May, 2020 1 commit
  13. 27 Apr, 2020 1 commit
  14. 22 Apr, 2020 1 commit
  15. 21 Apr, 2020 1 commit
  16. 18 Apr, 2020 1 commit
    • Cleanup fast tokenizers integration (#3706) · 827d6d6e
      Thomas Wolf authored
      
      
      * First pass on utility classes and python tokenizers
      
      * finishing cleanup pass
      
      * style and quality
      
      * Fix tests
      
      * Updating following @mfuntowicz comment
      
      * style and quality
      
      * Fix Roberta
      
      * fix batch_size/seq_length in BatchEncoding
      
      * add alignment methods + tests
      
      * Fix OpenAI and Transfo-XL tokenizers
      
      * adding trim_offsets=True default for GPT2 and RoBERTa
      
      * style and quality
      
      * fix tests
      
      * add_prefix_space in roberta
      
      * bump up tokenizers to rc7
      
      * style
      
      * unfortunately tensorflow does not like these - removing shape/seq_len for now
      
      * Update src/transformers/tokenization_utils.py
      Co-Authored-By: Stefan Schweter <stefan@schweter.it>
      
      * Adding doc and docstrings
      
      * making flake8 happy
      Co-authored-by: Stefan Schweter <stefan@schweter.it>
      827d6d6e
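The "alignment methods" this cleanup adds revolve around offset mappings: each token carries its `(start, end)` character span, so token and character positions can be aligned in both directions. A toy whitespace tokenizer standing in for the Rust backend (`tokenize_with_offsets` and `char_to_token` are invented names for illustration):

```python
# Toy tokenizer that records each token's (start, end) character span.
def tokenize_with_offsets(text):
    tokens, offsets, start = [], [], 0
    for tok in text.split():
        start = text.index(tok, start)     # locate token in the raw string
        tokens.append(tok)
        offsets.append((start, start + len(tok)))
        start += len(tok)
    return tokens, offsets

# Alignment: map a character position back to the token containing it.
def char_to_token(offsets, char_index):
    for i, (s, e) in enumerate(offsets):
        if s <= char_index < e:
            return i
    return None

tokens, offsets = tokenize_with_offsets("hello world")
```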
  17. 10 Apr, 2020 1 commit
  18. 06 Apr, 2020 4 commits
    • Tokenizers v3.0.0 (#3185) · 96ab75b8
      Funtowicz Morgan authored
      
      
      * Renamed num_added_tokens to num_special_tokens_to_add
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Cherry-Pick: Partially fix space only input without special tokens added to the output #3091
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Make fast tokenizers unittests work on Windows.
      
      * Entirely refactored unittest for tokenizers fast.
      
      * Remove ABC class for CommonFastTokenizerTest
      
      * Added embeded_special_tokens tests from allenai @dirkgr
      
      * Make embeded_special_tokens tests from allenai more generic
      
      * Uniformize vocab_size as a property for both Fast and normal tokenizers
      
      * Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin)
      
      * Ensure providing None input raises the same ValueError as the Python tokenizer + tests.
      
      * Fix invalid input for assert_padding when testing batch_encode_plus
      
      * Move add_special_tokens from constructor to tokenize/encode/[batch_]encode_plus methods parameter.
      
      * Ensure tokenize() correctly forward add_special_tokens to rust.
      
      * Adding None checking on top of encode / encode_batch for TransfoXLTokenizerFast.
      Avoid stripping on None values.
      
      * unittests ensure tokenize() also throws a ValueError if provided None
      
      * Added add_special_tokens unittest for all supported models.
      
      * Style
      
      * Make sure TransfoXL test run only if PyTorch is provided.
      
      * Split up tokenizers tests for each model type.
      
      * Fix invalid unittest with new tokenizers API.
      
      * Filter out Roberta openai detector models from unittests.
      
      * Introduce BatchEncoding on fast tokenizers path.
      
      This new structure exposes all the mappings retrieved from Rust.
      It also keeps the current behavior with model forward.
      
      * Introduce BatchEncoding on slow tokenizers path.
      
      Backward compatibility.
      
      * Improve error message on BatchEncoding for slow path
      
      * Make add_prefix_space True by default on Roberta fast to match Python in majority of cases.
      
      * Style and format.
      
      * Added typing on all methods for PretrainedTokenizerFast
      
      * Style and format
      
      * Added path for feeding pretokenized (List[str]) input to PretrainedTokenizerFast.
      
      * Style and format
      
      * encode_plus now supports pretokenized inputs.
      
      * Remove user warning about add_special_tokens when working on pretokenized inputs.
      
      * Always go through the post processor.
      
      * Added support for pretokenized input pairs on encode_plus
      
      * Added is_pretokenized flag on encode_plus for clarity and improved error message on input TypeError.
      
      * Added pretokenized inputs support on batch_encode_plus
      
      * Update BatchEncoding methods name to match Encoding.
      
      * Bump setup.py tokenizers dependency to 0.7.0rc1
      
      * Remove unused parameters in BertTokenizerFast
      
      * Make sure Roberta returns token_type_ids for unittests.
      
      * Added missing typings
      
      * Update add_tokens prototype to match tokenizers side and allow AddedToken
      
      * Bumping tokenizers to 0.7.0rc2
      
      * Added documentation for BatchEncoding
      
      * Added (unused) is_pretokenized parameter on PreTrainedTokenizer encode_plus/batch_encode_plus methods.
      
      * Added higher-level typing for tokenize / encode_plus / batch_encode_plus.
      
      * Fix unittests failing because add_special_tokens was defined as a constructor parameter on Rust Tokenizers.
      
      * Fix text-classification pipeline using the wrong tokenizer
      
      * Make pipelines works with BatchEncoding
      
      * Turn off add_special_tokens on tokenize by default.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Remove add_prefix_space from tokenize call in unittest.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Style and quality
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Correct message for batch_encode_plus none input exception.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Fix invalid list comprehension for offset_mapping overriding content every iteration.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * TransfoXL uses Strip normalizer.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Bump tokenizers dependency to 0.7.0rc3
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Support AddedTokens for special_tokens and use left stripping on mask for Roberta.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * SpecialTokensMixin can use slots for faster access to underlying attributes.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Remove update_special_tokens from fast tokenizers.
      
      * Ensure TransfoXL unittests are run only when torch is available.
      
      * Style.
      Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
      
      * Style
      
      * Style 🙏🙏
      
      
      
      * Remove slots on SpecialTokensMixin, need deep dive into pickle protocol.
      
      * Remove Roberta warning on __init__.
      
      * Move documentation to Google style.
      Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
      96ab75b8
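The BatchEncoding structure this PR introduces behaves like the dict of model inputs while also exposing the per-sequence mappings retrieved from the Rust backend. A minimal sketch of that idea (simplified; the class body and the `encodings` payload shape here are invented, only the dict-plus-mappings design comes from the commit):

```python
# Dict-like wrapper over model inputs that also keeps backend encodings.
class ToyBatchEncoding(dict):
    def __init__(self, data, encodings=None):
        super().__init__(data)          # behaves as the plain inputs dict
        self._encodings = encodings or []

    def tokens(self, batch_index=0):
        # token strings for one sequence, as the backend produced them
        return self._encodings[batch_index]["tokens"]

enc = ToyBatchEncoding(
    {"input_ids": [[101, 7592, 102]]},
    encodings=[{"tokens": ["[CLS]", "hello", "[SEP]"]}],
)
```

Because it subclasses `dict`, existing code that feeds the result straight into a model forward keeps working, which is the backward-compatibility point made in the commit messages.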
    • Re-pin isort · ea6dba27
      LysandreJik authored
      ea6dba27
    • unpin isort for pypi · 11c3257a
      LysandreJik authored
      11c3257a
    • Release: v2.8.0 · 36bffc81
      LysandreJik authored
      36bffc81
  19. 30 Mar, 2020 3 commits
  20. 26 Mar, 2020 1 commit
  21. 25 Mar, 2020 1 commit
    • Experiment w/ dataclasses (including Py36) (#3423) · 83272a38
      Julien Chaumond authored
      * [ci] Also run test_examples in py37
      
      (will revert at the end of the experiment)
      
      * InputExample: use immutable dataclass
      
      * [deps] Install dataclasses for Py<3.7
      
      * [skip ci] Revert "[ci] Also run test_examples in py37"
      
      This reverts commit d29afd9959786b77759b0b8fa4e6b4335b952015.
      83272a38
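The "InputExample: use immutable dataclass" step can be sketched with `dataclasses(frozen=True)`, which rejects attribute assignment after construction. The field names follow the real `InputExample` (`guid`, `text_a`, `text_b`, `label`), but the defaults here are simplified assumptions:

```python
from dataclasses import dataclass, FrozenInstanceError

# Sketch of an immutable InputExample as a frozen dataclass.
@dataclass(frozen=True)
class InputExample:
    guid: str
    text_a: str
    text_b: str = None
    label: str = None

ex = InputExample(guid="1", text_a="hello world")

try:
    ex.label = "positive"      # frozen=True rejects mutation
    mutated = True
except FrozenInstanceError:
    mutated = False
```

On Python < 3.7 the `dataclasses` backport package provides the same API, which is why the commit adds it as a conditional dependency.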
  22. 24 Mar, 2020 2 commits
  23. 23 Mar, 2020 3 commits
  24. 20 Mar, 2020 1 commit
    • Handle pinned version of isort · 115abd21
      Bram Vanroy authored
      The CONTRIBUTING file pins a specific version of isort, so we might as well install that in `dev`. This makes it easier for contributors: they don't have to manually install the specific commit.
      115abd21
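Surfacing the pinned isort through a `dev` extra means `pip install -e .[dev]` installs it automatically. A sketch of the `extras_require` shape in setup.py (the companion packages listed are illustrative, and the actual pin is a git commit reference, elided here):

```python
# "dev" extra bundling the contributor tooling; installing the extra
# pulls in the pinned isort without a manual step.
extras_require = {
    "dev": [
        "isort",   # pinned in the real setup.py to the commit CONTRIBUTING.md names
        "black",
        "flake8",
    ]
}
```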
  25. 17 Mar, 2020 1 commit