- 29 Jun, 2023 1 commit
-
-
amyeroberts authored
* Update old existing feature extractor references * Typo * Apply suggestions from code review * Apply suggestions from code review * Apply suggestions from code review * Address comments from review - update 'feature extractor' Co-authored by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
-
- 22 Feb, 2023 1 commit
-
-
Aaron Gokaslan authored
-
- 06 Feb, 2023 1 commit
-
-
Sylvain Gugger authored
* Result of black 23.1 * Update target to Python 3.7 * Switch flake8 to ruff * Configure isort * Configure isort * Apply isort with line limit * Put the right black version * adapt black in check copies * Fix copies
-
- 14 Nov, 2022 1 commit
-
-
Bartosz Szmelczynski authored
* First draft * Remove scatter dependency * Add require_torch * update vectorized sum test, add clone call * remove artifacts * fix style * fix style v2 * remove "scatter" mentions from the code base * fix isort error Co-authored-by:
Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com>
-
- 10 Nov, 2022 1 commit
-
-
Sanchit Gandhi authored
* [processor] Add 'model input names' property * add test * no f string * add generic property method to mixin * copy to multimodal * copy to vision * tests for all audio * remove ad-hoc tests * style * fix flava test * fix test * fix processor code
-
- 02 Nov, 2022 1 commit
-
-
Ben Eyal authored
馃毃 馃毃 馃毃 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775) * Add test for SentencePiece not adding special tokens to strings * Add SentencePieceStringConversionMixin to fix issue 15003 * Fix conversion from tokens to string for most SentencePiece tokenizers Tokenizers fixed: - AlbertTokenizer - BarthezTokenizer - CamembertTokenizer - FNetTokenizer - M2M100Tokenizer - MBart50Tokenizer - PegasusTokenizer - Speech2TextTokenizer * Fix MarianTokenizer, adjust SentencePiece test to accomodate vocab * Fix DebertaV2Tokenizer * Ignore LayoutXLMTokenizer in SentencePiece string conversion test * Run 'make style' and 'make quality' * Clean convert_tokens_to_string test Instead of explicitly ignoring LayoutXLMTokenizer in the test, override the test in LayoutLMTokenizationTest and do nothing in it. * Remove commented out code * Improve robustness of convert_tokens_to_string test Instead of comparing lengths of re-tokenized text and input_ids, check that converting all special tokens to string yields a string with all special tokens. * Inline and remove SentencePieceStringConversionMixin The convert_tokens_to_string method is now implemented in each relevant SentencePiece tokenizer. * Run 'make style' and 'make quality' * Revert removal of space in convert_tokens_to_string * Remove redundant import * Revert test text to original * Uncomment the lowercasing of the reverse_text variable * Mimic Rust tokenizer behavior for tokenizers - Albert - Barthez - Camembert - MBart50 - T5 * Fix accidentally skipping test in wrong tokenizer * Add test for equivalent Rust and slow tokenizer behavior * Override _decode in BigBirdTokenizer to mimic Rust behavior * Override _decode in FNetTokenizer to mimic Rust behavior * Override _decode in XLNetTokenizer to mimic Rust behavior * Remove unused 're' import * Update DebertaV2Tokenizer to mimic Rust tokenizer * Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested. * Ignore problematic tests in Deberta V2 * Add comment on why the Deberta V2 tests are skipped
-
- 30 Aug, 2022 1 commit
-
-
anthony2261 authored
-
- 24 Aug, 2022 1 commit
-
-
SaulLu authored
add warning to let the user know that the `__call__` method is faster than `encode` + `pad` for a fast tokenizer (#18693) * add warning to let the user know that the method is slower that for a fast tokenizer * user warnings * fix layoutlmv2 * fix layout* * change warnings into logger.warning
-
- 01 Jun, 2022 1 commit
-
-
Yih-Dar authored
* fix Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 24 May, 2022 1 commit
-
-
NielsRogge authored
* Make forward pass work * More improvements * Remove unused imports * Remove timm dependency * Improve loss calculation of token classifier * Fix most tests * Add docs * Add model integration test * Make all tests pass * Add LayoutLMv3FeatureExtractor * Improve integration test + make fixup * Add example script * Fix style * Add LayoutLMv3Processor * Fix style * Add option to add visual labels * Make more tokenizer tests pass * Fix more tests * Make more tests pass * Fix bug and improve docs * Fix import of processors * Improve docstrings * Fix toctree and improve docs * Fix auto tokenizer * Move tests to model folder * Move tests to model folder * change default behavior add_prefix_space * add prefix space for fast * add_prefix_spcae set to True for Fast * no space before `unique_no_split` token * add test to hightligh special treatment of added tokens * fix `test_batch_encode_dynamic_overflowing` by building a long enough example * fix `test_full_tokenizer` with add_prefix_token * Fix tokenizer integration test * Make the code more readable * Add tests for LayoutLMv3Processor * Fix style * Add model to README and update init * Apply suggestions from code review * Replace asserts by value errors * Add suggestion by @ducviet00 * Add model to doc tests * Simplify script * Improve README * a step ahead to fix * Update pair_input_test * Make all tokenizer tests pass - phew * Make style * Add LayoutLMv3 to CI job * Fix auto mapping * Fix CI job name * Make all processor tests pass * Make tests of LayoutLMv2 and LayoutXLM consistent * Add copied from statements to fast tokenizer * Add copied from statements to slow tokenizer * Remove add_visual_labels attribute * Fix tests * Add link to notebooks * Improve docs of LayoutLMv3Processor * Fix reference to section Co-authored-by:
SaulLu <lucilesaul.com@gmail.com> Co-authored-by:
Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 12 May, 2022 1 commit
-
-
Sylvain Gugger authored
* Black preview * Fixup too! * Fix check copies * Use the same version as the CI * Bump black
-
- 03 May, 2022 1 commit
-
-
Yih-Dar authored
* move test model folders (TODO: fix imports and others) * fix (potentially partially) imports (in model test modules) * fix (potentially partially) imports (in tokenization test modules) * fix (potentially partially) imports (in feature extraction test modules) * fix import utils.test_modeling_tf_core * fix path ../fixtures/ * fix imports about generation.test_generation_flax_utils * fix more imports * fix fixture path * fix get_test_dir * update module_to_test_file * fix get_tests_dir from wrong transformers.utils * update config.yml (CircleCI) * fix style * remove missing imports * update new model script * update check_repo * update SPECIAL_MODULE_TO_TEST_MAP * fix style * add __init__ * update self-scheduled * fix add_new_model scripts * check one way to get location back * python setup.py build install * fix import in test auto * update self-scheduled.yml * update slack notification script * Add comments about artifact names * fix for yolos Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-