- 13 Mar, 2024 1 commit
-
-
Lysandre Debut authored
* Adds pretrained IDs directly in the tests * Fix tests * Fix tests * Review!
-
- 16 Nov, 2023 1 commit
-
-
Arthur authored
* try to stylify using ruff * might need to remove these changes? * use ruf format andruff check * use isinstance instead of type comparision * use # fmt: skip * use # fmt: skip * nits * soem styling changes * update ci job * nits isinstance * more files update * nits * more nits * small nits * check and format * revert wrong changes * actually use formatter instead of checker * nits * well docbuilder is overwriting this commit * revert notebook changes * try to nuke docbuilder * style * fix feature exrtaction test * remve `indent-width = 4` * fixup * more nits * update the ruff version that we use * style * nuke docbuilder styling * leve the print for detected changes * nits * Remove file I/O Co-authored-by:
charliermarsh <charlie.r.marsh@gmail.com> * style * nits * revert notebook changes * Add # fmt skip when possible * Add # fmt skip when possible * Fix * More ` # fmt: skip` usage * More ` # fmt: skip` usage * More ` # fmt: skip` usage * NIts * more fixes * fix tapas * Another way to skip * Recommended way * Fix two more fiels * Remove asynch Remove asynch --------- Co-authored-by:
charliermarsh <charlie.r.marsh@gmail.com>
-
- 14 Sep, 2023 1 commit
-
-
Matt authored
* First commit while I figure this out * make fixup * Remove unused method * Store prompt attrib * Fix prompt argument for tests * Make same changes in fast tokenizer * Remove global prompts from fast tokenizer too * stash commit * stash commit * Migrate PromptConfig to its True Final Location * Replace Conversation entirely with the new class * Import/dependency fixes * Import/dependency fixes * Change format for lots of default prompts * More default prompt fixups * Revert llama old methods so we can compare * Fix some default configs * Fix some default configs * Fix misspelled kwarg * Fixes for Blenderbot * make fixup * little rebase cleanup * Add basic documentation * Quick doc fix * Truncate docstring for now * Add handling for the case when messages is a single string * Quick llama merges * Update conversational pipeline and tests * Add a couple of legacy properties for backward compatibility * More legacy handling * Add docstring for build_conversation_input_ids * Restructure PromptConfig * Let's start T E M P L A T I N G * Refactor all default configs to use templates instead * Revert changes to the special token properties since we don't need them anymore * More class templates * Make the sandbox even sandier * Everything replaced with pure templating * Remove docs for PromptConfig * Add testing and optional requirement boilerplate * Fix imports and make fixup * Fix LLaMA tests and add Conversation docstring * Finally get LLaMA working with the template system * Finally get LLaMA working with the template system * make fixup * make fixup * fmt-off for the long lists of test tokens * Rename method to apply_chat_template for now * Start on documentation * Make chat_template a property that reads through to the default if it's not set * Expand docs * Expand chat templating doc some more * trim/lstrip blocks by default and update doc * Few doc tweaks * rebase cleanup * Clarify docstring * rebase cleanup * rebase cleanup * make fixup * Quick doc edit * Reformat the standard template to match ChatML * Re-add PEFT check * Update docs/source/en/chat_templating.md Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com> * Add apply_chat_template to the tokenizer doc * make fixup * Add doc links * Fix chat links * Fix chat links * Explain system messages in the doc * Add chat template test * Proper save-loading for chat template attribute * Add test skips for layout models * Remove _build_conversation_input_ids, add default_chat_template to code_llama * Make sure all LLaMA models are using the latest template * Remove default_system_prompt block in code_llama because it has no default prompt * Update ConversationPipeline preprocess * Add correct #Copied from links to the default_chat_templates * Remove unneeded type checking line * Add a dummy mark_processsed method * Reorganize Conversation to have **deprecated_kwargs * Update chat_templating.md * Quick fix to LLAMA tests * Small doc tweaks * Add proper docstrings and "copied from" statements to all default chat templates * Merge use_default_system_prompt support for code_llama too * Improve clarity around self.chat_template * Docstring fix * Fix blenderbot default template * More doctest fix * Break out some tokenizer kwargs * Update doc to explain default templates * Quick tweaks to tokenizer args * Cleanups for tokenizer args * Add note about cacheing * Quick tweak to the chat-templating doc * Update the LLaMA template with error checking and correct system message embedding * make fixup * make fixup * add requires_jinja * Cleanup to expected output formatting * Add cacheing * Fix typo in llama default template * Update LLaMA tests * Update documentation * Improved legacy handling in the Conversation class * Update Jinja template with proper error handling * Quick bugfix * Proper exception raising * Change cacheing behaviour so it doesn't try to pickle an entire Jinja env * make fixup * rebase cleanup --------- Co-authored-by:
Patrick von Platen <patrick.v.platen@gmail.com>
-
- 18 Aug, 2023 1 commit
-
-
Arthur authored
* draft changes * update and add tests * styling for no * move test * path to usable model * update test * small update * update bertbased tokenizers * don'tuse kwargs for _tokenize * don'tuse kwargs for _tokenize * fix copies * update * update test for special tokenizers * fixup * skip two tests * remove pdb breakpiont() * wowo * rewrite custom tests * nits * revert chang in target keys * fix markup lm * update documentation of the argument
-
- 06 Feb, 2023 1 commit
-
-
Sylvain Gugger authored
* Result of black 23.1 * Update target to Python 3.7 * Switch flake8 to ruff * Configure isort * Configure isort * Apply isort with line limit * Put the right black version * adapt black in check copies * Fix copies
-
- 12 Jan, 2023 1 commit
-
-
Yih-Dar authored
* Fix for Past CI * make style * clean up * unindent 2 blocks Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 24 Aug, 2022 1 commit
-
-
SaulLu authored
add warning to let the user know that the `__call__` method is faster than `encode` + `pad` for a fast tokenizer (#18693) * add warning to let the user know that the method is slower that for a fast tokenizer * user warnings * fix layoutlmv2 * fix layout* * change warnings into logger.warning
-
- 11 Jul, 2022 1 commit
-
-
Yulv-git authored
* Fix some typos. Signed-off-by:
Yulv-git <yulvchi@qq.com> * Fix typo. Signed-off-by:
Yulv-git <yulvchi@qq.com> * make fixup.
-
- 12 May, 2022 1 commit
-
-
Sylvain Gugger authored
* Black preview * Fixup too! * Fix check copies * Use the same version as the CI * Bump black
-
- 03 May, 2022 1 commit
-
-
Yih-Dar authored
* move test model folders (TODO: fix imports and others) * fix (potentially partially) imports (in model test modules) * fix (potentially partially) imports (in tokenization test modules) * fix (potentially partially) imports (in feature extraction test modules) * fix import utils.test_modeling_tf_core * fix path ../fixtures/ * fix imports about generation.test_generation_flax_utils * fix more imports * fix fixture path * fix get_test_dir * update module_to_test_file * fix get_tests_dir from wrong transformers.utils * update config.yml (CircleCI) * fix style * remove missing imports * update new model script * update check_repo * update SPECIAL_MODULE_TO_TEST_MAP * fix style * add __init__ * update self-scheduled * fix add_new_model scripts * check one way to get location back * python setup.py build install * fix import in test auto * update self-scheduled.yml * update slack notification script * Add comments about artifact names * fix for yolos Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 08 Mar, 2022 1 commit
-
-
NielsRogge authored
* Fix LayoutLMv2 test * Update black
-
- 23 Feb, 2022 1 commit
-
-
Lysandre Debut authored
* Per-folder tests reorganization Co-authored-by:
sgugger <sylvain.gugger@gmail.com> Co-authored-by:
Stas Bekman <stas@stason.org>
-
- 03 Jan, 2022 1 commit
-
-
Nicolas Patry authored
* Enabling `truncation_side` for Slow and Fast tokenizer. Co-Authored-by:
Niels Rogge <48327001+NielsRogge@users.noreply.github.com> * Disable failing tests. * Layout xlm. * assert -> assertEqual. Co-authored-by:
Niels Rogge <48327001+NielsRogge@users.noreply.github.com>
-
- 09 Nov, 2021 1 commit
-
-
Apoorv Garg authored
* correct order of overflowing tokens for LayoutLmV2 tokenizer * test to check order of overflowing_tokens for a seq of input_ids * fix up quality * added suggested changes * check that tests the bbox sequence * pair_input test added * pass quality test * check bbox sequence added * unittest method * comments added * add overflowing bbox test * improved "seq_1" Co-authored-by:
SaulLu <55560583+SaulLu@users.noreply.github.com> * improve code quality Co-authored-by:
SaulLu <lucilesaul.com@gmail.com> Co-authored-by:
SaulLu <55560583+SaulLu@users.noreply.github.com>
-
- 30 Aug, 2021 1 commit
-
-
NielsRogge authored
* First commit * Make style * Fix dummy objects * Add Detectron2 config * Add LayoutLMv2 pooler * More improvements, add documentation * More improvements * Add model tests * Add clarification regarding image input * Improve integration test * Fix bug * Fix another bug * Fix another bug * Fix another bug * More improvements * Make more tests pass * Make more tests pass * Improve integration test * Remove gradient checkpointing and add head masking * Add integration test * Add LayoutLMv2ForSequenceClassification to the tests * Add LayoutLMv2ForQuestionAnswering * More improvements * More improvements * Small improvements * Fix _LazyModule * Fix fast tokenizer * Move sync_batch_norm to a separate method * Replace dummies by requires_backends * Move calculation of visual bounding boxes to separate method + update README * Add models to main init * First draft * More improvements * More improvements * More improvements * More improvements * More improvements * Remove is_split_into_words * More improvements * Simply tesseract - no use of pandas anymore * Add LayoutLMv2Processor * Update is_pytesseract_available * Fix bugs * Improve feature extractor * Fix bug * Add print statement * Add truncation of bounding boxes * Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer * Improve tokenizer tests * Make more tokenizer tests pass * Make more tests pass, add integration tests * Finish integration tests * More improvements * More improvements - update API of the tokenizer * More improvements * Remove support for VQA training * Remove some files * Improve feature extractor * Improve documentation and one more tokenizer test * Make quality and small docs improvements * Add batched tests for LayoutLMv2Processor, remove fast tokenizer * Add truncation of labels * Apply suggestions from code review * Improve processor tests * Fix failing tests and add suggestion from code review * Fix tokenizer test * Add detectron2 CI job * Simplify CI job * Comment out non-detectron2 jobs and specify number of processes * Add pip install torchvision * Add durations to see which tests are slow * Fix tokenizer test and make model tests smaller * Frist draft * Use setattr * Possible fix * Proposal with configuration * First draft of fast tokenizer * More improvements * Enable fast tokenizer tests * Make more tests pass * Make more tests pass * More improvements * Addd padding to fast tokenizer * Mkae more tests pass * Make more tests pass * Make all tests pass for fast tokenizer * Make fast tokenizer support overflowing boxes and labels * Add support for overflowing_labels to slow tokenizer * Add support for fast tokenizer to the processor * Update processor tests for both slow and fast tokenizers * Add head models to model mappings * Make style & quality * Remove Detectron2 config file * Add configurable option to label all subwords * Fix test * Skip visual segment embeddings in test * Use ResNet-18 backbone in tests instead of ResNet-101 * Proposal * Re-enable all jobs on CI * Fix installation of tesseract * Fix failing test * Fix index table * Add LayoutXLM doc page, first draft of code examples * Improve documentation a lot * Update expected boxes for Tesseract 4.0.0 beta * Use offsets to create labels instead of checking if they start with ## * Update expected boxes for Tesseract 4.1.1 * Fix conflict * Make variable names cleaner, add docstring, add link to notebooks * Revert "Fix conflict" This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5. * Revert to make integration test pass * Apply suggestions from @LysandreJik's review * Address @patrickvonplaten's comments * Remove fixtures DocVQA in favor of dataset on the hub Co-authored-by:Lysandre <lysandre.debut@reseau.eseo.fr>
-