1. 22 Jul, 2024 1 commit
  2. 26 Jun, 2024 1 commit
  3. 22 May, 2024 1 commit
  4. 20 Mar, 2024 1 commit
  5. 13 Mar, 2024 1 commit
  6. 16 Nov, 2023 1 commit
    • [`Styling`] stylify using ruff (#27144) · 651408a0
      Arthur authored
      
      
      * try to stylify using ruff
      
      * might need to remove these changes?
      
      * use ruff format and ruff check
      
      * use isinstance instead of type comparison
      
      * use # fmt: skip
      
      * use # fmt: skip
      
      * nits
      
      * some styling changes
      
      * update ci job
      
      * nits isinstance
      
      * more files update
      
      * nits
      
      * more nits
      
      * small nits
      
      * check and format
      
      * revert wrong changes
      
      * actually use formatter instead of checker
      
      * nits
      
      * well docbuilder is overwriting this commit
      
      * revert notebook changes
      
      * try to nuke docbuilder
      
      * style
      
      * fix feature extraction test
      
      * remove `indent-width = 4`
      
      * fixup
      
      * more nits
      
      * update the ruff version that we use
      
      * style
      
      * nuke docbuilder styling
      
      * leave the print for detected changes
      
      * nits
      
      * Remove file I/O
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
      
      * style
      
      * nits
      
      * revert notebook changes
      
      * Add # fmt skip when possible
      
      * Add # fmt skip when possible
      
      * Fix
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * Nits
      
      * more fixes
      
      * fix tapas
      
      * Another way to skip
      
      * Recommended way
      
      * Fix two more files
      
      * Remove asynch
      
      ---------
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
  7. 14 Sep, 2023 1 commit
    • Overhaul Conversation class and prompt templating (#25323) · 866df66f
      Matt authored
      
      
      * First commit while I figure this out
      
      * make fixup
      
      * Remove unused method
      
      * Store prompt attrib
      
      * Fix prompt argument for tests
      
      * Make same changes in fast tokenizer
      
      * Remove global prompts from fast tokenizer too
      
      * stash commit
      
      * stash commit
      
      * Migrate PromptConfig to its True Final Location
      
      * Replace Conversation entirely with the new class
      
      * Import/dependency fixes
      
      * Import/dependency fixes
      
      * Change format for lots of default prompts
      
      * More default prompt fixups
      
      * Revert llama old methods so we can compare
      
      * Fix some default configs
      
      * Fix some default configs
      
      * Fix misspelled kwarg
      
      * Fixes for Blenderbot
      
      * make fixup
      
      * little rebase cleanup
      
      * Add basic documentation
      
      * Quick doc fix
      
      * Truncate docstring for now
      
      * Add handling for the case when messages is a single string
      
      * Quick llama merges
      
      * Update conversational pipeline and tests
      
      * Add a couple of legacy properties for backward compatibility
      
      * More legacy handling
      
      * Add docstring for build_conversation_input_ids
      
      * Restructure PromptConfig
      
      * Let's start T E M P L A T I N G
      
      * Refactor all default configs to use templates instead
      
      * Revert changes to the special token properties since we don't need them anymore
      
      * More class templates
      
      * Make the sandbox even sandier
      
      * Everything replaced with pure templating
      
      * Remove docs for PromptConfig
      
      * Add testing and optional requirement boilerplate
      
      * Fix imports and make fixup
      
      * Fix LLaMA tests and add Conversation docstring
      
      * Finally get LLaMA working with the template system
      
      * Finally get LLaMA working with the template system
      
      * make fixup
      
      * make fixup
      
      * fmt-off for the long lists of test tokens
      
      * Rename method to apply_chat_template for now
      
      * Start on documentation
      
      * Make chat_template a property that reads through to the default if it's not set
      
      * Expand docs
      
      * Expand chat templating doc some more
      
      * trim/lstrip blocks by default and update doc
      
      * Few doc tweaks
      
      * rebase cleanup
      
      * Clarify docstring
      
      * rebase cleanup
      
      * rebase cleanup
      
      * make fixup
      
      * Quick doc edit
      
      * Reformat the standard template to match ChatML
      
      * Re-add PEFT check
      
      * Update docs/source/en/chat_templating.md
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * Add apply_chat_template to the tokenizer doc
      
      * make fixup
      
      * Add doc links
      
      * Fix chat links
      
      * Fix chat links
      
      * Explain system messages in the doc
      
      * Add chat template test
      
      * Proper save-loading for chat template attribute
      
      * Add test skips for layout models
      
      * Remove _build_conversation_input_ids, add default_chat_template to code_llama
      
      * Make sure all LLaMA models are using the latest template
      
      * Remove default_system_prompt block in code_llama because it has no default prompt
      
      * Update ConversationPipeline preprocess
      
      * Add correct #Copied from links to the default_chat_templates
      
      * Remove unneeded type checking line
      
      * Add a dummy mark_processed method
      
      * Reorganize Conversation to have **deprecated_kwargs
      
      * Update chat_templating.md
      
      * Quick fix to LLAMA tests
      
      * Small doc tweaks
      
      * Add proper docstrings and "copied from" statements to all default chat templates
      
      * Merge use_default_system_prompt support for code_llama too
      
      * Improve clarity around self.chat_template
      
      * Docstring fix
      
      * Fix blenderbot default template
      
      * More doctest fix
      
      * Break out some tokenizer kwargs
      
      * Update doc to explain default templates
      
      * Quick tweaks to tokenizer args
      
      * Cleanups for tokenizer args
      
      * Add note about caching
      
      * Quick tweak to the chat-templating doc
      
      * Update the LLaMA template with error checking and correct system message embedding
      
      * make fixup
      
      * make fixup
      
      * add requires_jinja
      
      * Cleanup to expected output formatting
      
      * Add caching
      
      * Fix typo in llama default template
      
      * Update LLaMA tests
      
      * Update documentation
      
      * Improved legacy handling in the Conversation class
      
      * Update Jinja template with proper error handling
      
      * Quick bugfix
      
      * Proper exception raising
      
      * Change caching behaviour so it doesn't try to pickle an entire Jinja env
      
      * make fixup
      
      * rebase cleanup
      
      ---------
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
  8. 18 Aug, 2023 1 commit
    • [`split_special_tokens`] Add support for `split_special_tokens` argument to encode (#25081) · 30b3c46f
      Arthur authored
      * draft changes
      
      * update and add tests
      
      * styling for now
      
      * move test
      
      * path to usable model
      
      * update test
      
      * small update
      
      * update bertbased tokenizers
      
      * don't use kwargs for _tokenize
      
      * don't use kwargs for _tokenize
      
      * fix copies
      
      * update
      
      * update test for special tokenizers
      
      * fixup
      
      * skip two tests
      
      * remove pdb breakpoint()
      
      * wowo
      
      * rewrite custom tests
      
      * nits
      
      * revert change in target keys
      
      * fix markup lm
      
      * update documentation of the argument
  9. 06 Feb, 2023 1 commit
    • Update quality tooling for formatting (#21480) · 6f79d264
      Sylvain Gugger authored
      * Result of black 23.1
      
      * Update target to Python 3.7
      
      * Switch flake8 to ruff
      
      * Configure isort
      
      * Configure isort
      
      * Apply isort with line limit
      
      * Put the right black version
      
      * adapt black in check copies
      
      * Fix copies
  10. 12 Jan, 2023 1 commit
  11. 24 Aug, 2022 1 commit
    • add warning to let the user know that the `__call__` method is faster than `encode` + `pad` for a fast tokenizer (#18693) · 6667b0d7
      SaulLu authored
      
      * add warning to let the user know that the `encode` + `pad` method is slower than `__call__` for a fast tokenizer
      
      * user warnings
      
      * fix layoutlmv2
      
      * fix layout*
      
      * change warnings into logger.warning
  12. 11 Jul, 2022 1 commit
  13. 12 May, 2022 1 commit
  14. 03 May, 2022 1 commit
    • Move test model folders (#17034) · 19420fd9
      Yih-Dar authored
      
      
      * move test model folders (TODO: fix imports and others)
      
      * fix (potentially partially) imports (in model test modules)
      
      * fix (potentially partially) imports (in tokenization test modules)
      
      * fix (potentially partially) imports (in feature extraction test modules)
      
      * fix import utils.test_modeling_tf_core
      
      * fix path ../fixtures/
      
      * fix imports about generation.test_generation_flax_utils
      
      * fix more imports
      
      * fix fixture path
      
      * fix get_test_dir
      
      * update module_to_test_file
      
      * fix get_tests_dir from wrong transformers.utils
      
      * update config.yml (CircleCI)
      
      * fix style
      
      * remove missing imports
      
      * update new model script
      
      * update check_repo
      
      * update SPECIAL_MODULE_TO_TEST_MAP
      
      * fix style
      
      * add __init__
      
      * update self-scheduled
      
      * fix add_new_model scripts
      
      * check one way to get location back
      
      * python setup.py build install
      
      * fix import in test auto
      
      * update self-scheduled.yml
      
      * update slack notification script
      
      * Add comments about artifact names
      
      * fix for yolos
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  15. 08 Mar, 2022 1 commit
  16. 23 Feb, 2022 1 commit
  17. 03 Jan, 2022 1 commit
  18. 09 Nov, 2021 1 commit
  19. 30 Aug, 2021 1 commit
    • Add LayoutLMv2 + LayoutXLM (#12604) · b6ddb08a
      NielsRogge authored
      
      
      * First commit
      
      * Make style
      
      * Fix dummy objects
      
      * Add Detectron2 config
      
      * Add LayoutLMv2 pooler
      
      * More improvements, add documentation
      
      * More improvements
      
      * Add model tests
      
      * Add clarification regarding image input
      
      * Improve integration test
      
      * Fix bug
      
      * Fix another bug
      
      * Fix another bug
      
      * Fix another bug
      
      * More improvements
      
      * Make more tests pass
      
      * Make more tests pass
      
      * Improve integration test
      
      * Remove gradient checkpointing and add head masking
      
      * Add integration test
      
      * Add LayoutLMv2ForSequenceClassification to the tests
      
      * Add LayoutLMv2ForQuestionAnswering
      
      * More improvements
      
      * More improvements
      
      * Small improvements
      
      * Fix _LazyModule
      
      * Fix fast tokenizer
      
      * Move sync_batch_norm to a separate method
      
      * Replace dummies by requires_backends
      
      * Move calculation of visual bounding boxes to separate method + update README
      
      * Add models to main init
      
      * First draft
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Remove is_split_into_words
      
      * More improvements
      
      * Simplify tesseract - no use of pandas anymore
      
      * Add LayoutLMv2Processor
      
      * Update is_pytesseract_available
      
      * Fix bugs
      
      * Improve feature extractor
      
      * Fix bug
      
      * Add print statement
      
      * Add truncation of bounding boxes
      
      * Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer
      
      * Improve tokenizer tests
      
      * Make more tokenizer tests pass
      
      * Make more tests pass, add integration tests
      
      * Finish integration tests
      
      * More improvements
      
      * More improvements - update API of the tokenizer
      
      * More improvements
      
      * Remove support for VQA training
      
      * Remove some files
      
      * Improve feature extractor
      
      * Improve documentation and one more tokenizer test
      
      * Make quality and small docs improvements
      
      * Add batched tests for LayoutLMv2Processor, remove fast tokenizer
      
      * Add truncation of labels
      
      * Apply suggestions from code review
      
      * Improve processor tests
      
      * Fix failing tests and add suggestion from code review
      
      * Fix tokenizer test
      
      * Add detectron2 CI job
      
      * Simplify CI job
      
      * Comment out non-detectron2 jobs and specify number of processes
      
      * Add pip install torchvision
      
      * Add durations to see which tests are slow
      
      * Fix tokenizer test and make model tests smaller
      
      * First draft
      
      * Use setattr
      
      * Possible fix
      
      * Proposal with configuration
      
      * First draft of fast tokenizer
      
      * More improvements
      
      * Enable fast tokenizer tests
      
      * Make more tests pass
      
      * Make more tests pass
      
      * More improvements
      
      * Add padding to fast tokenizer
      
      * Make more tests pass
      
      * Make more tests pass
      
      * Make all tests pass for fast tokenizer
      
      * Make fast tokenizer support overflowing boxes and labels
      
      * Add support for overflowing_labels to slow tokenizer
      
      * Add support for fast tokenizer to the processor
      
      * Update processor tests for both slow and fast tokenizers
      
      * Add head models to model mappings
      
      * Make style & quality
      
      * Remove Detectron2 config file
      
      * Add configurable option to label all subwords
      
      * Fix test
      
      * Skip visual segment embeddings in test
      
      * Use ResNet-18 backbone in tests instead of ResNet-101
      
      * Proposal
      
      * Re-enable all jobs on CI
      
      * Fix installation of tesseract
      
      * Fix failing test
      
      * Fix index table
      
      * Add LayoutXLM doc page, first draft of code examples
      
      * Improve documentation a lot
      
      * Update expected boxes for Tesseract 4.0.0 beta
      
      * Use offsets to create labels instead of checking if they start with ##
      
      * Update expected boxes for Tesseract 4.1.1
      
      * Fix conflict
      
      * Make variable names cleaner, add docstring, add link to notebooks
      
      * Revert "Fix conflict"
      
      This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.
      
      * Revert to make integration test pass
      
      * Apply suggestions from @LysandreJik's review
      
      * Address @patrickvonplaten's comments
      
      * Remove fixtures DocVQA in favor of dataset on the hub
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>