1. 24 May, 2022 1 commit
    • Add LayoutLMv3 (#17060) · 31ee80d5
      NielsRogge authored
      
      
      * Make forward pass work
      
      * More improvements
      
      * Remove unused imports
      
      * Remove timm dependency
      
      * Improve loss calculation of token classifier
      
      * Fix most tests
      
      * Add docs
      
      * Add model integration test
      
      * Make all tests pass
      
      * Add LayoutLMv3FeatureExtractor
      
      * Improve integration test + make fixup
      
      * Add example script
      
      * Fix style
      
      * Add LayoutLMv3Processor
      
      * Fix style
      
      * Add option to add visual labels
      
      * Make more tokenizer tests pass
      
      * Fix more tests
      
      * Make more tests pass
      
      * Fix bug and improve docs
      
      * Fix import of processors
      
      * Improve docstrings
      
      * Fix toctree and improve docs
      
      * Fix auto tokenizer
      
      * Move tests to model folder
      
      * change default behavior add_prefix_space
      
      * add prefix space for fast
      
      * add_prefix_space set to True for Fast
      
      * no space before `unique_no_split` token
      
      * add test to highlight special treatment of added tokens
      
      * fix `test_batch_encode_dynamic_overflowing` by building a long enough example
      
      * fix `test_full_tokenizer` with add_prefix_token
      
      * Fix tokenizer integration test
      
      * Make the code more readable
      
      * Add tests for LayoutLMv3Processor
      
      * Fix style
      
      * Add model to README and update init
      
      * Apply suggestions from code review
      
      * Replace asserts by value errors
      
      * Add suggestion by @ducviet00
      
      * Add model to doc tests
      
      * Simplify script
      
      * Improve README
      
      * a step ahead to fix
      
      * Update pair_input_test
      
      * Make all tokenizer tests pass - phew
      
      * Make style
      
      * Add LayoutLMv3 to CI job
      
      * Fix auto mapping
      
      * Fix CI job name
      
      * Make all processor tests pass
      
      * Make tests of LayoutLMv2 and LayoutXLM consistent
      
      * Add copied from statements to fast tokenizer
      
      * Add copied from statements to slow tokenizer
      
      * Remove add_visual_labels attribute
      
      * Fix tests
      
      * Add link to notebooks
      
      * Improve docs of LayoutLMv3Processor
      
      * Fix reference to section
      Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
  2. 23 May, 2022 2 commits
  3. 19 May, 2022 5 commits
  4. 18 May, 2022 5 commits
  5. 17 May, 2022 7 commits
  6. 16 May, 2022 7 commits
  7. 13 May, 2022 3 commits
  8. 12 May, 2022 4 commits
  9. 11 May, 2022 5 commits
  10. 10 May, 2022 1 commit
    • MobileBERT tokenizer tests (#16896) · 4a419d49
      Leon Derczynski authored
      
      
      * unhardcode pretrained model path, make it a class var
      
      * add tests for mobilebert tokenizer
      
      * allow tempfiles for vocab & merge similarity test to autodelete
      
      * add explanatory comments
      
      * remove unused imports, let make style do its.. thing
      
      * remove inheritance and use BERT tok tests for MobileBERT
      
      * Update tests/mobilebert/test_tokenization_mobilebert.py
      Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
      
      * amend class names, remove unused import, add fix for mobilebert's hub pathname
      
      * amend paths for model tests being in models/ subdir of /tests
      
      * explicitly rm test from prev path
      Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>