1. 27 Jul, 2022 1 commit
  2. 11 Jul, 2022 2 commits
  3. 29 Jun, 2022 1 commit
  4. 22 Jun, 2022 2 commits
  5. 21 Jun, 2022 1 commit
  6. 17 Jun, 2022 2 commits
  7. 14 Jun, 2022 1 commit
  8. 10 Jun, 2022 3 commits
  9. 24 May, 2022 2 commits
    • dependabot[bot]'s avatar
      1ef9a1ed
    • NielsRogge's avatar
      Add LayoutLMv3 (#17060) · 31ee80d5
      NielsRogge authored
      
      
      * Make forward pass work
      
      * More improvements
      
      * Remove unused imports
      
      * Remove timm dependency
      
      * Improve loss calculation of token classifier
      
      * Fix most tests
      
      * Add docs
      
      * Add model integration test
      
      * Make all tests pass
      
      * Add LayoutLMv3FeatureExtractor
      
      * Improve integration test + make fixup
      
      * Add example script
      
      * Fix style
      
      * Add LayoutLMv3Processor
      
      * Fix style
      
      * Add option to add visual labels
      
      * Make more tokenizer tests pass
      
      * Fix more tests
      
      * Make more tests pass
      
      * Fix bug and improve docs
      
      * Fix import of processors
      
      * Improve docstrings
      
      * Fix toctree and improve docs
      
      * Fix auto tokenizer
      
      * Move tests to model folder
      
      * Move tests to model folder
      
      * change default behavior add_prefix_space
      
      * add prefix space for fast
      
      * add_prefix_spcae set to True for Fast
      
      * no space before `unique_no_split` token
      
      * add test to hightligh special treatment of added tokens
      
      * fix `test_batch_encode_dynamic_overflowing` by building a long enough example
      
      * fix `test_full_tokenizer` with add_prefix_token
      
      * Fix tokenizer integration test
      
      * Make the code more readable
      
      * Add tests for LayoutLMv3Processor
      
      * Fix style
      
      * Add model to README and update init
      
      * Apply suggestions from code review
      
      * Replace asserts by value errors
      
      * Add suggestion by @ducviet00
      
      * Add model to doc tests
      
      * Simplify script
      
      * Improve README
      
      * a step ahead to fix
      
      * Update pair_input_test
      
      * Make all tokenizer tests pass - phew
      
      * Make style
      
      * Add LayoutLMv3 to CI job
      
      * Fix auto mapping
      
      * Fix CI job name
      
      * Make all processor tests pass
      
      * Make tests of LayoutLMv2 and LayoutXLM consistent
      
      * Add copied from statements to fast tokenizer
      
      * Add copied from statements to slow tokenizer
      
      * Remove add_visual_labels attribute
      
      * Fix tests
      
      * Add link to notebooks
      
      * Improve docs of LayoutLMv3Processor
      
      * Fix reference to section
      Co-authored-by: default avatarSaulLu <lucilesaul.com@gmail.com>
      Co-authored-by: default avatarNiels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
      31ee80d5
  10. 23 May, 2022 1 commit
  11. 19 May, 2022 1 commit
  12. 18 May, 2022 2 commits
  13. 16 May, 2022 3 commits
  14. 12 May, 2022 1 commit
  15. 09 May, 2022 1 commit
  16. 04 May, 2022 3 commits
  17. 03 May, 2022 1 commit
  18. 28 Apr, 2022 1 commit
  19. 27 Apr, 2022 1 commit
  20. 25 Apr, 2022 2 commits
  21. 21 Apr, 2022 1 commit
  22. 19 Apr, 2022 1 commit
    • Suraj Patil's avatar
      [Flax] improve large model init and loading (#16148) · d3bd9ac7
      Suraj Patil authored
      
      
      * begin do_init
      
      * add params_shape_tree
      
      * raise error if params are accessed when do_init is False
      
      * don't allow do_init=False when keys are missing
      
      * make shape tree a property
      
      * assign self._params at the end
      
      * add test for do_init
      
      * add do_init arg to all flax models
      
      * fix param setting
      
      * disbale do_init for composite models
      
      * update test
      
      * add do_init in FlaxBigBirdForMultipleChoice
      
      * better names and errors
      
      * improve test
      
      * style
      
      * add a warning when do_init=False
      
      * remove extra if
      
      * set params after _required_params
      
      * add test for from_pretrained
      
      * do_init => _do_init
      
      * chage warning to info
      
      * fix typo
      
      * add params in init_weights
      
      * add params to gpt neo init
      
      * add params to init_weights
      
      * update do_init test
      
      * Trigger CI
      
      * Apply suggestions from code review
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * update template
      
      * trigger CI
      
      * style
      
      * style
      
      * fix template
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      d3bd9ac7
  23. 13 Apr, 2022 1 commit
    • Tu Vu's avatar
      Add self training code for text classification (#16738) · 34ef029d
      Tu Vu authored
      * Add self-training code for text-classification
      
      * Add self-training code for text-classification
      
      * Add self-training code for text-classification
      
      * Add self-training code for text-classification
      
      * Add self-training code for text-classification
      
      * Delete strata
      34ef029d
  24. 12 Apr, 2022 1 commit
  25. 11 Apr, 2022 2 commits
    • Zachary Mueller's avatar
      Fix example logs repeating themselves (#16669) · 69233cf0
      Zachary Mueller authored
      Move declaration of log streams to before tests, so that results won't get compounded on top of each other
      69233cf0
    • Jia LI's avatar
      Jia multi gpu eval (#16428) · 4868a830
      Jia LI authored
      
      
      * add simple multi gpu complet
      
      * add human_eval_multi_gpu
      
      * use copy strategy to distribute across gpu, to avoid padding
      
      * add doc string
      
      * update code style
      
      * use task id to arrange output
      
      * truncate input to avoid zero pad
      
      * Stop the copy mechanism
      
      * update style
      
      * restore copies to scale better in distributed mode
      
      * update style
      
      * replace human eval
      
      * Apply suggestions from code review
      
      1. Tokenize all input at the same time
      2. use attention_mask to get the input length
      3. other small fixes
      Co-authored-by: default avatarLeandro von Werra <lvwerra@users.noreply.github.com>
      
      * correct typo and update docstring
      
      * update code style
      
      * remove num sample division constraint
      
      * remove max len calculation
      
      * use accelerator.gather once to speed up
      
      * use accelerate set_seed; update accelerate version
      
      * correct gather bug
      Co-authored-by: default avatarLeandro von Werra <lvwerra@users.noreply.github.com>
      4868a830
  26. 08 Apr, 2022 1 commit
    • NielsRogge's avatar
      Add TAPEX (#16473) · 4ef0abb7
      NielsRogge authored
      
      
      * Add TapexTokenizer
      
      * Improve docstrings and provide option to provide answer
      
      * Remove option for pretokenized inputs
      
      * Add TAPEX to README
      
      * Fix copies
      
      * Remove option for pretokenized inputs
      
      * Initial commit: add tapex fine-tuning examples on both table-based question answering and table-based fact verification.
      
      * - Draft a README file for running the script and introducing some background.
      - Remove unused code lines in tabfact script.
      - Disable the deafult `pad_to_max_length` option which is memory-consuming.
      
      * * Support `as_target_tokenizer` function for TapexTokenizer.
      * Fix the do_lower_case behaviour of TapexTokenizer.
      * Add unit tests for target scenarios and cased/uncased scenarios for both source and target.
      
      * * Replace the label BartTokenizer with TapexTokenizer's as_target_tokenizer function.
      * Fix typos in tapex example README.
      
      * * fix the evaluation script - remove the property `task_name`
      
      * * Make the label space more clear for tabfact tasks
      
      * * Using a new fine-tuning script for tapex-base on tabfact.
      
      * * Remove the lowercase code outside the tokenizer - we use the tokenizer to control whether do_lower_case
      * Guarantee the hyper-parameter can be run without out-of-memory on 16GB card and report the new reproduced number on wikisql
      
      * * Remove the default tokenizer_name option.
      * Provide evaluation command.
      
      * * Support for WikiTableQuestion dataset.
      
      * Fix a typo in README.
      
      * * Fix the datasets's key name in WikiTableQuestions
      
      * Run make fixup and move test to folder
      
      * Fix quality
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Apply some more suggestions from code review
      
      * Improve docstrings
      
      * Overwrite failing test
      
      * Improve comment in example scripts
      
      * Fix rebase
      
      * Add TAPEX to Auto mapping
      
      * Add TAPEX to auto config mappings
      
      * Put TAPEX higher than BART in auto mapping
      
      * Add TAPEX to doc tests
      Co-authored-by: default avatarNiels Rogge <nielsrogge@Nielss-MBP.localdomain>
      Co-authored-by: default avatarSivilTaram <qianlxc@outlook.com>
      Co-authored-by: default avatarNiels Rogge <nielsrogge@nielss-mbp.home>
      Co-authored-by: default avatarSuraj Patil <surajp815@gmail.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarNiels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
      4ef0abb7
  27. 31 Mar, 2022 1 commit