1. 08 Apr, 2022 1 commit
    • Add TAPEX (#16473) · 4ef0abb7
      NielsRogge authored
      
      
      * Add TapexTokenizer
      
      * Improve docstrings and provide option to provide answer
      
      * Remove option for pretokenized inputs
      
      * Add TAPEX to README
      
      * Fix copies
      
      * Remove option for pretokenized inputs
      
      * Initial commit: add tapex fine-tuning examples for both table-based question answering and table-based fact verification.
      
      * - Draft a README file for running the scripts and introducing some background.
        - Remove unused code lines in the tabfact script.
        - Disable the default `pad_to_max_length` option, which is memory-consuming.
      
      * - Support the `as_target_tokenizer` function for TapexTokenizer (see the usage sketch after this entry).
        - Fix the do_lower_case behaviour of TapexTokenizer.
        - Add unit tests for target scenarios and cased/uncased scenarios for both source and target.
      
      * - Replace the BartTokenizer used for labels with TapexTokenizer's `as_target_tokenizer` function.
        - Fix typos in the tapex example README.
      
      * Fix the evaluation script: remove the `task_name` property.
      
      * Make the label space clearer for tabfact tasks.
      
      * Use a new fine-tuning script for tapex-base on tabfact.
      
      * - Remove the lowercasing code outside the tokenizer; the tokenizer's do_lower_case option now controls this.
        - Guarantee that the hyper-parameters run without out-of-memory errors on a 16GB card, and report the newly reproduced number on WikiSQL.
      
      * - Remove the default `tokenizer_name` option.
        - Provide an evaluation command.
      
      * Support the WikiTableQuestions dataset.
      
      * Fix a typo in README.
      
      * Fix the dataset's key name in WikiTableQuestions.
      
      * Run `make fixup` and move the test into its folder.
      
      * Fix quality
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Apply some more suggestions from code review
      
      * Improve docstrings
      
      * Overwrite failing test
      
      * Improve comment in example scripts
      
      * Fix rebase
      
      * Add TAPEX to Auto mapping
      
      * Add TAPEX to auto config mappings
      
      * Put TAPEX higher than BART in auto mapping
      
      * Add TAPEX to doc tests
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
      Co-authored-by: SivilTaram <qianlxc@outlook.com>
      Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
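      As background for the tokenizer changes above, a minimal usage sketch follows. It assumes the `microsoft/tapex-base` checkpoint name and the table/query call signature described in the new docstrings; the label-preparation call inside `as_target_tokenizer` is likewise an assumption. Treat it as an illustration of the intended workflow, not code taken from this PR.

```python
# Hedged sketch: encode a table plus a question with TapexTokenizer and let a
# BART-based TAPEX model generate the answer. Checkpoint name is an assumption.
import pandas as pd
from transformers import BartForConditionalGeneration, TapexTokenizer

tokenizer = TapexTokenizer.from_pretrained("microsoft/tapex-base")
model = BartForConditionalGeneration.from_pretrained("microsoft/tapex-base")

# Tables are passed as pandas DataFrames of strings; the tokenizer flattens
# the table and the query into a single input sequence.
table = pd.DataFrame({"year": ["2008", "2012"], "city": ["Beijing", "London"]})
question = "In which year did Beijing host the Olympic Games?"

encoding = tokenizer(table=table, query=question, return_tensors="pt")
outputs = model.generate(**encoding)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# For fine-tuning, labels can be prepared on the target side
# (assumed call pattern for the `as_target_tokenizer` support added here).
with tokenizer.as_target_tokenizer():
    labels = tokenizer(answer="2008", return_tensors="pt").input_ids
```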
  2. 23 Feb, 2022 1 commit
  3. 03 Jan, 2022 1 commit
  4. 23 Aug, 2021 1 commit
  5. 31 Mar, 2021 1 commit
  6. 13 Jan, 2021 1 commit
    • Fix slow tests v4.2.0 (#9561) · c9495166
      Lysandre Debut authored
      * Fix conversational pipeline test
      
      * LayoutLM
      
      * ProphetNet
      
      * BART
      
      * Blenderbot & small
      
      * Marian
      
      * mBART
      
      * Pegasus
      
      * Tapas tokenizer
      
      * BERT2BERT test
      
      * Style
      
      * Example requirements
      
      * TF BERT2BERT test
  7. 12 Jan, 2021 1 commit
    • Refactor `prepare_seq2seq_batch` (#9524) · 063d8d27
      Sylvain Gugger authored
      * Add a target context manager and rework prepare_seq2seq_batch (see the sketch after this list).
      
      * Fix tests; handle BART and Barthez.
      
      * Add last tokenizers
      
      * Fix test
      
      * Set src token before calling the superclass
      
      * Remove special behavior for T5
      
      * Remove needless imports
      
      * Remove needless asserts
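      For context, the pattern this refactor moves toward replaces `tokenizer.prepare_seq2seq_batch(...)` with a regular call for the source texts plus an `as_target_tokenizer` block for the labels. The sketch below is illustrative; the Marian checkpoint name and the placeholder texts are assumptions.

```python
# Hedged sketch of the target-context pattern introduced by this refactor,
# replacing tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, ...).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-ro")  # assumed checkpoint

src_texts = ["The tokenizer encodes these with the source-language settings."]
tgt_texts = ["Placeholder target-language text used as the label."]

# Source side: a plain tokenizer call.
batch = tokenizer(src_texts, padding=True, truncation=True, return_tensors="pt")

# Target side: the context manager switches the tokenizer into label mode
# (e.g. the target-language SentencePiece model or language code) for tgt_texts.
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts, padding=True, truncation=True, return_tensors="pt")

batch["labels"] = labels["input_ids"]
```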
  8. 11 Jan, 2021 1 commit
  9. 15 Dec, 2020 1 commit
    • [WIP] Tapas v4 (tres) (#9117) · 1551e2dc
      NielsRogge authored
      
      
      * First commit: adding all files from tapas_v3
      
      * Fix multiple bugs including soft dependency and new structure of the library
      
      * Improve testing by adding torch_device to inputs and adding dependency on scatter
      
      * Use Python 3 inheritance rather than Python 2
      
      * First draft model cards of base sized models
      
      * Remove model cards as they are already on the hub
      
      * Fix multiple bugs with integration tests
      
      * All model integration tests pass
      
      * Remove print statement
      
      * Add a test for the convert_logits_to_predictions method of TapasTokenizer (see the usage sketch after this entry).
      
      * Incorporate suggestions by Google authors
      
      * Fix remaining tests
      
      * Change position embeddings sizes to 512 instead of 1024
      
      * Comment out positional embedding sizes
      
      * Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
      
      * Added more model names
      
      * Fix truncation when no max length is specified
      
      * Disable torchscript test
      
      * Make style & make quality
      
      * Quality
      
      * Address CI needs
      
      * Test the Masked LM model
      
      * Fix the masked LM model
      
      * Truncate when overflowing
      
      * More much-needed docs improvements
      
      * Fix some URLs
      
      * Some more docs improvements
      
      * Test PyTorch scatter
      
      * Set to slow + minify
      
      * Calm flake8 down
      
      * Add add_pooling_layer argument to TapasModel
      
      Fix comments by @sgugger and @patrickvonplaten
      
      * Fix issue in docs + fix style and quality
      
      * Clean up conversion script and add task parameter to TapasConfig
      
      * Revert the task parameter of TapasConfig
      
      Some minor fixes
      
      * Improve conversion script and add test for absolute position embeddings
      
      * Improve conversion script and add test for absolute position embeddings
      
      * Fix a bug with the reset_position_index_per_cell arg of the conversion CLI
      
      * Add notebooks to the examples directory and fix style and quality
      
      * Apply suggestions from code review
      
      * Move from `nielsr/` to `google/` namespace
      
      * Apply Sylvain's comments
      Co-authored-by: sgugger <sylvain.gugger@gmail.com>
      Co-authored-by: Rogge Niels <niels.rogge@howest.be>
      Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
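      To tie the pieces of this entry together, here is a hedged sketch of table question answering with the TAPAS classes added in this PR, including the `convert_logits_to_predictions` helper. The `google/tapas-base-finetuned-wtq` checkpoint name is an assumption (the PR moves checkpoints to the `google/` namespace), and at this point in time the model also required the torch-scatter package.

```python
# Hedged sketch: table QA with TapasTokenizer + TapasForQuestionAnswering and the
# convert_logits_to_predictions helper. Checkpoint name is an assumption.
import pandas as pd
import torch
from transformers import TapasForQuestionAnswering, TapasTokenizer

name = "google/tapas-base-finetuned-wtq"
tokenizer = TapasTokenizer.from_pretrained(name)
model = TapasForQuestionAnswering.from_pretrained(name)

# TAPAS expects a table of strings and one or more natural-language queries.
table = pd.DataFrame({"Actors": ["Brad Pitt", "Leonardo Di Caprio"],
                      "Number of movies": ["87", "53"]})
queries = ["How many movies does Leonardo Di Caprio have?"]

inputs = tokenizer(table=table, queries=queries, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert token-level logits back into table cell coordinates (and, for models
# with an aggregation head, aggregation operator indices).
coordinates, agg_indices = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)
answers = [table.iat[coord] for coord in coordinates[0]]
print(answers, agg_indices)
```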