    Add TAPEX (#16473) · 4ef0abb7
    NielsRogge authored
    
    
    * Add TapexTokenizer
    
    * Improve docstrings and provide option to provide answer
    
    * Remove option for pretokenized inputs
    
    * Add TAPEX to README
    
    * Fix copies
    
    * Remove option for pretokenized inputs
    
    * Initial commit: add TAPEX fine-tuning examples for both table-based question answering and table-based fact verification.
    
    * - Draft a README file for running the script and introducing some background.
    - Remove unused code lines in tabfact script.
    - Disable the default `pad_to_max_length` option, which is memory-consuming.
    
    * * Support `as_target_tokenizer` function for TapexTokenizer.
    * Fix the `do_lower_case` behaviour of TapexTokenizer.
    * Add unit tests for target scenarios and cased/uncased scenarios for both source and target.
    
    * * Replace the label BartTokenizer with TapexTokenizer's as_target_tokenizer function.
    * Fix typos in tapex example README.
    
    * * Fix the evaluation script: remove the property `task_name`
    
    * * Make the label space clearer for TabFact tasks
    
    * * Use a new fine-tuning script for tapex-base on TabFact.
    
    * * Remove the lowercasing code outside the tokenizer; the tokenizer's `do_lower_case` option now controls lowercasing.
    * Ensure the hyperparameters run without out-of-memory errors on a 16GB card, and report the newly reproduced number on WikiSQL.
    
    * * Remove the default `tokenizer_name` option.
    * Provide evaluation command.
    
    * * Add support for the WikiTableQuestions dataset.
    
    * Fix a typo in README.
    
    * * Fix the dataset's key name in WikiTableQuestions
    
    * Run make fixup and move test to folder
    
    * Fix quality
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    Co-authored-by: Suraj Patil <surajp815@gmail.com>
    
    * Apply suggestions from code review
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Apply some more suggestions from code review
    
    * Improve docstrings
    
    * Overwrite failing test
    
    * Improve comment in example scripts
    
    * Fix rebase
    
    * Add TAPEX to Auto mapping
    
    * Add TAPEX to auto config mappings
    
    * Put TAPEX higher than BART in auto mapping
    
    * Add TAPEX to doc tests
    Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
    Co-authored-by: SivilTaram <qianlxc@outlook.com>
    Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
    Co-authored-by: Suraj Patil <surajp815@gmail.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
test_tokenization_tapex.py 44.2 KB