- 08 Apr, 2022 1 commit
-
-
NielsRogge authored
* Add TapexTokenizer * Improve docstrings and provide option to provide answer * Remove option for pretokenized inputs * Add TAPEX to README * Fix copies * Remove option for pretokenized inputs * Initial commit: add tapex fine-tuning examples on both table-based question answering and table-based fact verification. * - Draft a README file for running the script and introducing some background. - Remove unused code lines in tabfact script. - Disable the deafult `pad_to_max_length` option which is memory-consuming. * * Support `as_target_tokenizer` function for TapexTokenizer. * Fix the do_lower_case behaviour of TapexTokenizer. * Add unit tests for target scenarios and cased/uncased scenarios for both source and target. * * Replace the label BartTokenizer with TapexTokenizer's as_target_tokenizer function. * Fix typos in tapex example README. * * fix the evaluation script - remove the property `task_name` * * Make the label space more clear for tabfact tasks * * Using a new fine-tuning script for tapex-base on tabfact. * * Remove the lowercase code outside the tokenizer - we use the tokenizer to control whether do_lower_case * Guarantee the hyper-parameter can be run without out-of-memory on 16GB card and report the new reproduced number on wikisql * * Remove the default tokenizer_name option. * Provide evaluation command. * * Support for WikiTableQuestion dataset. * Fix a typo in README. * * Fix the datasets's key name in WikiTableQuestions * Run make fixup and move test to folder * Fix quality * Apply suggestions from code review * Apply suggestions from code review Co-authored-by:
Suraj Patil <surajp815@gmail.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Apply some more suggestions from code review * Improve docstrings * Overwrite failing test * Improve comment in example scripts * Fix rebase * Add TAPEX to Auto mapping * Add TAPEX to auto config mappings * Put TAPEX higher than BART in auto mapping * Add TAPEX to doc tests Co-authored-by:
Niels Rogge <nielsrogge@Nielss-MBP.localdomain> Co-authored-by:
SivilTaram <qianlxc@outlook.com> Co-authored-by:
Niels Rogge <nielsrogge@nielss-mbp.home> Co-authored-by:
Suraj Patil <surajp815@gmail.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 23 Feb, 2022 1 commit
-
-
Lysandre Debut authored
* Per-folder tests reorganization Co-authored-by:
sgugger <sylvain.gugger@gmail.com> Co-authored-by:
Stas Bekman <stas@stason.org>
-
- 03 Jan, 2022 1 commit
-
-
Nicolas Patry authored
* Enabling `truncation_side` for Slow and Fast tokenizer. Co-Authored-by:
Niels Rogge <48327001+NielsRogge@users.noreply.github.com> * Disable failing tests. * Layout xlm. * assert -> assertEqual. Co-authored-by:
Niels Rogge <48327001+NielsRogge@users.noreply.github.com>
-
- 23 Aug, 2021 1 commit
-
-
NielsRogge authored
* Add min and max question length option to the tokenizer * Add corresponding test
-
- 31 Mar, 2021 1 commit
-
-
Sylvain Gugger authored
* First third * Styling and fix mistake * Quality * All the rest * Treat %s and %d * typo * Missing ) * Apply suggestions from code review Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 13 Jan, 2021 1 commit
-
-
Lysandre Debut authored
* Fix conversational pipeline test * LayoutLM * ProphetNet * BART * Blenderbot & small * Marian * mBART * Pegasus * Tapas tokenizer * BERT2BERT test * Style * Example requirements * TF BERT2BERT test
-
- 12 Jan, 2021 1 commit
-
-
Sylvain Gugger authored
* Add target contextmanager and rework prepare_seq2seq_batch * Fix tests, treat BART and Barthez * Add last tokenizers * Fix test * Set src token before calling the superclass * Remove special behavior for T5 * Remove needless imports * Remove needless asserts
-
- 11 Jan, 2021 1 commit
-
-
Lysandre Debut authored
* Remove tolerance + drop_rows_to_fit by default * remove drop_rows_to_fit
-
- 15 Dec, 2020 1 commit
-
-
NielsRogge authored
* First commit: adding all files from tapas_v3 * Fix multiple bugs including soft dependency and new structure of the library * Improve testing by adding torch_device to inputs and adding dependency on scatter * Use Python 3 inheritance rather than Python 2 * First draft model cards of base sized models * Remove model cards as they are already on the hub * Fix multiple bugs with integration tests * All model integration tests pass * Remove print statement * Add test for convert_logits_to_predictions method of TapasTokenizer * Incorporate suggestions by Google authors * Fix remaining tests * Change position embeddings sizes to 512 instead of 1024 * Comment out positional embedding sizes * Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES * Added more model names * Fix truncation when no max length is specified * Disable torchscript test * Make style & make quality * Quality * Address CI needs * Test the Masked LM model * Fix the masked LM model * Truncate when overflowing * More much needed docs improvements * Fix some URLs * Some more docs improvements * Test PyTorch scatter * Set to slow + minify * Calm flake8 down * First commit: adding all files from tapas_v3 * Fix multiple bugs including soft dependency and new structure of the library * Improve testing by adding torch_device to inputs and adding dependency on scatter * Use Python 3 inheritance rather than Python 2 * First draft model cards of base sized models * Remove model cards as they are already on the hub * Fix multiple bugs with integration tests * All model integration tests pass * Remove print statement * Add test for convert_logits_to_predictions method of TapasTokenizer * Incorporate suggestions by Google authors * Fix remaining tests * Change position embeddings sizes to 512 instead of 1024 * Comment out positional embedding sizes * Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES * Added more model names * Fix truncation when no max length is specified * Disable torchscript test * Make style & make quality * Quality * Address CI needs * Test the Masked LM model * Fix the masked LM model * Truncate when overflowing * More much needed docs improvements * Fix some URLs * Some more docs improvements * Add add_pooling_layer argument to TapasModel Fix comments by @sgugger and @patrickvonplaten * Fix issue in docs + fix style and quality * Clean up conversion script and add task parameter to TapasConfig * Revert the task parameter of TapasConfig Some minor fixes * Improve conversion script and add test for absolute position embeddings * Improve conversion script and add test for absolute position embeddings * Fix bug with reset_position_index_per_cell arg of the conversion cli * Add notebooks to the examples directory and fix style and quality * Apply suggestions from code review * Move from `nielsr/` to `google/` namespace * Apply Sylvain's comments Co-authored-by:
sgugger <sylvain.gugger@gmail.com> Co-authored-by:
Rogge Niels <niels.rogge@howest.be> Co-authored-by:
LysandreJik <lysandre.debut@reseau.eseo.fr> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
sgugger <sylvain.gugger@gmail.com>
-