Unverified Commit 1551e2dc authored by NielsRogge, committed by GitHub

[WIP] Tapas v4 (tres) (#9117)



* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Test PyTorch scatter

* Set to slow + minify

* Calm flake8 down

* Add add_pooling_layer argument to TapasModel

Fix comments by @sgugger and @patrickvonplaten

* Fix issue in docs + fix style and quality

* Clean up conversion script and add task parameter to TapasConfig

* Revert the task parameter of TapasConfig

Some minor fixes

* Improve conversion script and add test for absolute position embeddings

* Improve conversion script and add test for absolute position embeddings

* Fix bug with reset_position_index_per_cell arg of the conversion cli

* Add notebooks to the examples directory and fix style and quality

* Apply suggestions from code review

* Move from `nielsr/` to `google/` namespace

* Apply Sylvain's comments
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
parent ad895af9
...@@ -79,6 +79,7 @@ jobs:
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-{{ checksum "setup.py" }}
paths:
...@@ -105,6 +106,7 @@ jobs:
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
...@@ -183,6 +185,7 @@ jobs:
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
...
...@@ -50,6 +50,7 @@ jobs:
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install pandas torch-scatter -f https://pytorch-geometric.com/whl/torch-$(python -c "import torch; print(''.join(torch.__version__))")+$(python -c "import torch; print(''.join(torch.version.cuda.split('.')))").html
- name: Are GPUs recognized by our DL frameworks
run: |
...@@ -187,6 +188,7 @@ jobs:
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install pandas torch-scatter -f https://pytorch-geometric.com/whl/torch-$(python -c "import torch; print(''.join(torch.__version__))")+$(python -c "import torch; print(''.join(torch.version.cuda.split('.')))").html
- name: Are GPUs recognized by our DL frameworks
run: |
...
...@@ -222,6 +222,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/transformers/master/model_doc/tapas.html)** released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
...
...@@ -176,19 +176,22 @@ and conversion utilities for the following models:
30. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
31. `TAPAS <https://huggingface.co/transformers/master/model_doc/tapas.html>`__ released with the paper `TAPAS: Weakly
Supervised Table Parsing via Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof
Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
32. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
33. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
34. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
35. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
36. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
...@@ -269,6 +272,8 @@ TensorFlow and/or Flax.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| T5 | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| TAPAS | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
...@@ -382,6 +387,7 @@ TensorFlow and/or Flax.
model_doc/roberta
model_doc/squeezebert
model_doc/t5
model_doc/tapas
model_doc/transformerxl
model_doc/xlm
model_doc/xlmprophetnet
...
---
language: en
tags:
- tapas
- masked-lm
license: apache-2.0
---
# TAPAS base model
This model corresponds to the `tapas_inter_masklm_base_reset` checkpoint of the [original GitHub repository](https://github.com/google-research/tapas).
Disclaimer: The team releasing TAPAS did not write a model card for this model, so this model card has been written by
the Hugging Face team and contributors.
## Model description
TAPAS is a BERT-like transformers model pretrained on a large corpus of English data from Wikipedia in a self-supervised fashion.
This means it was pretrained on the raw tables and associated texts only, with no humans labelling them in any way (which is why it
can use lots of publicly available data), with an automatic process to generate inputs and labels from those texts. More precisely, it
was pretrained with two objectives:
- Masked language modeling (MLM): taking a (flattened) table and associated context, the model randomly masks 15% of the words in
the input, then runs the entire (partially masked) sequence through the model. The model then has to predict the masked words.
This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other,
or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional
representation of a table and associated text.
- Intermediate pre-training: to encourage numerical reasoning on tables, the authors additionally pre-trained the model by creating
a balanced dataset of millions of syntactically created training examples. Here, the model must predict (classify) whether a sentence
is supported or refuted by the contents of a table. The training examples are created based on synthetic as well as counterfactual statements.
This way, the model learns an inner representation of the English language used in tables and associated texts, which can then be used
to extract features useful for downstream tasks such as answering questions about a table, or determining whether a sentence is entailed
or refuted by the contents of a table. Fine-tuning is done by adding classification heads on top of the pre-trained model, and then jointly
training the randomly initialized classification heads with the base model on a labelled dataset.
## Intended uses & limitations
You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task.
See the [model hub](https://huggingface.co/models?filter=tapas) to look for fine-tuned versions on a task that interests you.
Here is how to use this model to get the features of a given table-text pair in PyTorch:
```python
from transformers import TapasTokenizer, TapasModel
import pandas as pd

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base")
model = TapasModel.from_pretrained("google/tapas-base")

data = {
    "Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"],
    "Age": ["56", "45", "59"],
    "Number of movies": ["87", "53", "69"],
}
table = pd.DataFrame.from_dict(data)
queries = ["How many movies has George Clooney played in?"]

encoded_input = tokenizer(table=table, queries=queries, return_tensors="pt")
output = model(**encoded_input)
```
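Beyond feature extraction, the fine-tuned checkpoints can be used for table question answering. Below is a minimal sketch (not part of the original card), reusing `table` and `queries` from above and assuming the `google/tapas-base-finetuned-wtq` checkpoint plus an installed torch-scatter; `TapasTokenizer.convert_logits_to_predictions` (the method covered by a new test in this PR) turns the logits into cell coordinates and aggregation indices:
```python
import torch
from transformers import TapasTokenizer, TapasForQuestionAnswering

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")
model = TapasForQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")

# Reuses the `table` DataFrame and `queries` list defined in the example above
inputs = tokenizer(table=table, queries=queries, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the cell logits (and aggregation logits) back to table coordinates and operator indices
predicted_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)
```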
## Training data
For masked language modeling (MLM), a collection of 6.2 million tables was extracted from English Wikipedia: 3.3M of class [Infobox](https://en.wikipedia.org/wiki/Help:Infobox)
and 2.9M of class WikiTable. The authors only considered tables with at most 500 cells. As a proxy for questions that appear in the
downstream tasks, the authors extracted the table caption, article title, article description, segment title and text of the segment
the table occurs in as relevant text snippets. In this way, 21.3M snippets were created. For more info, see the original [TAPAS paper](https://www.aclweb.org/anthology/2020.acl-main.398.pdf).
For intermediate pre-training, two tasks are introduced: one based on synthetic statements and the other on counterfactual statements. The first one
generates a sentence by sampling from a set of logical expressions that filter, combine and compare the information in the table, the kind of reasoning
required for table entailment (e.g., knowing whether Gerald Ford is taller than the average president requires summing the heights of
all presidents and dividing by the number of presidents). The second one corrupts sentences about tables appearing on Wikipedia by swapping
entities for plausible alternatives. Examples of the two tasks can be seen in Figure 1 of the [TAPAS follow-up paper](https://www.aclweb.org/anthology/2020.findings-emnlp.27.pdf),
which describes the procedure in detail in its section 3.
## Training procedure
### Preprocessing
The texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are
then of the form:
```
[CLS] Context [SEP] Flattened table [SEP]
```
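As a quick sanity check (a sketch only, reusing the `tokenizer`, `table`, `queries` and `encoded_input` from the usage example above), decoding the encoded input makes this layout visible: the lowercased query, a `[SEP]`, then the flattened table (header row followed by the cell values row by row):
```python
# Sketch: inspect the flattened input produced by TapasTokenizer
# (reuses tokenizer and encoded_input from the usage example above)
print(tokenizer.decode(encoded_input["input_ids"][0]))
# Roughly: "[CLS] how many movies has george clooney played in? [SEP] actors age
#           number of movies brad pitt 56 87 leonardo di caprio 45 53 george clooney 59 69"
```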
The details of the masking procedure for each sequence are the following:
- 15% of the tokens are masked.
- In 80% of the cases, the masked tokens are replaced by `[MASK]`.
- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
- In the 10% remaining cases, the masked tokens are left as is.
The details of the creation of the synthetic and counterfactual examples can be found in the [follow-up paper](https://arxiv.org/abs/2010.00571).
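For the MLM objective itself, the corruption is the standard BERT-style 15% / 80-10-10 rule listed above. A minimal, framework-free sketch of that rule (illustrative only, not the actual pre-training code):
```python
import random


def mask_tokens(token_ids, mask_token_id, vocab_size, mlm_probability=0.15):
    """Sketch of the 15% / 80-10-10 masking rule described above.

    Returns corrupted input ids and MLM labels (-100 marks positions ignored by the loss).
    """
    input_ids, labels = list(token_ids), []
    for i, token_id in enumerate(token_ids):
        if random.random() < mlm_probability:
            labels.append(token_id)  # this position is predicted by the MLM head
            draw = random.random()
            if draw < 0.8:  # 80%: replace with the [MASK] token
                input_ids[i] = mask_token_id
            elif draw < 0.9:  # 10%: replace with a random vocabulary token
                input_ids[i] = random.randrange(vocab_size)  # (sketch does not exclude the original id)
            # remaining 10%: keep the original token unchanged
        else:
            labels.append(-100)
    return input_ids, labels
```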
### Pretraining
The model was trained on 32 Cloud TPU v3 cores for one million steps with maximum sequence length 512 and batch size of 512.
In this setup, pre-training takes around 3 days. The optimizer used is Adam with a learning rate of 5e-5, and a warmup ratio
of 0.10.
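For orientation, the optimization setup described above can be approximated with the standard utilities of this library; the sketch below is only an illustration of the stated hyperparameters (1M steps, learning rate 5e-5, 10% warmup), assumes torch and torch-scatter are installed, and is not the original TensorFlow pre-training code:
```python
from transformers import AdamW, TapasConfig, TapasForMaskedLM, get_linear_schedule_with_warmup

# Re-create the schedule described above: 1M steps, lr 5e-5, warmup ratio 0.10 (i.e. 100k warmup steps)
model = TapasForMaskedLM(TapasConfig())
num_train_steps = 1_000_000
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.10 * num_train_steps), num_training_steps=num_train_steps
)
```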
### BibTeX entry and citation info
```bibtex
@misc{herzig2020tapas,
title={TAPAS: Weakly Supervised Table Parsing via Pre-training},
author={Jonathan Herzig and Paweł Krzysztof Nowak and Thomas Müller and Francesco Piccinno and Julian Martin Eisenschlos},
year={2020},
eprint={2004.02349},
archivePrefix={arXiv},
primaryClass={cs.IR}
}
```
```bibtex
@misc{eisenschlos2020understanding,
title={Understanding tables with intermediate pre-training},
author={Julian Martin Eisenschlos and Syrine Krichene and Thomas Müller},
year={2020},
eprint={2010.00571},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
...@@ -69,3 +69,5 @@ Pull Request so it can be included under the Community notebooks.
|[Classify text with DistilBERT and Tensorflow](https://github.com/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb) | How to fine-tune DistilBERT for text classification in TensorFlow | [Peter Bayerle](https://github.com/peterbayerle) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb)|
|[Leverage BERT for Encoder-Decoder Summarization on CNN/Dailymail](https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb) | How to warm-start a *EncoderDecoderModel* with a *bert-base-uncased* checkpoint for summarization on CNN/Dailymail | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb)|
|[Leverage RoBERTa for Encoder-Decoder Summarization on BBC XSum](https://github.com/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb) | How to warm-start a shared *EncoderDecoderModel* with a *roberta-base* checkpoint for summarization on BBC/XSum | [Patrick von Platen](https://github.com/patrickvonplaten) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb)|
|[Fine-tuning TAPAS on Sequential Question Answering (SQA)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb) | How to fine-tune *TapasForQuestionAnswering* with a *tapas-base* checkpoint on the Sequential Question Answering (SQA) dataset | [Niels Rogge](https://github.com/nielsrogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb)|
|[Evaluating TAPAS on Table Fact Checking (TabFact)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb) | How to evaluate a fine-tuned *TapasForSequenceClassification* with a *tapas-base-finetuned-tabfact* checkpoint using a combination of the 🤗 datasets and 🤗 transformers libraries | [Niels Rogge](https://github.com/nielsrogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb)|
...@@ -164,6 +164,7 @@ from .models.retribert import RETRIBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, RetriBert
from .models.roberta import ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, RobertaConfig, RobertaTokenizer
from .models.squeezebert import SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, SqueezeBertConfig, SqueezeBertTokenizer
from .models.t5 import T5_PRETRAINED_CONFIG_ARCHIVE_MAP, T5Config
from .models.tapas import TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP, TapasConfig, TapasTokenizer
from .models.transfo_xl import (
TRANSFO_XL_PRETRAINED_CONFIG_ARCHIVE_MAP,
TransfoXLConfig,
...@@ -605,6 +606,13 @@ if is_torch_available():
T5PreTrainedModel,
load_tf_weights_in_t5,
)
from .models.tapas import (
TAPAS_PRETRAINED_MODEL_ARCHIVE_LIST,
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
)
from .models.transfo_xl import (
TRANSFO_XL_PRETRAINED_MODEL_ARCHIVE_LIST,
AdaptiveEmbedding,
...
...@@ -216,6 +216,29 @@ except ImportError:
_tokenizers_available = False
try:
import pandas # noqa: F401
_pandas_available = True
except ImportError:
_pandas_available = False
try:
import torch_scatter
# Check we're not importing a "torch_scatter" directory somewhere
_scatter_available = hasattr(torch_scatter, "__version__") and hasattr(torch_scatter, "scatter")
if _scatter_available:
logger.debug(f"Succesfully imported torch-scatter version {torch_scatter.__version__}")
else:
logger.debug("Imported a torch_scatter object but this doesn't seem to be the torch-scatter library.")
except ImportError:
_scatter_available = False
old_default_cache_path = os.path.join(torch_cache_home, "transformers")
# New default cache, shared with the Datasets library
hf_cache_home = os.path.expanduser(
...@@ -325,6 +348,14 @@ def is_in_notebook():
return _in_notebook
def is_scatter_available():
return _scatter_available
def is_pandas_available():
return _pandas_available
def torch_only_method(fn):
def wrapper(*args, **kwargs):
if not _torch_available:
...@@ -427,6 +458,13 @@ installation page: https://github.com/google/flax and follow the ones that match
"""
# docstyle-ignore
SCATTER_IMPORT_ERROR = """
{0} requires the torch-scatter library but it was not found in your environment. You can install it with pip as
explained here: https://github.com/rusty1s/pytorch_scatter.
"""
def requires_datasets(obj):
name = obj.__name__ if hasattr(obj, "__name__") else obj.__class__.__name__
if not is_datasets_available():
...@@ -481,6 +519,12 @@ def requires_protobuf(obj):
raise ImportError(PROTOBUF_IMPORT_ERROR.format(name))
def requires_scatter(obj):
name = obj.__name__ if hasattr(obj, "__name__") else obj.__class__.__name__
if not is_scatter_available():
raise ImportError(SCATTER_IMPORT_ERROR.format(name))
def add_start_docstrings(*docstr):
def docstring_decorator(fn):
fn.__doc__ = "".join(docstr) + (fn.__doc__ if fn.__doc__ is not None else "")
...
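The `is_scatter_available()` / `requires_scatter()` pair added above follows the library's existing soft-dependency pattern. A minimal sketch of how such a guard is typically consumed (hypothetical module, not part of this diff):
```python
# Hypothetical module guarding an optional torch-scatter dependency,
# mirroring the pattern introduced in file_utils.py above.
from transformers.file_utils import is_scatter_available, requires_scatter

if is_scatter_available():
    from torch_scatter import scatter  # only imported when the real package is installed


class NeedsScatter:
    def __init__(self):
        # Raises a helpful ImportError (SCATTER_IMPORT_ERROR) when torch-scatter is missing
        requires_scatter(self)
```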
...@@ -51,6 +51,7 @@ from ..retribert.configuration_retribert import RETRIBERT_PRETRAINED_CONFIG_ARCH
from ..roberta.configuration_roberta import ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, RobertaConfig
from ..squeezebert.configuration_squeezebert import SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, SqueezeBertConfig
from ..t5.configuration_t5 import T5_PRETRAINED_CONFIG_ARCHIVE_MAP, T5Config
from ..tapas.configuration_tapas import TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP, TapasConfig
from ..transfo_xl.configuration_transfo_xl import TRANSFO_XL_PRETRAINED_CONFIG_ARCHIVE_MAP, TransfoXLConfig
from ..xlm.configuration_xlm import XLM_PRETRAINED_CONFIG_ARCHIVE_MAP, XLMConfig
from ..xlm_prophetnet.configuration_xlm_prophetnet import (
...@@ -95,6 +96,7 @@ ALL_PRETRAINED_CONFIG_ARCHIVE_MAP = dict(
XLM_PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP,
]
for key, value, in pretrained_map.items()
)
...@@ -141,6 +143,7 @@ CONFIG_MAPPING = OrderedDict(
("dpr", DPRConfig),
("layoutlm", LayoutLMConfig),
("rag", RagConfig),
("tapas", TapasConfig),
]
)
...@@ -185,6 +188,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("prophetnet", "ProphetNet"),
("mt5", "mT5"),
("mpnet", "MPNet"),
("tapas", "TAPAS"),
]
)
...
...@@ -165,6 +165,12 @@ from ..squeezebert.modeling_squeezebert import (
SqueezeBertModel,
)
from ..t5.modeling_t5 import T5ForConditionalGeneration, T5Model
from ..tapas.modeling_tapas import (
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
)
from ..transfo_xl.modeling_transfo_xl import TransfoXLForSequenceClassification, TransfoXLLMHeadModel, TransfoXLModel
from ..xlm.modeling_xlm import (
XLMForMultipleChoice,
...@@ -230,6 +236,7 @@ from .configuration_auto import (
RobertaConfig,
SqueezeBertConfig,
T5Config,
TapasConfig,
TransfoXLConfig,
XLMConfig,
XLMProphetNetConfig,
...@@ -277,6 +284,7 @@ MODEL_MAPPING = OrderedDict(
(XLMProphetNetConfig, XLMProphetNetModel),
(ProphetNetConfig, ProphetNetModel),
(MPNetConfig, MPNetModel),
(TapasConfig, TapasModel),
]
)
...@@ -308,6 +316,7 @@ MODEL_FOR_PRETRAINING_MAPPING = OrderedDict(
(LxmertConfig, LxmertForPreTraining),
(FunnelConfig, FunnelForPreTraining),
(MPNetConfig, MPNetForMaskedLM),
(TapasConfig, TapasForMaskedLM),
]
)
...@@ -340,6 +349,7 @@ MODEL_WITH_LM_HEAD_MAPPING = OrderedDict(
(ReformerConfig, ReformerModelWithLMHead),
(FunnelConfig, FunnelForMaskedLM),
(MPNetConfig, MPNetForMaskedLM),
(TapasConfig, TapasForMaskedLM),
]
)
...@@ -386,6 +396,7 @@ MODEL_FOR_MASKED_LM_MAPPING = OrderedDict(
(ReformerConfig, ReformerForMaskedLM),
(FunnelConfig, FunnelForMaskedLM),
(MPNetConfig, MPNetForMaskedLM),
(TapasConfig, TapasForMaskedLM),
]
)
...@@ -431,6 +442,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
(CTRLConfig, CTRLForSequenceClassification),
(TransfoXLConfig, TransfoXLForSequenceClassification),
(MPNetConfig, MPNetForSequenceClassification),
(TapasConfig, TapasForSequenceClassification),
]
)
...@@ -455,6 +467,7 @@ MODEL_FOR_QUESTION_ANSWERING_MAPPING = OrderedDict(
(FunnelConfig, FunnelForQuestionAnswering),
(LxmertConfig, LxmertForQuestionAnswering),
(MPNetConfig, MPNetForQuestionAnswering),
(TapasConfig, TapasForQuestionAnswering),
]
)
...
...@@ -47,6 +47,7 @@ from ..rag.tokenization_rag import RagTokenizer
from ..retribert.tokenization_retribert import RetriBertTokenizer
from ..roberta.tokenization_roberta import RobertaTokenizer
from ..squeezebert.tokenization_squeezebert import SqueezeBertTokenizer
from ..tapas.tokenization_tapas import TapasTokenizer
from ..transfo_xl.tokenization_transfo_xl import TransfoXLTokenizer
from ..xlm.tokenization_xlm import XLMTokenizer
from .configuration_auto import (
...@@ -84,6 +85,7 @@ from .configuration_auto import (
RobertaConfig,
SqueezeBertConfig,
T5Config,
TapasConfig,
TransfoXLConfig,
XLMConfig,
XLMProphetNetConfig,
...@@ -223,6 +225,7 @@ TOKENIZER_MAPPING = OrderedDict(
(XLMProphetNetConfig, (XLMProphetNetTokenizer, None)),
(ProphetNetConfig, (ProphetNetTokenizer, None)),
(MPNetConfig, (MPNetTokenizer, MPNetTokenizerFast)),
(TapasConfig, (TapasTokenizer, None)),
]
)
...
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ...file_utils import is_torch_available
from .configuration_tapas import TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP, TapasConfig
from .tokenization_tapas import TapasTokenizer
if is_torch_available():
from .modeling_tapas import (
TAPAS_PRETRAINED_MODEL_ARCHIVE_LIST,
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
)
# coding=utf-8
# Copyright 2020 Google Research and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
TAPAS configuration. Based on the BERT configuration with added parameters.
Hyperparameters are taken from run_task_main.py and hparam_utils.py of the original implementation. URLS:
- https://github.com/google-research/tapas/blob/master/tapas/run_task_main.py
- https://github.com/google-research/tapas/blob/master/tapas/utils/hparam_utils.py
"""
from ...configuration_utils import PretrainedConfig
TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"google/tapas-base-finetuned-sqa": "https://huggingface.co/google/tapas-base-finetuned-sqa/resolve/main/config.json",
"google/tapas-base-finetuned-wtq": "https://huggingface.co/google/tapas-base-finetuned-wtq/resolve/main/config.json",
"google/tapas-base-finetuned-wikisql-supervised": "https://huggingface.co/google/tapas-base-finetuned-wikisql-supervised/resolve/main/config.json",
"google/tapas-base-finetuned-tabfact": "https://huggingface.co/google/tapas-base-finetuned-tabfact/resolve/main/config.json",
}
class TapasConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.TapasModel`. It is used to
instantiate a TAPAS model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the TAPAS `tapas-base-finetuned-sqa`
architecture. Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control
the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Hyperparameters additional to BERT are taken from run_task_main.py and hparam_utils.py of the original
implementation. Original implementation available at https://github.com/google-research/tapas/tree/master.
Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522):
Vocabulary size of the TAPAS model. Defines the number of different tokens that can be represented by the
:obj:`inputs_ids` passed when calling :class:`~transformers.TapasModel`.
hidden_size (:obj:`int`, `optional`, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, `optional`, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (:obj:`int`, `optional`, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 1024):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_sizes (:obj:`List[int]`, `optional`, defaults to :obj:`[3, 256, 256, 2, 256, 256, 10]`):
The vocabulary sizes of the :obj:`token_type_ids` passed when calling :class:`~transformers.TapasModel`.
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
The epsilon used by the layer normalization layers.
gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to use gradient checkpointing to save memory at the expense of a slower backward pass.
positive_label_weight (:obj:`float`, `optional`, defaults to 10.0):
Weight for positive labels.
num_aggregation_labels (:obj:`int`, `optional`, defaults to 0):
The number of aggregation operators to predict.
aggregation_loss_weight (:obj:`float`, `optional`, defaults to 1.0):
Importance weight for the aggregation loss.
use_answer_as_supervision (:obj:`bool`, `optional`):
Whether to use the answer as the only supervision for aggregation examples.
answer_loss_importance (:obj:`float`, `optional`, defaults to 1.0):
Importance weight for the regression loss.
use_normalized_answer_loss (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to normalize the answer loss by the maximum of the predicted and expected value.
huber_loss_delta: (:obj:`float`, `optional`):
Delta parameter used to calculate the regression loss.
temperature: (:obj:`float`, `optional`, defaults to 1.0):
Value used to control (OR change) the skewness of cell logits probabilities.
aggregation_temperature: (:obj:`float`, `optional`, defaults to 1.0):
Scales aggregation logits to control the skewness of probabilities.
use_gumbel_for_cells: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to apply Gumbel-Softmax to cell selection.
use_gumbel_for_aggregation: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to apply Gumbel-Softmax to aggregation selection.
average_approximation_function: (:obj:`string`, `optional`, defaults to :obj:`"ratio"`):
Method to calculate the expected average of cells in the weak supervision case. One of :obj:`"ratio"`,
:obj:`"first_order"` or :obj:`"second_order"`.
cell_selection_preference: (:obj:`float`, `optional`):
Preference for cell selection in ambiguous cases. Only applicable in case of weak supervision for
aggregation (WTQ, WikiSQL). If the total mass of the aggregation probabilities (excluding the "NONE"
operator) is higher than this hyperparameter, then aggregation is predicted for an example.
answer_loss_cutoff: (:obj:`float`, `optional`):
Ignore examples with answer loss larger than cutoff.
max_num_rows: (:obj:`int`, `optional`, defaults to 64):
Maximum number of rows.
max_num_columns: (:obj:`int`, `optional`, defaults to 32):
Maximum number of columns.
average_logits_per_cell: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to average logits per cell.
select_one_column: (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether to constrain the model to only select cells from a single column.
allow_empty_column_selection: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to allow not to select any column.
init_cell_selection_weights_to_zero: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to initialize cell selection weights to 0 so that the initial probabilities are 50%.
reset_position_index_per_cell: (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether to restart position indexes at every cell (i.e. use relative position embeddings).
disable_per_token_loss: (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to disable any (strong or weak) supervision on cells.
Example::
>>> from transformers import TapasModel, TapasConfig
>>> # Initializing a default (SQA) Tapas configuration
>>> configuration = TapasConfig()
>>> # Initializing a model from the configuration
>>> model = TapasModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
"""
model_type = "tapas"
def __init__(
self,
vocab_size=30522,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=1024,
type_vocab_sizes=[3, 256, 256, 2, 256, 256, 10],
initializer_range=0.02,
layer_norm_eps=1e-12,
pad_token_id=0,
gradient_checkpointing=False,
positive_label_weight=10.0,
num_aggregation_labels=0,
aggregation_loss_weight=1.0,
use_answer_as_supervision=None,
answer_loss_importance=1.0,
use_normalized_answer_loss=False,
huber_loss_delta=None,
temperature=1.0,
aggregation_temperature=1.0,
use_gumbel_for_cells=False,
use_gumbel_for_aggregation=False,
average_approximation_function="ratio",
cell_selection_preference=None,
answer_loss_cutoff=None,
max_num_rows=64,
max_num_columns=32,
average_logits_per_cell=False,
select_one_column=True,
allow_empty_column_selection=False,
init_cell_selection_weights_to_zero=False,
reset_position_index_per_cell=True,
disable_per_token_loss=False,
**kwargs
):
super().__init__(pad_token_id=pad_token_id, **kwargs)
# BERT hyperparameters (with updated max_position_embeddings and type_vocab_sizes)
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.hidden_act = hidden_act
self.intermediate_size = intermediate_size
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.max_position_embeddings = max_position_embeddings
self.type_vocab_sizes = type_vocab_sizes
self.initializer_range = initializer_range
self.layer_norm_eps = layer_norm_eps
self.gradient_checkpointing = gradient_checkpointing
# Fine-tuning task hyperparameters
self.positive_label_weight = positive_label_weight
self.num_aggregation_labels = num_aggregation_labels
self.aggregation_loss_weight = aggregation_loss_weight
self.use_answer_as_supervision = use_answer_as_supervision
self.answer_loss_importance = answer_loss_importance
self.use_normalized_answer_loss = use_normalized_answer_loss
self.huber_loss_delta = huber_loss_delta
self.temperature = temperature
self.aggregation_temperature = aggregation_temperature
self.use_gumbel_for_cells = use_gumbel_for_cells
self.use_gumbel_for_aggregation = use_gumbel_for_aggregation
self.average_approximation_function = average_approximation_function
self.cell_selection_preference = cell_selection_preference
self.answer_loss_cutoff = answer_loss_cutoff
self.max_num_rows = max_num_rows
self.max_num_columns = max_num_columns
self.average_logits_per_cell = average_logits_per_cell
self.select_one_column = select_one_column
self.allow_empty_column_selection = allow_empty_column_selection
self.init_cell_selection_weights_to_zero = init_cell_selection_weights_to_zero
self.reset_position_index_per_cell = reset_position_index_per_cell
self.disable_per_token_loss = disable_per_token_loss
# coding=utf-8
# Copyright 2020 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Convert TAPAS checkpoint."""
import argparse
import os
from transformers.models.tapas.modeling_tapas import (
TapasConfig,
TapasForMaskedLM,
TapasForQuestionAnswering,
TapasForSequenceClassification,
TapasModel,
load_tf_weights_in_tapas,
)
from transformers.models.tapas.tokenization_tapas import TapasTokenizer
from transformers.utils import logging
logging.set_verbosity_info()
def convert_tf_checkpoint_to_pytorch(
task, reset_position_index_per_cell, tf_checkpoint_path, tapas_config_file, pytorch_dump_path
):
# Initialise PyTorch model.
# If you want to convert a checkpoint that uses absolute position embeddings, make sure to set reset_position_index_per_cell of
# TapasConfig to False.
# initialize configuration from json file
config = TapasConfig.from_json_file(tapas_config_file)
# set absolute/relative position embeddings parameter
config.reset_position_index_per_cell = reset_position_index_per_cell
# set remaining parameters of TapasConfig as well as the model based on the task
if task == "SQA":
model = TapasForQuestionAnswering(config=config)
elif task == "WTQ":
# run_task_main.py hparams
config.num_aggregation_labels = 4
config.use_answer_as_supervision = True
# hparam_utils.py hparams
config.answer_loss_cutoff = 0.664694
config.cell_selection_preference = 0.207951
config.huber_loss_delta = 0.121194
config.init_cell_selection_weights_to_zero = True
config.select_one_column = True
config.allow_empty_column_selection = False
config.temperature = 0.0352513
model = TapasForQuestionAnswering(config=config)
elif task == "WIKISQL_SUPERVISED":
# run_task_main.py hparams
config.num_aggregation_labels = 4
config.use_answer_as_supervision = False
# hparam_utils.py hparams
config.answer_loss_cutoff = 36.4519
config.cell_selection_preference = 0.903421
config.huber_loss_delta = 222.088
config.init_cell_selection_weights_to_zero = True
config.select_one_column = True
config.allow_empty_column_selection = True
config.temperature = 0.763141
model = TapasForQuestionAnswering(config=config)
elif task == "TABFACT":
model = TapasForSequenceClassification(config=config)
elif task == "MLM":
model = TapasForMaskedLM(config=config)
elif task == "INTERMEDIATE_PRETRAINING":
model = TapasModel(config=config)
print("Building PyTorch model from configuration: {}".format(str(config)))
# Load weights from tf checkpoint
load_tf_weights_in_tapas(model, config, tf_checkpoint_path)
# Save pytorch-model (weights and configuration)
print("Save PyTorch model to {}".format(pytorch_dump_path))
# pytorch_dump_path points at the pytorch_model.bin file, while save_pretrained expects its parent directory
model.save_pretrained(os.path.dirname(pytorch_dump_path))
# Save tokenizer files, reusing the vocab.txt that ships next to the TensorFlow checkpoint
tokenizer = TapasTokenizer(vocab_file=os.path.join(os.path.dirname(tf_checkpoint_path), "vocab.txt"), model_max_length=512)
print("Save tokenizer files to {}".format(pytorch_dump_path))
tokenizer.save_pretrained(os.path.dirname(pytorch_dump_path))
print("Used relative position embeddings:", model.config.reset_position_index_per_cell)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# Required parameters
parser.add_argument(
"--task", default="SQA", type=str, help="Model task for which to convert a checkpoint. Defaults to SQA."
)
parser.add_argument(
"--reset_position_index_per_cell",
default=False,
action="store_true",
help="Whether to use relative position embeddings or not. Defaults to True.",
)
parser.add_argument(
"--tf_checkpoint_path", default=None, type=str, required=True, help="Path to the TensorFlow checkpoint path."
)
parser.add_argument(
"--tapas_config_file",
default=None,
type=str,
required=True,
help="The config json file corresponding to the pre-trained TAPAS model. \n"
"This specifies the model architecture.",
)
parser.add_argument(
"--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
)
args = parser.parse_args()
convert_tf_checkpoint_to_pytorch(
args.task,
args.reset_position_index_per_cell,
args.tf_checkpoint_path,
args.tapas_config_file,
args.pytorch_dump_path,
)
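For reference, a hedged example of driving the conversion function directly from Python; all paths are placeholders and must point at a downloaded TAPAS TensorFlow checkpoint and its config file, and the WTQ-specific hyperparameters are set inside the function as shown above:
```python
# Hypothetical invocation of the conversion function defined above (paths are placeholders)
convert_tf_checkpoint_to_pytorch(
    task="WTQ",
    reset_position_index_per_cell=True,
    tf_checkpoint_path="/path/to/tapas_wtq_checkpoint/model.ckpt",
    tapas_config_file="/path/to/tapas_wtq_checkpoint/bert_config.json",
    pytorch_dump_path="/path/to/output/pytorch_model.bin",
)
```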
...@@ -28,6 +28,8 @@ from .file_utils import (
_datasets_available,
_faiss_available,
_flax_available,
_pandas_available,
_scatter_available,
_sentencepiece_available,
_tf_available,
_tokenizers_available,
...@@ -221,6 +223,27 @@ def require_tokenizers(test_case):
return test_case
def require_pandas(test_case):
"""
Decorator marking a test that requires pandas. These tests are skipped when pandas isn't installed.
"""
if not _pandas_available:
return unittest.skip("test requires pandas")(test_case)
else:
return test_case
def require_scatter(test_case):
"""
Decorator marking a test that requires PyTorch Scatter. These tests are skipped when PyTorch Scatter isn't
installed.
"""
if not _scatter_available:
return unittest.skip("test requires PyTorch Scatter")(test_case)
else:
return test_case
def require_torch_multi_gpu(test_case):
"""
Decorator marking a test that requires a multi-GPU setup (in PyTorch).
...
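A small sketch of how these decorators are meant to be stacked on a test (hypothetical test class, not part of this diff); the test is skipped automatically whenever torch, torch-scatter or pandas is missing:
```python
import unittest

from transformers.testing_utils import require_pandas, require_scatter, require_torch


@require_torch
@require_scatter
@require_pandas
class TapasDependencyTest(unittest.TestCase):
    def test_dependencies_present(self):
        # Only runs when all three optional dependencies are importable
        import pandas  # noqa: F401
        import torch_scatter  # noqa: F401
```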
...@@ -1867,6 +1867,45 @@ def load_tf_weights_in_t5(*args, **kwargs):
requires_pytorch(load_tf_weights_in_t5)
TAPAS_PRETRAINED_MODEL_ARCHIVE_LIST = None
class TapasForMaskedLM:
def __init__(self, *args, **kwargs):
requires_pytorch(self)
@classmethod
def from_pretrained(self, *args, **kwargs):
requires_pytorch(self)
class TapasForQuestionAnswering:
def __init__(self, *args, **kwargs):
requires_pytorch(self)
@classmethod
def from_pretrained(self, *args, **kwargs):
requires_pytorch(self)
class TapasForSequenceClassification:
def __init__(self, *args, **kwargs):
requires_pytorch(self)
@classmethod
def from_pretrained(self, *args, **kwargs):
requires_pytorch(self)
class TapasModel:
def __init__(self, *args, **kwargs):
requires_pytorch(self)
@classmethod
def from_pretrained(self, *args, **kwargs):
requires_pytorch(self)
TRANSFO_XL_PRETRAINED_MODEL_ARCHIVE_LIST = None
...