"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "088c1880b7bfd47777778d0d0fcc20e921bcf21e"
Unverified commit 439a43b6, authored by Ori Ram and committed by GitHub

Add splinter (#12955)



* splinter template

* initialize splinter classes

* Splinter Tokenizer

* splinter.rst

* tokenization fixes

* Documentation & some minor variable name changes

* bug fix (added back question_token_id to config) + variable names

* Minor bug fixes + variable name changes

* Fix Splinter references after merge with new transformers

* changes after running make style & quality

* Fix documentation unindent

* Fix doc indentation in tokenization_splinter

* Fix also SplinterTokenizerFast

* Add Splinter to index.rst and README

* Fix double whitespace in index.rst

* Fixed index.rst with 'make fix-copies'

* Update docs/source/model_doc/splinter.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/splinter.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/splinter.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/splinter.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/splinter/__init__.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Added "copied from BERT" comments

* Removing unnecessary code from modeling_splinter

* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/splinter/configuration_splinter.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Remove references to TF modeling from splinter

* Update src/transformers/models/splinter/modeling_splinter.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove unnecessary check

* Update src/transformers/models/splinter/modeling_splinter.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add differences between Splinter and Bert tokenizers

* Update src/transformers/models/splinter/modeling_splinter.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/splinter/tokenization_splinter_fast.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove unnecessary check

* Doc formatting

* Update src/transformers/models/splinter/tokenization_splinter.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/splinter/tokenization_splinter.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* bug fix: remove load_tf_weights attribute

* Some minor quality changes

* Update docs/source/model_doc/splinter.rst
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/splinter/configuration_splinter.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Change FullyConnectedLayer to SplinterFullyConnectedLayer

* Variable naming

* Remove gather_positions function

* Remove ClassificationHead as it's outdated

* Update src/transformers/models/splinter/modeling_splinter.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove hardcoded 102 token id

* Minor style change

* Added "tau" organization to all model identifiers & URLS

* Added tau to the tests as well

* Copy-from comments

* Removed all unnecessary classes (e.g. SplinterForMaskedLM)

* Running make fix-copies

* Bug fix: Further removed unnecessary classes

* Add Splinter to AutoTokenization

* Add an integration test for Splinter

* Removed initialize_new_qass from config - It will be done through different checkpoints

* Removed `initialize_new_qass` from documentation as well

* Added new checkpoint names (`tau/splinter-base-qass` and same for large) in the code

* Minor change to test

* SplinterTokenizer no longer inherits from BertTokenizer

* SplinterTokenizerFast also no longer inherits from BERT

* style and quality

* bug fix: import torch in tests only if it's available

* Auto mappings

* Changed copyrights in Splinter's files

* Update src/transformers/models/splinter/configuration_splinter.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: yuvalkirstain <kirstain.yuval@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
parent 6626d8a6
...@@ -263,6 +263,7 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/transformers/model_doc/roformer.html)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SpeechToTextTransformer](https://huggingface.co/transformers/model_doc/speech_to_text.html)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[Splinter](https://huggingface.co/transformers/model_doc/master/splinter.html)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/transformers/model_doc/tapas.html)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
......
...@@ -257,41 +257,44 @@ Supported models
53. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
`fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
54. `Splinter <https://huggingface.co/transformers/model_doc/master/splinter.html>`__ (from Tel Aviv University),
released together with the paper `Few-Shot Question Answering by Pretraining Span Selection
<https://arxiv.org/abs/2101.00438>`__ by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
55. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
Krishna, and Kurt W. Keutzer.
56. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
57. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
Francesco Piccinno and Julian Martin Eisenschlos.
58. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
59. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
60. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
61. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
Zhou, Abdelrahman Mohamed, Michael Auli.
62. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
63. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
64. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
65. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
66. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
...@@ -413,6 +416,8 @@ Flax), PyTorch, and/or TensorFlow.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Speech2Text | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
...@@ -571,6 +576,7 @@ Flax), PyTorch, and/or TensorFlow.
model_doc/roberta
model_doc/roformer
model_doc/speech_to_text
model_doc/splinter
model_doc/squeezebert
model_doc/t5
model_doc/tapas
......
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Splinter
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Splinter model was proposed in `Few-Shot Question Answering by Pretraining Span Selection
<https://arxiv.org/abs/2101.00438>`__ by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy. Splinter
is an encoder-only transformer (similar to BERT) pretrained using the recurring span selection task on a large corpus
comprising Wikipedia and the Toronto Book Corpus.
The abstract from the paper is the following:
In several question answering benchmarks, pretrained models have reached human parity through fine-tuning on an order
of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred
training examples are available, and observe that standard models perform poorly, highlighting the discrepancy between
current pretraining objectives and question answering. We propose a new pretraining scheme tailored for question
answering: recurring span selection. Given a passage with multiple sets of recurring spans, we mask in each set all
recurring spans but one, and ask the model to select the correct span in the passage for each masked span. Masked spans
are replaced with a special token, viewed as a question representation, that is later used during fine-tuning to select
the answer span. The resulting model obtains surprisingly good results on multiple benchmarks (e.g., 72.7 F1 on SQuAD
with only 128 training examples), while maintaining competitive performance in the high-resource setting.
Tips:
- Splinter was trained to predict answer spans conditioned on a special [QUESTION] token. These tokens are
contextualized into question representations, which are then used to predict the answers. This layer is called QASS,
and it is the default behaviour in the :class:`~transformers.SplinterForQuestionAnswering` class. Therefore:
- Use :class:`~transformers.SplinterTokenizer` (rather than :class:`~transformers.BertTokenizer`), as it already
contains this special token. Also, its default behavior is to use this token when two sequences are given (for
example, in the `run_qa.py` script).
- If you plan on using Splinter outside `run_qa.py`, please keep in mind the question token - it might be important for
the success of your model, especially in a few-shot setting.
- Please note there are two different checkpoints for each size of Splinter. Both are basically the same, except that
one also has the pretrained weights of the QASS layer (`tau/splinter-base-qass` and `tau/splinter-large-qass`) and one
doesn't (`tau/splinter-base` and `tau/splinter-large`). This is done to support randomly initializing this layer at
fine-tuning, as this is shown to yield better results for some cases in the paper.
This model was contributed by `yuvalkirstain <https://huggingface.co/yuvalkirstain>`__ and `oriram
<https://huggingface.co/oriram>`__. The original code can be found `here <https://github.com/oriram/splinter>`__.
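A minimal usage sketch for extractive question answering, assuming the `tau/splinter-base-qass` checkpoint (the
tokenizer inserts the [QUESTION] token automatically when a question/context pair is given; the greedy argmax span
extraction below is only illustrative)::

    >>> import torch
    >>> from transformers import SplinterForQuestionAnswering, SplinterTokenizer

    >>> tokenizer = SplinterTokenizer.from_pretrained("tau/splinter-base-qass")
    >>> model = SplinterForQuestionAnswering.from_pretrained("tau/splinter-base-qass")

    >>> question = "Where was Brad born?"
    >>> context = "Brad was born in the United Kingdom."
    >>> inputs = tokenizer(question, context, return_tensors="pt")

    >>> with torch.no_grad():
    ...     outputs = model(**inputs)  # start_logits and end_logits score each token position

    >>> start = int(outputs.start_logits.argmax())
    >>> end = int(outputs.end_logits.argmax())
    >>> answer = tokenizer.decode(inputs["input_ids"][0, start : end + 1])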
SplinterConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SplinterConfig
:members:
SplinterTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SplinterTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
SplinterTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SplinterTokenizerFast
:members:
SplinterModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SplinterModel
:members: forward
SplinterForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SplinterForQuestionAnswering
:members: forward
...@@ -233,6 +233,7 @@ _import_structure = {
"SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP",
"Speech2TextConfig",
],
"models.splinter": ["SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP", "SplinterConfig", "SplinterTokenizer"],
"models.squeezebert": ["SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "SqueezeBertConfig", "SqueezeBertTokenizer"], "models.squeezebert": ["SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "SqueezeBertConfig", "SqueezeBertTokenizer"],
"models.t5": ["T5_PRETRAINED_CONFIG_ARCHIVE_MAP", "T5Config"], "models.t5": ["T5_PRETRAINED_CONFIG_ARCHIVE_MAP", "T5Config"],
"models.tapas": ["TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP", "TapasConfig", "TapasTokenizer"], "models.tapas": ["TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP", "TapasConfig", "TapasTokenizer"],
...@@ -369,6 +370,7 @@ if is_tokenizers_available():
_import_structure["models.rembert"].append("RemBertTokenizerFast")
_import_structure["models.retribert"].append("RetriBertTokenizerFast")
_import_structure["models.roberta"].append("RobertaTokenizerFast")
_import_structure["models.splinter"].append("SplinterTokenizerFast")
_import_structure["models.squeezebert"].append("SqueezeBertTokenizerFast") _import_structure["models.squeezebert"].append("SqueezeBertTokenizerFast")
_import_structure["models.t5"].append("T5TokenizerFast") _import_structure["models.t5"].append("T5TokenizerFast")
_import_structure["models.xlm_roberta"].append("XLMRobertaTokenizerFast") _import_structure["models.xlm_roberta"].append("XLMRobertaTokenizerFast")
...@@ -1046,6 +1048,15 @@ if is_torch_available():
"Speech2TextPreTrainedModel",
]
)
_import_structure["models.splinter"].extend(
[
"SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST",
"SplinterForQuestionAnswering",
"SplinterLayer",
"SplinterModel",
"SplinterPreTrainedModel",
]
)
_import_structure["models.squeezebert"].extend( _import_structure["models.squeezebert"].extend(
[ [
"SQUEEZEBERT_PRETRAINED_MODEL_ARCHIVE_LIST", "SQUEEZEBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
...@@ -1914,6 +1925,7 @@ if TYPE_CHECKING:
from .models.roberta import ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP, RobertaConfig, RobertaTokenizer
from .models.roformer import ROFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, RoFormerConfig, RoFormerTokenizer
from .models.speech_to_text import SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP, Speech2TextConfig
from .models.splinter import SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP, SplinterConfig, SplinterTokenizer
from .models.squeezebert import SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, SqueezeBertConfig, SqueezeBertTokenizer
from .models.t5 import T5_PRETRAINED_CONFIG_ARCHIVE_MAP, T5Config
from .models.tapas import TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP, TapasConfig, TapasTokenizer
...@@ -2045,6 +2057,7 @@ if TYPE_CHECKING:
from .models.retribert import RetriBertTokenizerFast
from .models.roberta import RobertaTokenizerFast
from .models.roformer import RoFormerTokenizerFast
from .models.splinter import SplinterTokenizerFast
from .models.squeezebert import SqueezeBertTokenizerFast
from .models.t5 import T5TokenizerFast
from .models.xlm_roberta import XLMRobertaTokenizerFast
...@@ -2602,6 +2615,13 @@ if TYPE_CHECKING:
Speech2TextModel,
Speech2TextPreTrainedModel,
)
from .models.splinter import (
SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST,
SplinterForQuestionAnswering,
SplinterLayer,
SplinterModel,
SplinterPreTrainedModel,
)
from .models.squeezebert import (
SQUEEZEBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
SqueezeBertForMaskedLM,
......
...@@ -108,6 +108,56 @@ class BertConverter(Converter):
return tokenizer
class SplinterConverter(Converter):
def converted(self) -> Tokenizer:
vocab = self.original_tokenizer.vocab
tokenizer = Tokenizer(WordPiece(vocab, unk_token=str(self.original_tokenizer.unk_token)))
tokenize_chinese_chars = False
strip_accents = False
do_lower_case = False
if hasattr(self.original_tokenizer, "basic_tokenizer"):
tokenize_chinese_chars = self.original_tokenizer.basic_tokenizer.tokenize_chinese_chars
strip_accents = self.original_tokenizer.basic_tokenizer.strip_accents
do_lower_case = self.original_tokenizer.basic_tokenizer.do_lower_case
tokenizer.normalizer = normalizers.BertNormalizer(
clean_text=True,
handle_chinese_chars=tokenize_chinese_chars,
strip_accents=strip_accents,
lowercase=do_lower_case,
)
tokenizer.pre_tokenizer = pre_tokenizers.BertPreTokenizer()
cls = str(self.original_tokenizer.cls_token)
sep = str(self.original_tokenizer.sep_token)
question = str(self.original_tokenizer.question_token)
dot = "."
cls_token_id = self.original_tokenizer.cls_token_id
sep_token_id = self.original_tokenizer.sep_token_id
question_token_id = self.original_tokenizer.question_token_id
dot_token_id = self.original_tokenizer.convert_tokens_to_ids(".")
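# The post-processor appends the [QUESTION] token and a trailing "." to the question segment; whether the
# question or the context comes first in the pair depends on the original tokenizer's padding side.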
if self.original_tokenizer.padding_side == "right":
pair = f"{cls}:0 $A:0 {question} {dot} {sep}:0 $B:1 {sep}:1"
else:
pair = f"{cls}:0 $A:0 {sep}:0 $B:1 {question} {dot} {sep}:1"
tokenizer.post_processor = processors.TemplateProcessing(
single=f"{cls}:0 $A:0 {sep}:0",
pair=pair,
special_tokens=[
(cls, cls_token_id),
(sep, sep_token_id),
(question, question_token_id),
(dot, dot_token_id),
],
)
tokenizer.decoder = decoders.WordPiece(prefix="##")
return tokenizer
class FunnelConverter(Converter):
def converted(self) -> Tokenizer:
vocab = self.original_tokenizer.vocab
...@@ -829,6 +879,7 @@ SLOW_TO_FAST_CONVERTERS = {
"T5Tokenizer": T5Converter,
"XLMRobertaTokenizer": XLMRobertaConverter,
"XLNetTokenizer": XLNetConverter,
"SplinterTokenizer": SplinterConverter,
}
......
...@@ -78,6 +78,7 @@ from . import (
roberta,
roformer,
speech_to_text,
splinter,
squeezebert,
t5,
tapas,
......
...@@ -87,6 +87,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
("layoutlm", "LayoutLMConfig"),
("rag", "RagConfig"),
("tapas", "TapasConfig"),
("splinter", "SplinterConfig"),
]
)
...@@ -147,6 +148,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
("tapas", "TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("ibert", "IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("hubert", "HUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("splinter", "SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP"),
]
)
...@@ -222,6 +224,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("bert-japanese", "BertJapanese"),
("byt5", "ByT5"),
("mbart50", "mBART-50"),
("splinter", "Splinter"),
]
)
......
...@@ -88,6 +88,7 @@ MODEL_MAPPING_NAMES = OrderedDict(
("mpnet", "MPNetModel"),
("tapas", "TapasModel"),
("ibert", "IBertModel"),
("splinter", "SplinterModel"),
]
)
...@@ -356,6 +357,7 @@ MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
("deberta", "DebertaForQuestionAnswering"),
("deberta-v2", "DebertaV2ForQuestionAnswering"),
("ibert", "IBertForQuestionAnswering"),
("splinter", "SplinterForQuestionAnswering"),
]
)
......
...@@ -174,6 +174,7 @@ else:
("canine", ("CanineTokenizer", None)),
("bertweet", ("BertweetTokenizer", None)),
("bert-japanese", ("BertJapaneseTokenizer", None)),
("splinter", ("SplinterTokenizer", "SplinterTokenizerFast")),
("byt5", ("ByT5Tokenizer", None)), ("byt5", ("ByT5Tokenizer", None)),
( (
"cpm", "cpm",
......
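With the auto mappings above in place, Splinter is also reachable through the auto classes. A minimal sketch, assuming the tau/splinter-base-qass checkpoint (the comments note which Splinter classes the auto classes resolve to):

from transformers import AutoConfig, AutoModelForQuestionAnswering, AutoTokenizer

config = AutoConfig.from_pretrained("tau/splinter-base-qass")  # resolves to SplinterConfig
tokenizer = AutoTokenizer.from_pretrained("tau/splinter-base-qass")  # SplinterTokenizer or SplinterTokenizerFast
model = AutoModelForQuestionAnswering.from_pretrained("tau/splinter-base-qass")  # SplinterForQuestionAnswering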
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
from ...file_utils import _LazyModule, is_tokenizers_available, is_torch_available
_import_structure = {
"configuration_splinter": ["SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP", "SplinterConfig"],
"tokenization_splinter": ["SplinterTokenizer"],
}
if is_tokenizers_available():
_import_structure["tokenization_splinter_fast"] = ["SplinterTokenizerFast"]
if is_torch_available():
_import_structure["modeling_splinter"] = [
"SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST",
"SplinterForQuestionAnswering",
"SplinterLayer",
"SplinterModel",
"SplinterPreTrainedModel",
]
if TYPE_CHECKING:
from .configuration_splinter import SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP, SplinterConfig
from .tokenization_splinter import SplinterTokenizer
if is_tokenizers_available():
from .tokenization_splinter_fast import SplinterTokenizerFast
if is_torch_available():
from .modeling_splinter import (
SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST,
SplinterForQuestionAnswering,
SplinterLayer,
SplinterModel,
SplinterPreTrainedModel,
)
else:
import sys
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure)
# coding=utf-8
# Copyright 2021 Tel Aviv University, AllenAI and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Splinter model configuration """
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)
SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"tau/splinter-base": "https://huggingface.co/tau/splinter-base/resolve/main/config.json",
"tau/splinter-base-qass": "https://huggingface.co/tau/splinter-base-qass/resolve/main/config.json",
"tau/splinter-large": "https://huggingface.co/tau/splinter-large/resolve/main/config.json",
"tau/splinter-large-qass": "https://huggingface.co/tau/splinter-large-qass/resolve/main/config.json",
# See all Splinter models at https://huggingface.co/models?filter=splinter
}
class SplinterConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.SplinterModel`. It is used to
instantiate a Splinter model according to the specified arguments, defining the model architecture. Instantiating
a configuration with the defaults will yield a similar configuration to that of the Splinter `tau/splinter-base
<https://huggingface.co/tau/splinter-base>`__ architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522):
Vocabulary size of the Splinter model. Defines the number of different tokens that can be represented by
the :obj:`input_ids` passed when calling :class:`~transformers.SplinterModel`.
hidden_size (:obj:`int`, `optional`, defaults to 768):
Dimension of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, `optional`, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (:obj:`int`, `optional`, defaults to 3072):
Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"selu"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (:obj:`int`, `optional`, defaults to 2):
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.SplinterModel`.
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
The epsilon used by the layer normalization layers.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if ``config.is_decoder=True``.
gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
If :obj:`True`, use gradient checkpointing to save memory at the expense of slower backward pass.
question_token_id (:obj:`int`, `optional`, defaults to 104):
The id of the ``[QUESTION]`` token.
Example::
>>> from transformers import SplinterModel, SplinterConfig
>>> # Initializing a Splinter tau/splinter-base style configuration
>>> configuration = SplinterConfig()
>>> # Initializing a model from the tau/splinter-base style configuration
>>> model = SplinterModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
"""
model_type = "splinter"
def __init__(
self,
vocab_size=30522,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=2,
initializer_range=0.02,
layer_norm_eps=1e-12,
use_cache=True,
pad_token_id=0,
question_token_id=104,
**kwargs
):
super().__init__(pad_token_id=pad_token_id, **kwargs)
self.vocab_size = vocab_size
self.max_position_embeddings = max_position_embeddings
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.intermediate_size = intermediate_size
self.hidden_act = hidden_act
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.initializer_range = initializer_range
self.type_vocab_size = type_vocab_size
self.layer_norm_eps = layer_norm_eps
self.use_cache = use_cache
self.question_token_id = question_token_id
This diff is collapsed.
This diff is collapsed.
# coding=utf-8
# Copyright 2021 Tel Aviv University, AllenAI and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fast Tokenization classes for Splinter."""
import json
from typing import List, Optional, Tuple
from tokenizers import normalizers
from ...tokenization_utils_fast import PreTrainedTokenizerFast
from ...utils import logging
from .tokenization_splinter import SplinterTokenizer
logger = logging.get_logger(__name__)
VOCAB_FILES_NAMES = {"vocab_file": "vocab.txt"}
PRETRAINED_VOCAB_FILES_MAP = {
"vocab_file": {
"tau/splinter-base": "https://huggingface.co/tau/splinter-base/resolve/main/vocab.txt",
"tau/splinter-base-qass": "https://huggingface.co/tau/splinter-base-qass/resolve/main/vocab.txt",
"tau/splinter-large": "https://huggingface.co/tau/splinter-large/resolve/main/vocab.txt",
"tau/splinter-large-qass": "https://huggingface.co/tau/splinter-large-qass/resolve/main/vocab.txt",
}
}
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
"tau/splinter-base": 512,
"tau/splinter-base-qass": 512,
"tau/splinter-large": 512,
"tau/splinter-large-qass": 512,
}
PRETRAINED_INIT_CONFIGURATION = {
"tau/splinter-base": {"do_lower_case": False},
"tau/splinter-base-qass": {"do_lower_case": False},
"tau/splinter-large": {"do_lower_case": False},
"tau/splinter-large-qass": {"do_lower_case": False},
}
class SplinterTokenizerFast(PreTrainedTokenizerFast):
r"""
Construct a "fast" Splinter tokenizer (backed by HuggingFace's `tokenizers` library). Based on WordPiece.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizerFast` which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods.
Args:
vocab_file (:obj:`str`):
File containing the vocabulary.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to lowercase the input when tokenizing.
unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`):
The token used for padding, for example when batching sequences of different lengths.
cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
question_token (:obj:`str`, `optional`, defaults to :obj:`"[QUESTION]"`):
The token used for constructing question representations.
clean_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to clean the text before tokenization by removing any control characters and replacing all
whitespaces by the classic one.
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see `this
issue <https://github.com/huggingface/transformers/issues/328>`__).
strip_accents: (:obj:`bool`, `optional`):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT).
wordpieces_prefix: (:obj:`str`, `optional`, defaults to :obj:`"##"`):
The prefix for subwords.
"""
vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
slow_tokenizer_class = SplinterTokenizer
def __init__(
self,
vocab_file=None,
tokenizer_file=None,
do_lower_case=True,
unk_token="[UNK]",
sep_token="[SEP]",
pad_token="[PAD]",
cls_token="[CLS]",
mask_token="[MASK]",
question_token="[QUESTION]",
tokenize_chinese_chars=True,
strip_accents=None,
**kwargs
):
super().__init__(
vocab_file,
tokenizer_file=tokenizer_file,
do_lower_case=do_lower_case,
unk_token=unk_token,
sep_token=sep_token,
pad_token=pad_token,
cls_token=cls_token,
mask_token=mask_token,
tokenize_chinese_chars=tokenize_chinese_chars,
strip_accents=strip_accents,
additional_special_tokens=(question_token,),
**kwargs,
)
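# If the normalizer serialized in the tokenizer file disagrees with the requested lowercasing or
# accent stripping, rebuild it below so that the arguments passed to __init__ take effect.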
pre_tok_state = json.loads(self.backend_tokenizer.normalizer.__getstate__())
if (
pre_tok_state.get("lowercase", do_lower_case) != do_lower_case
or pre_tok_state.get("strip_accents", strip_accents) != strip_accents
):
pre_tok_class = getattr(normalizers, pre_tok_state.pop("type"))
pre_tok_state["lowercase"] = do_lower_case
pre_tok_state["strip_accents"] = strip_accents
self.backend_tokenizer.normalizer = pre_tok_class(**pre_tok_state)
self.do_lower_case = do_lower_case
@property
def question_token_id(self):
"""
:obj:`Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question
representation.
"""
return self.convert_tokens_to_ids(self.question_token)
def build_inputs_with_special_tokens(
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
"""
Build model inputs from a pair of sequence for question answering tasks by concatenating and adding special
tokens. A Splinter sequence has the following format:
- single sequence: ``[CLS] X [SEP]``
- pair of sequences for question answering: ``[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]``
Args:
token_ids_0 (:obj:`List[int]`):
The question token IDs if pad_on_right, else the context token IDs
token_ids_1 (:obj:`List[int]`, `optional`):
The context token IDs if pad_on_right, else question token IDs
Returns:
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
"""
if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
cls = [self.cls_token_id]
sep = [self.sep_token_id]
question_suffix = [self.question_token_id] + [self.convert_tokens_to_ids(".")]
if self.padding_side == "right":
# Input is question-then-context
return cls + token_ids_0 + question_suffix + sep + token_ids_1 + sep
else:
# Input is context-then-question
return cls + token_ids_0 + sep + token_ids_1 + question_suffix + sep
def create_token_type_ids_from_sequences(
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
"""
Create the token type IDs corresponding to the sequences passed. `What are token type IDs?
<../glossary.html#token-type-ids>`__
Should be overridden in a subclass if the model has a special way of building those.
Args:
token_ids_0 (:obj:`List[int]`): The first tokenized sequence.
token_ids_1 (:obj:`List[int]`, `optional`): The second tokenized sequence.
Returns:
:obj:`List[int]`: The token type ids.
"""
sep = [self.sep_token_id]
cls = [self.cls_token_id]
question_suffix = [self.question_token_id] + [self.convert_tokens_to_ids(".")]
if token_ids_1 is None:
return len(cls + token_ids_0 + sep) * [0]
if self.padding_side == "right":
# Input is question-then-context
return len(cls + token_ids_0 + question_suffix + sep) * [0] + len(token_ids_1 + sep) * [1]
else:
# Input is context-then-question
return len(cls + token_ids_0 + sep) * [0] + len(token_ids_1 + question_suffix + sep) * [1]
def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
files = self._tokenizer.model.save(save_directory, name=filename_prefix)
return tuple(files)
...@@ -3121,6 +3121,41 @@ class Speech2TextPreTrainedModel:
requires_backends(cls, ["torch"])
SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST = None
class SplinterForQuestionAnswering:
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
@classmethod
def from_pretrained(cls, *args, **kwargs):
requires_backends(cls, ["torch"])
class SplinterLayer:
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class SplinterModel:
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
@classmethod
def from_pretrained(cls, *args, **kwargs):
requires_backends(cls, ["torch"])
class SplinterPreTrainedModel:
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
@classmethod
def from_pretrained(cls, *args, **kwargs):
requires_backends(cls, ["torch"])
SQUEEZEBERT_PRETRAINED_MODEL_ARCHIVE_LIST = None
......
...@@ -299,6 +299,15 @@ class RoFormerTokenizerFast:
requires_backends(cls, ["tokenizers"])
class SplinterTokenizerFast:
def __init__(self, *args, **kwargs):
requires_backends(self, ["tokenizers"])
@classmethod
def from_pretrained(cls, *args, **kwargs):
requires_backends(cls, ["tokenizers"])
class SqueezeBertTokenizerFast:
def __init__(self, *args, **kwargs):
requires_backends(self, ["tokenizers"])
......
# coding=utf-8
# Copyright 2021 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Testing suite for the PyTorch Splinter model. """
import unittest
from transformers import is_torch_available
from transformers.testing_utils import require_torch, slow, torch_device
from .test_configuration_common import ConfigTester
from .test_modeling_common import ModelTesterMixin, ids_tensor, random_attention_mask
if is_torch_available():
import torch
from transformers import SplinterConfig, SplinterForQuestionAnswering, SplinterModel
from transformers.models.splinter.modeling_splinter import SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST
class SplinterModelTester:
def __init__(
self,
parent,
batch_size=13,
seq_length=7,
is_training=True,
use_input_mask=True,
use_token_type_ids=True,
use_labels=True,
vocab_size=99,
hidden_size=32,
num_hidden_layers=5,
num_attention_heads=4,
intermediate_size=37,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
max_position_embeddings=512,
type_vocab_size=16,
type_sequence_label_size=2,
initializer_range=0.02,
num_labels=3,
num_choices=4,
scope=None,
):
self.parent = parent
self.batch_size = batch_size
self.seq_length = seq_length
self.is_training = is_training
self.use_input_mask = use_input_mask
self.use_token_type_ids = use_token_type_ids
self.use_labels = use_labels
self.vocab_size = vocab_size
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.intermediate_size = intermediate_size
self.hidden_act = hidden_act
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.max_position_embeddings = max_position_embeddings
self.type_vocab_size = type_vocab_size
self.type_sequence_label_size = type_sequence_label_size
self.initializer_range = initializer_range
self.num_labels = num_labels
self.num_choices = num_choices
self.scope = scope
def prepare_config_and_inputs(self):
input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)
input_mask = None
if self.use_input_mask:
input_mask = random_attention_mask([self.batch_size, self.seq_length])
token_type_ids = None
if self.use_token_type_ids:
token_type_ids = ids_tensor([self.batch_size, self.seq_length], self.type_vocab_size)
sequence_labels = None
token_labels = None
choice_labels = None
if self.use_labels:
sequence_labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
token_labels = ids_tensor([self.batch_size, self.seq_length], self.num_labels)
choice_labels = ids_tensor([self.batch_size], self.num_choices)
config = SplinterConfig(
vocab_size=self.vocab_size,
hidden_size=self.hidden_size,
num_hidden_layers=self.num_hidden_layers,
num_attention_heads=self.num_attention_heads,
intermediate_size=self.intermediate_size,
hidden_act=self.hidden_act,
hidden_dropout_prob=self.hidden_dropout_prob,
attention_probs_dropout_prob=self.attention_probs_dropout_prob,
max_position_embeddings=self.max_position_embeddings,
type_vocab_size=self.type_vocab_size,
is_decoder=False,
initializer_range=self.initializer_range,
)
return config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
def create_and_check_model(
self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
):
model = SplinterModel(config=config)
model.to(torch_device)
model.eval()
result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids)
result = model(input_ids, token_type_ids=token_type_ids)
result = model(input_ids)
self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.hidden_size))
def create_and_check_for_question_answering(
self, config, input_ids, token_type_ids, input_mask, sequence_labels, token_labels, choice_labels
):
model = SplinterForQuestionAnswering(config=config)
model.to(torch_device)
model.eval()
result = model(
input_ids,
attention_mask=input_mask,
token_type_ids=token_type_ids,
start_positions=sequence_labels,
end_positions=sequence_labels,
)
self.parent.assertEqual(result.start_logits.shape, (self.batch_size, self.seq_length))
self.parent.assertEqual(result.end_logits.shape, (self.batch_size, self.seq_length))
def prepare_config_and_inputs_for_common(self):
config_and_inputs = self.prepare_config_and_inputs()
(
config,
input_ids,
token_type_ids,
input_mask,
sequence_labels,
token_labels,
choice_labels,
) = config_and_inputs
inputs_dict = {"input_ids": input_ids, "token_type_ids": token_type_ids, "attention_mask": input_mask}
return config, inputs_dict
@require_torch
class SplinterModelTest(ModelTesterMixin, unittest.TestCase):
all_model_classes = (
(
SplinterModel,
SplinterForQuestionAnswering,
)
if is_torch_available()
else ()
)
def setUp(self):
self.model_tester = SplinterModelTester(self)
self.config_tester = ConfigTester(self, config_class=SplinterConfig, hidden_size=37)
def test_config(self):
self.config_tester.run_common_tests()
def test_model(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_model(*config_and_inputs)
def test_model_various_embeddings(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
for type in ["absolute", "relative_key", "relative_key_query"]:
config_and_inputs[0].position_embedding_type = type
self.model_tester.create_and_check_model(*config_and_inputs)
def test_for_question_answering(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs()
self.model_tester.create_and_check_for_question_answering(*config_and_inputs)
@slow
def test_model_from_pretrained(self):
for model_name in SPLINTER_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
model = SplinterModel.from_pretrained(model_name)
self.assertIsNotNone(model)
@require_torch
class SplinterModelIntegrationTest(unittest.TestCase):
@slow
def test_splinter_question_answering(self):
model = SplinterForQuestionAnswering.from_pretrained("tau/splinter-base-qass")
# Input: "[CLS] Brad was born in [QUESTION] . He returned to the United Kingdom later . [SEP]"
# Output should be the span "the United Kingdom"
input_ids = torch.tensor(
[[101, 7796, 1108, 1255, 1107, 104, 119, 1124, 1608, 1106, 1103, 1244, 2325, 1224, 119, 102]]
)
output = model(input_ids)
expected_shape = torch.Size((1, 16))
self.assertEqual(output.start_logits.shape, expected_shape)
self.assertEqual(output.end_logits.shape, expected_shape)
self.assertEqual(torch.argmax(output.start_logits), 10)
self.assertEqual(torch.argmax(output.end_logits), 12)