Add CANINE (#12024)

* First pass
* More progress
* Add support for local attention
* More improvements
* More improvements
* Conversion script working
* Add CanineTokenizer
* Make style & quality
* First draft of integration test
* Remove decoder test
* Improve tests
* Add documentation
* Mostly docs improvements
* Add CanineTokenizer tests
* Fix most tests on GPU, improve upsampling projection
* Address most comments by @dhgarrette
* Remove decoder logic
* Improve Canine tests, improve docs of CanineConfig
* All tokenizer tests passing
* Make fix-copies and fix tokenizer tests
* Fix test_model_outputs_equivalence test
* Apply suggestions from @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address some more comments
* Add support for hidden_states and attentions of shallow encoders
* Define custom CanineModelOutputWithPooling, tests pass
* First pass
* More progress
* Add support for local attention
* More improvements
* More improvements
* Conversion script working
* Add CanineTokenizer
* Make style & quality
* First draft of integration test
* Remove decoder test
* Improve tests
* Add documentation
* Mostly docs improvements
* Add CanineTokenizer tests
* Fix most tests on GPU, improve upsampling projection
* Address most comments by @dhgarrette
* Remove decoder logic
* Improve Canine tests, improve docs of CanineConfig
* All tokenizer tests passing
* Make fix-copies and fix tokenizer tests
* Fix test_model_outputs_equivalence test
* Apply suggestions from @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address some more comments
* Make conversion script work for Canine-c too
* Fix tokenizer tests
* Remove file

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Unverified Commit 6e685978 authored Jun 30, 2021 by NielsRogge Committed by GitHub Jun 30, 2021
17 changed files
--- a/README.md
+++ b/README.md
@@ -212,7 +212,8 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
 1. **[BORT](https://huggingface.co/transformers/model_doc/bort.html)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
 1. **[ByT5](https://huggingface.co/transformers/model_doc/byt5.html)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
 1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-1. **[CLIP](https://huggingface.co/transformers/model_doc/clip.html)** from (OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
+1. **[CANINE](https://huggingface.co/transformers/model_doc/canine.html)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
+1. **[CLIP](https://huggingface.co/transformers/model_doc/clip.html)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
 1. **[ConvBERT](https://huggingface.co/transformers/model_doc/convbert.html)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
 1. **[CPM](https://huggingface.co/transformers/model_doc/cpm.html)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
 1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.

--- a/docs/source/index.rst
+++ b/docs/source/index.rst
--- a/docs/source/model_doc/canine.rst
+++ b/docs/source/model_doc/canine.rst
+.. 
+    Copyright 2021 The HuggingFace Team. All rights reserved.
+
+    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+    the License. You may obtain a copy of the License at
+
+        http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations under the License.
+
+CANINE
+-----------------------------------------------------------------------------------------------------------------------
+
+Overview
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The CANINE model was proposed in `CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language
+Representation <https://arxiv.org/abs/2103.06874>`__ by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. It's
+among the first papers that train a Transformer without using an explicit tokenization step (such as Byte Pair
+Encoding (BPE), WordPiece or SentencePiece). Instead, the model is trained directly at the Unicode character level.
+Training at the character level inevitably comes with a longer sequence length, which CANINE addresses with an
+efficient downsampling strategy, before applying a deep Transformer encoder.
+
+The abstract from the paper is the following:
+
+*Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models
+still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword
+lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all
+languages, and the use of any fixed vocabulary may limit a model's ability to adapt. In this paper, we present CANINE,
+a neural encoder that operates directly on character sequences, without explicit tokenization or vocabulary, and a
+pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias.
+To use its finer-grained input effectively and efficiently, CANINE combines downsampling, which reduces the input
+sequence length, with a deep transformer stack, which encodes context. CANINE outperforms a comparable mBERT model by
+2.8 F1 on TyDi QA, a challenging multilingual benchmark, despite having 28% fewer model parameters.*
+
+Tips:
+
+- CANINE uses no less than 3 Transformer encoders internally: 2 "shallow" encoders (which only consist of a single
+  layer) and 1 "deep" encoder (which is a regular BERT encoder). First, a "shallow" encoder is used to contextualize
+  the character embeddings, using local attention. Next, after downsampling, a "deep" encoder is applied. Finally,
+  after upsampling, a "shallow" encoder is used to create the final character embeddings. Details regarding up- and
+  downsampling can be found in the paper.
+- CANINE uses a max sequence length of 2048 characters by default. One can use :class:`~transformers.CanineTokenizer`
+  to prepare text for the model.
+- Classification can be done by placing a linear layer on top of the final hidden state of the special [CLS] token
+  (which has a predefined Unicode code point). For token classification tasks however, the downsampled sequence of
+  tokens needs to be upsampled again to match the length of the original character sequence (which is 2048). The
+  details for this can be found in the paper.
+
+This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
+<https://github.com/google-research/language/tree/master/language/canine>`__.
+
+
+Example
+_______________________________________________________________________________________________________________________
+
+CANINE works on raw characters, so it can be used without a tokenizer:
+
+.. code-block::
+
+    from transformers import CanineModel
+    import torch
+
+    model = CanineModel.from_pretrained('google/canine-s') # model pre-trained with subword loss
+
+    text = "hello world"
+    # use Python's built-in ord() function to turn each character into its unicode code point id
+    input_ids = torch.tensor([[ord(char) for char in text]])
+
+    outputs = model(input_ids) # forward pass
+    pooled_output = outputs.pooler_output
+    sequence_output = outputs.last_hidden_state
+
+
+For batched inference and training, it is however recommended to make use of the tokenizer (to pad/truncate all
+sequences to the same length):
+
+.. code-block::
+
+    from transformers import CanineTokenizer, CanineModel
+
+    model = CanineModel.from_pretrained('google/canine-s')
+    tokenizer = CanineTokenizer.from_pretrained('google/canine-s')
+
+    inputs = ["Life is like a box of chocolates.", "You never know what you're gonna get."]
+    encoding = tokenizer(inputs, padding="longest", truncation=True, return_tensors="pt")
+
+    outputs = model(**encoding) # forward pass
+    pooled_output = outputs.pooler_output
+    sequence_output = outputs.last_hidden_state
+
+
+CANINE specific outputs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.models.canine.modeling_canine.CanineModelOutputWithPooling
+    :members:
+
+
+CanineConfig
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.CanineConfig
+    :members:
+
+
+CanineTokenizer
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.CanineTokenizer
+    :members: build_inputs_with_special_tokens, get_special_tokens_mask,
+        create_token_type_ids_from_sequences
+
+
+CanineModel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.CanineModel
+    :members: forward
+
+
+CanineForSequenceClassification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.CanineForSequenceClassification
+    :members: forward
+
+
+CanineForMultipleChoice
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.CanineForMultipleChoice
+    :members: forward
+
+
+CanineForTokenClassification
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.CanineForTokenClassification
+    :members: forward
+
+
+CanineForQuestionAnswering
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.CanineForQuestionAnswering
+    :members: forward
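Note on the docs above: they describe feeding the model Unicode code points and padding batches to a common length. A minimal pure-Python sketch of that preparation step follows. It assumes the [CLS] and [SEP] code points match the `bos_token_id` (0xE000) and `eos_token_id` (0xE001) defaults of `CanineConfig` in this commit and that padding uses id 0; `encode_batch` is a hypothetical helper, not part of `transformers`.

```python
from typing import List

# Assumed special code points, taken from the CanineConfig defaults in this commit
# (bos_token_id / eos_token_id / pad_token_id); the real tokenizer may differ.
CLS = 0xE000
SEP = 0xE001
PAD = 0


def encode_batch(texts: List[str], max_length: int = 2048) -> List[List[int]]:
    """Hypothetical helper: map texts to code points, add CLS/SEP, pad to the longest."""
    encoded = []
    for text in texts:
        # Leave room for the two special code points, then truncate.
        ids = [ord(ch) for ch in text][: max_length - 2]
        encoded.append([CLS] + ids + [SEP])
    longest = max(len(ids) for ids in encoded)
    # Right-pad every sequence so the batch is rectangular.
    return [ids + [PAD] * (longest - len(ids)) for ids in encoded]


batch = encode_batch(["hi", "hello"])
print(batch[0])
print(batch[1])
```

In practice `CanineTokenizer` should be preferred; the sketch only illustrates why no vocabulary file is needed.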
--- a/src/transformers/__init__.py
+++ b/src/transformers/__init__.py
@@ -170,6 +170,7 @@ _import_structure = {
    ],
    "models.byt5": ["ByT5Tokenizer"],
    "models.camembert": ["CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "CamembertConfig"],
+    "models.canine": ["CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP", "CanineConfig", "CanineTokenizer"],
    "models.clip": [
        "CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP",
        "CLIPConfig",
@@ -505,7 +506,6 @@ if is_torch_available():
            "load_tf_weights_in_albert",
        ]
    )
-
    _import_structure["models.auto"].extend(
        [
            "MODEL_FOR_CAUSAL_LM_MAPPING",
@@ -632,6 +632,19 @@ if is_torch_available():
            "CamembertModel",
        ]
    )
+    _import_structure["models.canine"].extend(
+        [
+            "CANINE_PRETRAINED_MODEL_ARCHIVE_LIST",
+            "CanineForMultipleChoice",
+            "CanineForQuestionAnswering",
+            "CanineForSequenceClassification",
+            "CanineForTokenClassification",
+            "CanineLayer",
+            "CanineModel",
+            "CaninePreTrainedModel",
+            "load_tf_weights_in_canine",
+        ]
+    )
    _import_structure["models.clip"].extend(
        [
            "CLIP_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -1756,6 +1769,7 @@ if TYPE_CHECKING:
    )
    from .models.byt5 import ByT5Tokenizer
    from .models.camembert import CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, CamembertConfig
+    from .models.canine import CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP, CanineConfig, CanineTokenizer
    from .models.clip import (
        CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP,
        CLIPConfig,
@@ -2156,6 +2170,17 @@ if TYPE_CHECKING:
            CamembertForTokenClassification,
            CamembertModel,
        )
+        from .models.canine import (
+            CANINE_PRETRAINED_MODEL_ARCHIVE_LIST,
+            CanineForMultipleChoice,
+            CanineForQuestionAnswering,
+            CanineForSequenceClassification,
+            CanineForTokenClassification,
+            CanineLayer,
+            CanineModel,
+            CaninePreTrainedModel,
+            load_tf_weights_in_canine,
+        )
        from .models.clip import (
            CLIP_PRETRAINED_MODEL_ARCHIVE_LIST,
            CLIPModel,

--- a/src/transformers/models/__init__.py
+++ b/src/transformers/models/__init__.py
@@ -30,6 +30,7 @@ from . import (
    blenderbot,
    blenderbot_small,
    camembert,
+    canine,
    clip,
    convbert,
    cpm,

--- a/src/transformers/models/auto/configuration_auto.py
+++ b/src/transformers/models/auto/configuration_auto.py
@@ -33,6 +33,7 @@ from ..blenderbot_small.configuration_blenderbot_small import (
    BlenderbotSmallConfig,
 )
 from ..camembert.configuration_camembert import CAMEMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, CamembertConfig
+from ..canine.configuration_canine import CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP, CanineConfig
 from ..clip.configuration_clip import CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP, CLIPConfig
 from ..convbert.configuration_convbert import CONVBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, ConvBertConfig
 from ..ctrl.configuration_ctrl import CTRL_PRETRAINED_CONFIG_ARCHIVE_MAP, CTRLConfig
@@ -96,6 +97,7 @@ ALL_PRETRAINED_CONFIG_ARCHIVE_MAP = dict(
    for pretrained_map in [
        # Add archive maps here
        VISUAL_BERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
+        CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP,
        ROFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP,
        CLIP_PRETRAINED_CONFIG_ARCHIVE_MAP,
        BIGBIRD_PEGASUS_PRETRAINED_CONFIG_ARCHIVE_MAP,
@@ -155,6 +157,7 @@ CONFIG_MAPPING = OrderedDict(
    [
        # Add configs here
        ("visual_bert", VisualBertConfig),
+        ("canine", CanineConfig),
        ("roformer", RoFormerConfig),
        ("clip", CLIPConfig),
        ("bigbird_pegasus", BigBirdPegasusConfig),
@@ -220,6 +223,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
    [
        # Add full (and cased) model names here
        ("visual_bert", "VisualBert"),
+        ("canine", "Canine"),
        ("roformer", "RoFormer"),
        ("clip", "CLIP"),
        ("bigbird_pegasus", "BigBirdPegasus"),

--- a/src/transformers/models/auto/modeling_auto.py
+++ b/src/transformers/models/auto/modeling_auto.py
@@ -81,6 +81,13 @@ from ..camembert.modeling_camembert import (
    CamembertForTokenClassification,
    CamembertModel,
 )
+from ..canine.modeling_canine import (
+    CanineForMultipleChoice,
+    CanineForQuestionAnswering,
+    CanineForSequenceClassification,
+    CanineForTokenClassification,
+    CanineModel,
+)
 from ..clip.modeling_clip import CLIPModel
 from ..convbert.modeling_convbert import (
    ConvBertForMaskedLM,
@@ -312,6 +319,7 @@ from .configuration_auto import (
    BlenderbotConfig,
    BlenderbotSmallConfig,
    CamembertConfig,
+    CanineConfig,
    CLIPConfig,
    ConvBertConfig,
    CTRLConfig,
@@ -371,6 +379,7 @@ MODEL_MAPPING = OrderedDict(
    [
        # Base model mapping
        (VisualBertConfig, VisualBertModel),
+        (CanineConfig, CanineModel),
        (RoFormerConfig, RoFormerModel),
        (CLIPConfig, CLIPModel),
        (BigBirdPegasusConfig, BigBirdPegasusModel),
@@ -624,6 +633,7 @@ MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING = OrderedDict(
 MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
    [
        # Model for Sequence Classification mapping
+        (CanineConfig, CanineForSequenceClassification),
        (RoFormerConfig, RoFormerForSequenceClassification),
        (BigBirdPegasusConfig, BigBirdPegasusForSequenceClassification),
        (BigBirdConfig, BigBirdForSequenceClassification),
@@ -664,6 +674,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
 MODEL_FOR_QUESTION_ANSWERING_MAPPING = OrderedDict(
    [
        # Model for Question Answering mapping
+        (CanineConfig, CanineForQuestionAnswering),
        (RoFormerConfig, RoFormerForQuestionAnswering),
        (BigBirdPegasusConfig, BigBirdPegasusForQuestionAnswering),
        (BigBirdConfig, BigBirdForQuestionAnswering),
@@ -705,6 +716,7 @@ MODEL_FOR_TABLE_QUESTION_ANSWERING_MAPPING = OrderedDict(
 MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING = OrderedDict(
    [
        # Model for Token Classification mapping
+        (CanineConfig, CanineForTokenClassification),
        (RoFormerConfig, RoFormerForTokenClassification),
        (BigBirdConfig, BigBirdForTokenClassification),
        (ConvBertConfig, ConvBertForTokenClassification),
@@ -735,6 +747,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING = OrderedDict(
 MODEL_FOR_MULTIPLE_CHOICE_MAPPING = OrderedDict(
    [
        # Model for Multiple Choice mapping
+        (CanineConfig, CanineForMultipleChoice),
        (RoFormerConfig, RoFormerForMultipleChoice),
        (BigBirdConfig, BigBirdForMultipleChoice),
        (ConvBertConfig, ConvBertForMultipleChoice),
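Note on the auto-class registration above: the same config class is added to one mapping per task. A toy sketch of that registry pattern follows, using hypothetical stand-in classes (`ToyConfig`, `ToyModel`, `ToyForSequenceClassification`) and a hypothetical `build` helper; it is not the actual `AutoModel` lookup code.

```python
from collections import OrderedDict


class ToyConfig:  # hypothetical stand-in for a config class such as CanineConfig
    model_type = "toy"


class ToyModel:  # hypothetical stand-in for a base model
    def __init__(self, config):
        self.config = config


class ToyForSequenceClassification(ToyModel):  # hypothetical task-specific head
    pass


# model_type string -> config class
TOY_CONFIG_MAPPING = OrderedDict([("toy", ToyConfig)])

# One mapping per task: config class -> model class
TOY_TASK_MAPPINGS = {
    "base": OrderedDict([(ToyConfig, ToyModel)]),
    "sequence-classification": OrderedDict([(ToyConfig, ToyForSequenceClassification)]),
}


def build(model_type: str, task: str = "base"):
    """Resolve the config class from the model type, then the model class for the task."""
    config = TOY_CONFIG_MAPPING[model_type]()
    model_cls = TOY_TASK_MAPPINGS[task][type(config)]
    return model_cls(config)


print(type(build("toy")).__name__)
print(type(build("toy", "sequence-classification")).__name__)
```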

--- a/src/transformers/models/auto/tokenization_auto.py
+++ b/src/transformers/models/auto/tokenization_auto.py
@@ -37,6 +37,7 @@ from ..bertweet.tokenization_bertweet import BertweetTokenizer
 from ..blenderbot.tokenization_blenderbot import BlenderbotTokenizer
 from ..blenderbot_small.tokenization_blenderbot_small import BlenderbotSmallTokenizer
 from ..byt5.tokenization_byt5 import ByT5Tokenizer
+from ..canine.tokenization_canine import CanineTokenizer
 from ..convbert.tokenization_convbert import ConvBertTokenizer
 from ..ctrl.tokenization_ctrl import CTRLTokenizer
 from ..deberta.tokenization_deberta import DebertaTokenizer
@@ -78,6 +79,7 @@ from .configuration_auto import (
    BlenderbotConfig,
    BlenderbotSmallConfig,
    CamembertConfig,
+    CanineConfig,
    ConvBertConfig,
    CTRLConfig,
    DebertaConfig,
@@ -294,6 +296,7 @@ TOKENIZER_MAPPING = OrderedDict(
        (GPTNeoConfig, (GPT2Tokenizer, GPT2TokenizerFast)),
        (LukeConfig, (LukeTokenizer, None)),
        (BigBirdPegasusConfig, (PegasusTokenizer, PegasusTokenizerFast)),
+        (CanineConfig, (CanineTokenizer, None)),
    ]
 )
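Note on the `(CanineTokenizer, None)` entry above: each value pairs a slow (pure Python) tokenizer class with an optional fast one, and `None` marks the missing fast variant. A rough sketch of how such a pair could be resolved follows; `ToyConfig`, `ToySlowTokenizer`, and `pick_tokenizer` are hypothetical, and the real `AutoTokenizer` logic may differ.

```python
from collections import OrderedDict


class ToyConfig:  # hypothetical stand-in for a config class such as CanineConfig
    pass


class ToySlowTokenizer:  # hypothetical stand-in for a slow tokenizer
    pass


# Config class -> (slow tokenizer class, fast tokenizer class or None)
TOY_TOKENIZER_MAPPING = OrderedDict([(ToyConfig, (ToySlowTokenizer, None))])


def pick_tokenizer(config, use_fast: bool = True):
    """Prefer the fast class when requested and available, else fall back to the slow one."""
    slow_cls, fast_cls = TOY_TOKENIZER_MAPPING[type(config)]
    if use_fast and fast_cls is not None:
        return fast_cls
    if slow_cls is None:
        raise ValueError("No tokenizer class registered for this config.")
    return slow_cls


print(pick_tokenizer(ToyConfig()).__name__)
```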


--- a/src/transformers/models/canine/__init__.py
+++ b/src/transformers/models/canine/__init__.py
+# flake8: noqa
+# There's no way to ignore "F401 '...' imported but unused" warnings in this
+# module, but to preserve other warnings. So, don't check this module at all.
+
+# Copyright 2020 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import TYPE_CHECKING
+
+from ...file_utils import _BaseLazyModule, is_tokenizers_available, is_torch_available
+
+
+_import_structure = {
+    "configuration_canine": ["CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP", "CanineConfig"],
+    "tokenization_canine": ["CanineTokenizer"],
+}
+
+if is_torch_available():
+    _import_structure["modeling_canine"] = [
+        "CANINE_PRETRAINED_MODEL_ARCHIVE_LIST",
+        "CanineForMultipleChoice",
+        "CanineForQuestionAnswering",
+        "CanineForSequenceClassification",
+        "CanineForTokenClassification",
+        "CanineLayer",
+        "CanineModel",
+        "CaninePreTrainedModel",
+        "load_tf_weights_in_canine",
+    ]
+
+
+if TYPE_CHECKING:
+    from .configuration_canine import CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP, CanineConfig
+    from .tokenization_canine import CanineTokenizer
+
+    if is_torch_available():
+        from .modeling_canine import (
+            CANINE_PRETRAINED_MODEL_ARCHIVE_LIST,
+            CanineForMultipleChoice,
+            CanineForQuestionAnswering,
+            CanineForSequenceClassification,
+            CanineForTokenClassification,
+            CanineLayer,
+            CanineModel,
+            CaninePreTrainedModel,
+            load_tf_weights_in_canine,
+        )
+
+
+else:
+    import importlib
+    import os
+    import sys
+
+    class _LazyModule(_BaseLazyModule):
+        """
+        Module class that surfaces all objects but only performs associated imports when the objects are requested.
+        """
+
+        __file__ = globals()["__file__"]
+        __path__ = [os.path.dirname(__file__)]
+
+        def _get_module(self, module_name: str):
+            return importlib.import_module("." + module_name, self.__name__)
+
+    sys.modules[__name__] = _LazyModule(__name__, _import_structure)
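Note on the lazy-module pattern above: `_import_structure` maps submodule names to the objects they export, and the import happens only when an object is first requested. A simplified, self-contained sketch of the idea follows. `ToyLazyModule` is hypothetical and imports standard-library modules by absolute name, whereas `_BaseLazyModule` resolves submodules relative to the package.

```python
import importlib
import types
from typing import Dict, List


class ToyLazyModule(types.ModuleType):
    """Simplified sketch: import the owning module the first time an object is requested."""

    def __init__(self, name: str, import_structure: Dict[str, List[str]]):
        super().__init__(name)
        # Invert {module: [objects]} into {object: module} for direct lookup.
        self._object_to_module = {
            obj: module for module, objects in import_structure.items() for obj in objects
        }

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails, i.e. before the value is cached.
        if name not in self._object_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        module = importlib.import_module(self._object_to_module[name])
        value = getattr(module, name)
        setattr(self, name, value)  # cache so later lookups skip __getattr__
        return value


lazy = ToyLazyModule("toy_lazy", {"json": ["dumps"], "math": ["sqrt"]})
was_cached_before = "dumps" in vars(lazy)
result = lazy.dumps([1, 2])
is_cached_after = "dumps" in vars(lazy)
print(was_cached_before, result, is_cached_after)
```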
--- a/src/transformers/models/canine/configuration_canine.py
+++ b/src/transformers/models/canine/configuration_canine.py
+# coding=utf-8
+# Copyright Google AI and The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" CANINE model configuration """
+
+from ...configuration_utils import PretrainedConfig
+from ...utils import logging
+
+
+logger = logging.get_logger(__name__)
+
+CANINE_PRETRAINED_CONFIG_ARCHIVE_MAP = {
+    "google/canine-s": "https://huggingface.co/google/canine-s/resolve/main/config.json",
+    # See all CANINE models at https://huggingface.co/models?filter=canine
+}
+
+
+class CanineConfig(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a :class:`~transformers.CanineModel`. It is used to
+    instantiate a CANINE model according to the specified arguments, defining the model architecture. Instantiating a
+    configuration with the defaults will yield a similar configuration to that of the CANINE `google/canine-s
+    <https://huggingface.co/google/canine-s>`__ architecture.
+
+    Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
+    outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
+
+
+    Args:
+        hidden_size (:obj:`int`, `optional`, defaults to 768):
+            Dimension of the encoder layers and the pooler layer.
+        num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
+            Number of hidden layers in the deep Transformer encoder.
+        num_attention_heads (:obj:`int`, `optional`, defaults to 12):
+            Number of attention heads for each attention layer in the Transformer encoders.
+        intermediate_size (:obj:`int`, `optional`, defaults to 3072):
+            Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoders.
+        hidden_act (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"gelu"`):
+            The non-linear activation function (function or string) in the encoder and pooler. If string,
+            :obj:`"gelu"`, :obj:`"relu"`, :obj:`"selu"` and :obj:`"gelu_new"` are supported.
+        hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoders, and pooler.
+        attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
+            The dropout ratio for the attention probabilities.
+        max_position_embeddings (:obj:`int`, `optional`, defaults to 16384):
+            The maximum sequence length that this model might ever be used with.
+        type_vocab_size (:obj:`int`, `optional`, defaults to 16):
+            The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.CanineModel`.
+        initializer_range (:obj:`float`, `optional`, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
+            The epsilon used by the layer normalization layers.
+        gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
+            If :obj:`True`, use gradient checkpointing to save memory at the expense of slower backward pass.
+        downsampling_rate (:obj:`int`, `optional`, defaults to 4):
+            The rate at which to downsample the original character sequence length before applying the deep Transformer
+            encoder.
+        upsampling_kernel_size (:obj:`int`, `optional`, defaults to 4):
+            The kernel size (i.e. the number of characters in each window) of the convolutional projection layer when
+            projecting back from :obj:`hidden_size`*2 to :obj:`hidden_size`.
+        num_hash_functions (:obj:`int`, `optional`, defaults to 8):
+            The number of hash functions to use. Each hash function has its own embedding matrix.
+        num_hash_buckets (:obj:`int`, `optional`, defaults to 16384):
+            The number of hash buckets to use.
+        local_transformer_stride (:obj:`int`, `optional`, defaults to 128):
+            The stride of the local attention of the first shallow Transformer encoder. Defaults to 128 for good
+            TPU/XLA memory alignment.
+
+    Example::
+
+        >>> from transformers import CanineModel, CanineConfig
+
+        >>> # Initializing a CANINE google/canine-s style configuration
+        >>> configuration = CanineConfig()
+
+        >>> # Initializing a model from the google/canine-s style configuration
+        >>> model = CanineModel(configuration)
+
+        >>> # Accessing the model configuration
+        >>> configuration = model.config
+    """
+    model_type = "canine"
+
+    def __init__(
+        self,
+        hidden_size=768,
+        num_hidden_layers=12,
+        num_attention_heads=12,
+        intermediate_size=3072,
+        hidden_act="gelu",
+        hidden_dropout_prob=0.1,
+        attention_probs_dropout_prob=0.1,
+        max_position_embeddings=16384,
+        type_vocab_size=16,
+        initializer_range=0.02,
+        layer_norm_eps=1e-12,
+        use_cache=True,
+        is_encoder_decoder=False,
+        pad_token_id=0,
+        bos_token_id=0xE000,
+        eos_token_id=0xE001,
+        downsampling_rate=4,
+        upsampling_kernel_size=4,
+        num_hash_functions=8,
+        num_hash_buckets=16384,
+        local_transformer_stride=128,  # Good TPU/XLA memory alignment.
+        **kwargs
+    ):
+        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
+
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.intermediate_size = intermediate_size
+        self.hidden_act = hidden_act
+        self.hidden_dropout_prob = hidden_dropout_prob
+        self.attention_probs_dropout_prob = attention_probs_dropout_prob
+        self.initializer_range = initializer_range
+        self.type_vocab_size = type_vocab_size
+        self.layer_norm_eps = layer_norm_eps
+        self.use_cache = use_cache
+
+        # Character config:
+        self.downsampling_rate = downsampling_rate
+        self.upsampling_kernel_size = upsampling_kernel_size
+        self.num_hash_functions = num_hash_functions
+        self.num_hash_buckets = num_hash_buckets
+        self.local_transformer_stride = local_transformer_stride
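
Review note on the character config above: `num_hash_functions` and `num_hash_buckets` control how a code point is turned into an embedding without a table of 1,114,112 rows. A minimal sketch of the idea follows, assuming a simple multiplicative hash. The prime list and the helper names `hash_bucket_ids` and `shard_size` are hypothetical and are not taken from this diff; the actual model code may differ.

```python
from typing import List

# Illustrative primes only (assumption); the real model may use a different set.
PRIMES = [31, 43, 59, 61, 73, 97, 103, 113]


def hash_bucket_ids(codepoint: int, num_hash_functions: int = 8, num_hash_buckets: int = 16384) -> List[int]:
    """Return one bucket index per hash function for a single code point."""
    if num_hash_functions > len(PRIMES):
        raise ValueError("not enough primes for the requested number of hash functions")
    return [((codepoint + 1) * prime) % num_hash_buckets for prime in PRIMES[:num_hash_functions]]


def shard_size(hidden_size: int = 768, num_hash_functions: int = 8) -> int:
    """Each hash function contributes an equal slice of the final embedding."""
    if hidden_size % num_hash_functions != 0:
        raise ValueError("hidden_size must be divisible by num_hash_functions")
    return hidden_size // num_hash_functions


bucket_ids = hash_bucket_ids(ord("a"))
```

With the defaults in `CanineConfig`, each of the 8 hash functions would contribute a 96-dimensional slice, and the slices are concatenated to reach `hidden_size=768`.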
--- a/src/transformers/models/canine/convert_canine_original_tf_checkpoint_to_pytorch.py
+++ b/src/transformers/models/canine/convert_canine_original_tf_checkpoint_to_pytorch.py
+# coding=utf-8
+# Copyright 2021 The HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Convert CANINE checkpoint."""
+
+
+import argparse
+
+from transformers import CanineConfig, CanineModel, CanineTokenizer, load_tf_weights_in_canine
+from transformers.utils import logging
+
+
+logging.set_verbosity_info()
+
+
+def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, pytorch_dump_path):
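+    """Convert a TensorFlow CANINE checkpoint to a PyTorch model and save it with its tokenizer files."""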
+    # Initialize PyTorch model
+    config = CanineConfig()
+    model = CanineModel(config)
+    model.eval()
+
+    print(f"Building PyTorch model from configuration: {config}")
+
+    # Load weights from tf checkpoint
+    load_tf_weights_in_canine(model, config, tf_checkpoint_path)
+
+    # Save pytorch-model (weights and configuration)
+    print(f"Save PyTorch model to {pytorch_dump_path}")
+    model.save_pretrained(pytorch_dump_path)
+
+    # Save tokenizer files
+    tokenizer = CanineTokenizer()
+    print(f"Save tokenizer files to {pytorch_dump_path}")
+    tokenizer.save_pretrained(pytorch_dump_path)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    # Required parameters
+    parser.add_argument(
+        "--tf_checkpoint_path",
+        default=None,
+        type=str,
+        required=True,
+        help="Path to the TensorFlow checkpoint. Should end with model.ckpt",
+    )
+    parser.add_argument(
+        "--pytorch_dump_path",
+        default=None,
+        type=str,
+        required=True,
+        help="Path to a folder where the PyTorch model will be placed.",
+    )
+    args = parser.parse_args()
+    convert_tf_checkpoint_to_pytorch(args.tf_checkpoint_path, args.pytorch_dump_path)
--- a/src/transformers/models/canine/modeling_canine.py
+++ b/src/transformers/models/canine/modeling_canine.py
--- a/src/transformers/models/canine/tokenization_canine.py
+++ b/src/transformers/models/canine/tokenization_canine.py
+# coding=utf-8
+# Copyright Google AI and The HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Tokenization classes for CANINE."""
+
+from typing import Dict, List, Optional
+
+from ...tokenization_utils import AddedToken, PreTrainedTokenizer
+from ...utils import logging
+
+
+logger = logging.get_logger(__name__)
+
+
+PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
+    "nielsr/canine-s": 2048,
+}
+
+# Unicode defines 1,114,112 total “codepoints”
+UNICODE_VOCAB_SIZE = 1114112
+
+# Below: Constants defining canonical codepoints for special, pseudo-characters.
+# Copied from https://github.com/google-research/language/blob/master/language/canine/special_codepoints.py
+PAD = 0
+
+CLS = 0xE000
+SEP = 0xE001
+BOS = 0xE002
+MASK = 0xE003
+RESERVED = 0xE004
+
+# Maps special codepoints to human-readable names.
+SPECIAL_CODEPOINTS: Dict[int, str] = {
+    # Special symbols are represented using codepoints values that are valid,
+    # but designated as "Private Use", meaning that they will never be assigned
+    # characters by the Unicode Consortium, and are thus safe for use here.
+    #
+    # NOTE: Do *NOT* add any sort of [UNK_CHAR] here. They are explicitly
+    # excluded and should fail with a hard error.
+    CLS: "[CLS]",
+    SEP: "[SEP]",
+    BOS: "[BOS]",
+    MASK: "[MASK]",
+    PAD: "[PAD]",
+    RESERVED: "[RESERVED]",
+}
+
+# Maps special codepoint human-readable names to their codepoint values.
+SPECIAL_CODEPOINTS_BY_NAME: Dict[str, int] = {name: codepoint for codepoint, name in SPECIAL_CODEPOINTS.items()}
+
+
+class CanineTokenizer(PreTrainedTokenizer):
+    r"""
+    Construct a CANINE tokenizer (i.e. a character splitter). It turns text into a sequence of characters, and then
+    converts each character into its Unicode code point.
+
+    :class:`~transformers.CanineTokenizer` inherits from :class:`~transformers.PreTrainedTokenizer`.
+
+    Refer to superclass :class:`~transformers.PreTrainedTokenizer` for usage examples and documentation concerning
+    parameters.
+
+    Args:
+        model_max_length (:obj:`int`, `optional`, defaults to 2048):
+            The maximum sequence length (in characters) the model accepts.
+    """
+
+    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
+
+    def __init__(
+        self,
+        bos_token=chr(CLS),
+        eos_token=chr(SEP),
+        sep_token=chr(SEP),
+        cls_token=chr(CLS),
+        pad_token=chr(PAD),
+        mask_token=chr(MASK),
+        add_prefix_space=False,
+        model_max_length=2048,
+        **kwargs
+    ):
+        bos_token = AddedToken(bos_token, lstrip=False, rstrip=False) if isinstance(bos_token, str) else bos_token
+        eos_token = AddedToken(eos_token, lstrip=False, rstrip=False) if isinstance(eos_token, str) else eos_token
+        sep_token = AddedToken(sep_token, lstrip=False, rstrip=False) if isinstance(sep_token, str) else sep_token
+        cls_token = AddedToken(cls_token, lstrip=False, rstrip=False) if isinstance(cls_token, str) else cls_token
+        pad_token = AddedToken(pad_token, lstrip=False, rstrip=False) if isinstance(pad_token, str) else pad_token
+
+        # The mask token behaves like a normal word, i.e. it includes the space before it
+        mask_token = AddedToken(mask_token, lstrip=True, rstrip=False) if isinstance(mask_token, str) else mask_token
+
+        super().__init__(
+            bos_token=bos_token,
+            eos_token=eos_token,
+            sep_token=sep_token,
+            cls_token=cls_token,
+            pad_token=pad_token,
+            mask_token=mask_token,
+            add_prefix_space=add_prefix_space,
+            model_max_length=model_max_length,
+            **kwargs,
+        )
+
+        # Creates a mapping for looking up the IDs of special symbols.
+        self._special_codepoints: Dict[str, int] = {}
+        for codepoint, name in SPECIAL_CODEPOINTS.items():
+            self._special_codepoints[name] = codepoint
+
+        # Creates a mapping for looking up the string forms of special symbol IDs.
+        self._special_codepoint_strings: Dict[int, str] = {
+            codepoint: name for name, codepoint in self._special_codepoints.items()
+        }
+
+        self._unicode_vocab_size = UNICODE_VOCAB_SIZE
+        self._num_special_tokens = len(self._special_codepoints)
+
+    @property
+    def vocab_size(self) -> int:
+        return self._unicode_vocab_size
+
+    def _tokenize(self, text: str) -> List[str]:
+        """Tokenize a string (i.e. perform character splitting)."""
+        return list(text)
+
+    def _convert_token_to_id(self, token: str) -> int:
+        """Converts a token (i.e. a Unicode character) to an id (i.e. its integer Unicode code point value)."""
+        try:
+            return ord(token)
+        except TypeError:
+            raise ValueError(f"invalid token: '{token}'")
+
+    def _convert_id_to_token(self, index: int) -> str:
+        """
+        Converts a Unicode code point (integer) to a token (str). If it is a special code point, returns its
+        human-readable name instead.
+        """
+        try:
+            if index in SPECIAL_CODEPOINTS:
+                return SPECIAL_CODEPOINTS[index]
+            return chr(index)
+        except TypeError:
+            raise ValueError(f"invalid id: {index}")
+
+    def convert_tokens_to_string(self, tokens):
+        return "".join(tokens)
+
+    def build_inputs_with_special_tokens(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating
+        and adding special tokens. A CANINE sequence has the following format:
+
+        - single sequence: ``[CLS] X [SEP]``
+        - pair of sequences: ``[CLS] A [SEP] B [SEP]``
+
+        Args:
+            token_ids_0 (:obj:`List[int]`):
+                List of IDs to which the special tokens will be added.
+            token_ids_1 (:obj:`List[int]`, `optional`):
+                Optional second list of IDs for sequence pairs.
+
+        Returns:
+            :obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
+        """
+        sep = [self.sep_token_id]
+        cls = [self.cls_token_id]
+
+        result = cls + token_ids_0 + sep
+        if token_ids_1 is not None:
+            result += token_ids_1 + sep
+        return result
+
+    def get_special_tokens_mask(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
+    ) -> List[int]:
+        """
+        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
+        special tokens using the tokenizer ``prepare_for_model`` method.
+
+        Args:
+            token_ids_0 (:obj:`List[int]`):
+                List of IDs.
+            token_ids_1 (:obj:`List[int]`, `optional`):
+                Optional second list of IDs for sequence pairs.
+            already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
+                Whether or not the token list is already formatted with special tokens for the model.
+
+        Returns:
+            :obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
+        """
+        if already_has_special_tokens:
+            return super().get_special_tokens_mask(
+                token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
+            )
+
+        result = [1] + ([0] * len(token_ids_0)) + [1]
+        if token_ids_1 is not None:
+            result += ([0] * len(token_ids_1)) + [1]
+        return result
+
+    def create_token_type_ids_from_sequences(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Create a mask from the two sequences passed to be used in a sequence-pair classification task. A CANINE
+        sequence pair mask has the following format:
+
+        ::
+
+            0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
+            | first sequence    | second sequence |
+
+        If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).
+
+        Args:
+            token_ids_0 (:obj:`List[int]`):
+                List of IDs.
+            token_ids_1 (:obj:`List[int]`, `optional`):
+                Optional second list of IDs for sequence pairs.
+
+        Returns:
+            :obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given
+            sequence(s).
+        """
+        sep = [self.sep_token_id]
+        cls = [self.cls_token_id]
+
+        result = len(cls + token_ids_0 + sep) * [0]
+        if token_ids_1 is not None:
+            result += len(token_ids_1 + sep) * [1]
+        return result
+
+    # CanineTokenizer has no vocab file
+    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None):
+        return ()
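
Review note on the tokenizer above: its core behavior can be reproduced in a few lines of plain Python. The sketch below is not the `CanineTokenizer` implementation; the helper names `encode`, `token_type_ids` and `pad_batch` are hypothetical, and only the `PAD`, `CLS` and `SEP` values are taken from the constants in this diff.

```python
from typing import List, Optional

# Values copied from the special code point constants in the diff.
PAD, CLS, SEP = 0, 0xE000, 0xE001


def encode(text_a: str, text_b: Optional[str] = None) -> List[int]:
    """Map each character to its code point and wrap as [CLS] A [SEP] (B [SEP])."""
    ids = [CLS] + [ord(ch) for ch in text_a] + [SEP]
    if text_b is not None:
        ids += [ord(ch) for ch in text_b] + [SEP]
    return ids


def token_type_ids(text_a: str, text_b: Optional[str] = None) -> List[int]:
    """0 for [CLS] A [SEP], 1 for B [SEP]."""
    types = [0] * (len(text_a) + 2)
    if text_b is not None:
        types += [1] * (len(text_b) + 1)
    return types


def pad_batch(batch: List[List[int]]) -> List[List[int]]:
    """Right-pad every sequence with PAD up to the longest one."""
    longest = max(len(ids) for ids in batch)
    return [ids + [PAD] * (longest - len(ids)) for ids in batch]


batch = pad_batch(
    [
        encode("Life is like a box of chocolates."),
        encode("You never know what you're gonna get."),
    ]
)
```

The two sentences have 33 and 37 characters, so after adding `[CLS]` and `[SEP]` the padded batch is 39 wide, which is the shape asserted in `test_prepare_batch_integration`.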
--- a/src/transformers/utils/dummy_pt_objects.py
+++ b/src/transformers/utils/dummy_pt_objects.py
@@ -995,6 +995,72 @@ class CamembertModel:
        requires_backends(cls, ["torch"])


+CANINE_PRETRAINED_MODEL_ARCHIVE_LIST = None
+
+
+class CanineForMultipleChoice:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+
+class CanineForQuestionAnswering:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+
+class CanineForSequenceClassification:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+
+class CanineForTokenClassification:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+
+class CanineLayer:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+
+class CanineModel:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+
+class CaninePreTrainedModel:
+    def __init__(self, *args, **kwargs):
+        requires_backends(self, ["torch"])
+
+    @classmethod
+    def from_pretrained(cls, *args, **kwargs):
+        requires_backends(cls, ["torch"])
+
+
+def load_tf_weights_in_canine(*args, **kwargs):
+    requires_backends(load_tf_weights_in_canine, ["torch"])
+
+
 CLIP_PRETRAINED_MODEL_ARCHIVE_LIST = None



--- a/src/transformers/utils/modeling_auto_mapping.py
+++ b/src/transformers/utils/modeling_auto_mapping.py
@@ -6,6 +6,7 @@ from collections import OrderedDict

 MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
    [
+        ("CanineConfig", "CanineForQuestionAnswering"),
        ("RoFormerConfig", "RoFormerForQuestionAnswering"),
        ("BigBirdPegasusConfig", "BigBirdPegasusForQuestionAnswering"),
        ("BigBirdConfig", "BigBirdForQuestionAnswering"),
@@ -112,6 +113,7 @@ MODEL_FOR_MASKED_LM_MAPPING_NAMES = OrderedDict(

 MODEL_FOR_MULTIPLE_CHOICE_MAPPING_NAMES = OrderedDict(
    [
+        ("CanineConfig", "CanineForMultipleChoice"),
        ("RoFormerConfig", "RoFormerForMultipleChoice"),
        ("BigBirdConfig", "BigBirdForMultipleChoice"),
        ("ConvBertConfig", "ConvBertForMultipleChoice"),
@@ -175,6 +177,7 @@ MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES = OrderedDict(

 MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
    [
+        ("CanineConfig", "CanineForSequenceClassification"),
        ("RoFormerConfig", "RoFormerForSequenceClassification"),
        ("BigBirdPegasusConfig", "BigBirdPegasusForSequenceClassification"),
        ("BigBirdConfig", "BigBirdForSequenceClassification"),
@@ -222,6 +225,7 @@ MODEL_FOR_TABLE_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(

 MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
    [
+        ("CanineConfig", "CanineForTokenClassification"),
        ("RoFormerConfig", "RoFormerForTokenClassification"),
        ("BigBirdConfig", "BigBirdForTokenClassification"),
        ("ConvBertConfig", "ConvBertForTokenClassification"),
@@ -252,6 +256,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
 MODEL_MAPPING_NAMES = OrderedDict(
    [
        ("VisualBertConfig", "VisualBertModel"),
+        ("CanineConfig", "CanineModel"),
        ("RoFormerConfig", "RoFormerModel"),
        ("CLIPConfig", "CLIPModel"),
        ("BigBirdPegasusConfig", "BigBirdPegasusModel"),

--- a/tests/test_modeling_canine.py
+++ b/tests/test_modeling_canine.py
--- a/tests/test_tokenization_canine.py
+++ b/tests/test_tokenization_canine.py
+# coding=utf-8
+# Copyright 2021 Google AI and HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import shutil
+import tempfile
+import unittest
+
+from transformers import BatchEncoding, CanineTokenizer
+from transformers.file_utils import cached_property
+from transformers.testing_utils import require_tokenizers, require_torch
+from transformers.tokenization_utils import AddedToken
+
+from .test_tokenization_common import TokenizerTesterMixin
+
+
+class CanineTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
+
+    tokenizer_class = CanineTokenizer
+    test_rust_tokenizer = False
+
+    def setUp(self):
+        super().setUp()
+        tokenizer = CanineTokenizer()
+        tokenizer.save_pretrained(self.tmpdirname)
+
+    @cached_property
+    def canine_tokenizer(self):
+        # TODO replace nielsr by google
+        return CanineTokenizer.from_pretrained("nielsr/canine-s")
+
+    def get_tokenizer(self, **kwargs) -> CanineTokenizer:
+        return self.tokenizer_class.from_pretrained(self.tmpdirname, **kwargs)
+
+    @require_torch
+    def test_prepare_batch_integration(self):
+        tokenizer = self.canine_tokenizer
+        src_text = ["Life is like a box of chocolates.", "You never know what you're gonna get."]
+        # fmt: off
+        expected_src_tokens = [57344, 76, 105, 102, 101, 32, 105, 115, 32, 108, 105, 107, 101, 32, 97, 32, 98, 111, 120, 32, 111, 102, 32, 99, 104, 111, 99, 111, 108, 97, 116, 101, 115, 46, 57345, 0, 0, 0, 0]
+        # fmt: on
+        batch = tokenizer(src_text, padding=True, return_tensors="pt")
+        self.assertIsInstance(batch, BatchEncoding)
+
+        result = list(batch.input_ids.numpy()[0])
+
+        self.assertListEqual(expected_src_tokens, result)
+
+        self.assertEqual((2, 39), batch.input_ids.shape)
+        self.assertEqual((2, 39), batch.attention_mask.shape)
+
+    @require_torch
+    def test_encoding_keys(self):
+        tokenizer = self.canine_tokenizer
+        src_text = ["Once there was a man.", "He wrote a test in HuggingFace Transformers."]
+        batch = tokenizer(src_text, padding=True, return_tensors="pt")
+        # check if input_ids, attention_mask and token_type_ids are returned
+        self.assertIn("input_ids", batch)
+        self.assertIn("attention_mask", batch)
+        self.assertIn("token_type_ids", batch)
+
+    @require_torch
+    def test_max_length_integration(self):
+        tokenizer = self.canine_tokenizer
+        tgt_text = [
+            "What's the weather?",
+            "It's about 25 degrees.",
+        ]
+        with tokenizer.as_target_tokenizer():
+            targets = tokenizer(tgt_text, max_length=32, padding="max_length", truncation=True, return_tensors="pt")
+        self.assertEqual(32, targets["input_ids"].shape[1])
+
+    # cannot use the default save_and_load_tokenizer test method because the tokenizer has no vocab
+    def test_save_and_load_tokenizer(self):
+        # safety check on max_len default value so we are sure the test works
+        tokenizers = self.get_tokenizers()
+        for tokenizer in tokenizers:
+            with self.subTest(f"{tokenizer.__class__.__name__}"):
+                self.assertNotEqual(tokenizer.model_max_length, 42)
+
+        # Now let's start the test
+        tokenizers = self.get_tokenizers()
+        for tokenizer in tokenizers:
+            with self.subTest(f"{tokenizer.__class__.__name__}"):
+                # Isolate this from the other tests because we save additional tokens/etc
+                tmpdirname = tempfile.mkdtemp()
+
+                sample_text = " He is very happy, UNwant\u00E9d,running"
+                before_tokens = tokenizer.encode(sample_text, add_special_tokens=False)
+                tokenizer.save_pretrained(tmpdirname)
+
+                after_tokenizer = tokenizer.__class__.from_pretrained(tmpdirname)
+                after_tokens = after_tokenizer.encode(sample_text, add_special_tokens=False)
+                self.assertListEqual(before_tokens, after_tokens)
+
+                shutil.rmtree(tmpdirname)
+
+        tokenizers = self.get_tokenizers(model_max_length=42)
+        for tokenizer in tokenizers:
+            with self.subTest(f"{tokenizer.__class__.__name__}"):
+                # Isolate this from the other tests because we save additional tokens/etc
+                tmpdirname = tempfile.mkdtemp()
+
+                sample_text = " He is very happy, UNwant\u00E9d,running"
+
+                additional_special_tokens = tokenizer.additional_special_tokens
+
+                # We can add a new special token for Canine as follows:
+                new_additional_special_token = chr(0xE007)
+                additional_special_tokens.append(new_additional_special_token)
+                tokenizer.add_special_tokens({"additional_special_tokens": additional_special_tokens})
+                before_tokens = tokenizer.encode(sample_text, add_special_tokens=False)
+                tokenizer.save_pretrained(tmpdirname)
+
+                after_tokenizer = tokenizer.__class__.from_pretrained(tmpdirname)
+                after_tokens = after_tokenizer.encode(sample_text, add_special_tokens=False)
+                self.assertListEqual(before_tokens, after_tokens)
+                self.assertIn(new_additional_special_token, after_tokenizer.additional_special_tokens)
+                self.assertEqual(after_tokenizer.model_max_length, 42)
+
+                tokenizer = tokenizer.__class__.from_pretrained(tmpdirname, model_max_length=43)
+                self.assertEqual(tokenizer.model_max_length, 43)
+
+                shutil.rmtree(tmpdirname)
+
+    def test_add_special_tokens(self):
+        tokenizers = self.get_tokenizers(do_lower_case=False)
+        for tokenizer in tokenizers:
+            with self.subTest(f"{tokenizer.__class__.__name__}"):
+                input_text, ids = self.get_clean_sequence(tokenizer)
+
+                # a special token for Canine can be defined as follows:
+                SPECIAL_TOKEN = 0xE005
+                special_token = chr(SPECIAL_TOKEN)
+
+                tokenizer.add_special_tokens({"cls_token": special_token})
+                encoded_special_token = tokenizer.encode(special_token, add_special_tokens=False)
+                self.assertEqual(len(encoded_special_token), 1)
+
+                text = tokenizer.decode(ids + encoded_special_token, clean_up_tokenization_spaces=False)
+                encoded = tokenizer.encode(text, add_special_tokens=False)
+
+                input_encoded = tokenizer.encode(input_text, add_special_tokens=False)
+                special_token_id = tokenizer.encode(special_token, add_special_tokens=False)
+                self.assertEqual(encoded, input_encoded + special_token_id)
+
+                decoded = tokenizer.decode(encoded, skip_special_tokens=True)
+                self.assertTrue(special_token not in decoded)
+
+    @require_tokenizers
+    def test_added_token_serializable(self):
+        tokenizers = self.get_tokenizers(do_lower_case=False)
+        for tokenizer in tokenizers:
+            with self.subTest(f"{tokenizer.__class__.__name__}"):
+
+                # a special token for Canine can be defined as follows:
+                NEW_TOKEN = 0xE006
+                new_token = chr(NEW_TOKEN)
+
+                new_token = AddedToken(new_token, lstrip=True)
+                tokenizer.add_special_tokens({"additional_special_tokens": [new_token]})
+
+                with tempfile.TemporaryDirectory() as tmp_dir_name:
+                    tokenizer.save_pretrained(tmp_dir_name)
+                    tokenizer.from_pretrained(tmp_dir_name)
+
+    @require_tokenizers
+    def test_encode_decode_with_spaces(self):
+        tokenizers = self.get_tokenizers(do_lower_case=False)
+        for tokenizer in tokenizers:
+            with self.subTest(f"{tokenizer.__class__.__name__}"):
+
+                input = "hello world"
+                if self.space_between_special_tokens:
+                    output = "[CLS] hello world [SEP]"
+                else:
+                    output = input
+                encoded = tokenizer.encode(input, add_special_tokens=False)
+                decoded = tokenizer.decode(encoded, spaces_between_special_tokens=self.space_between_special_tokens)
+                self.assertIn(decoded, [output, output.lower()])
+
+    # tokenizer has a fixed vocab_size (namely all possible unicode code points)
+    def test_add_tokens_tokenizer(self):
+        pass
+
+    # CanineTokenizer does not support do_lower_case = True, as each character has its own Unicode code point
+    # ("b" and "B" for example have different Unicode code points)
+    def test_added_tokens_do_lower_case(self):
+        pass
+
+    # CanineModel supports neither the get_input_embeddings nor the get_vocab method
+    def test_np_encode_plus_sent_to_model(self):
+        pass
+
+    # CanineModel supports neither the get_input_embeddings nor the get_vocab method
+    def test_torch_encode_plus_sent_to_model(self):
+        pass
+
+    # tokenizer can be instantiated without any pretrained files, so no need for pretrained tokenizer list
+    def test_pretrained_model_lists(self):
+        pass
+
+    # tokenizer does not have vocabulary
+    def test_get_vocab(self):
+        pass
+
+    # inputs cannot be pretokenized since ids depend on whole input string and not just on single characters
+    def test_pretokenized_inputs(self):
+        pass
+
+    # tests all ids in vocab => vocab doesn't exist so unnecessary to test
+    def test_conversion_reversible(self):
+        pass
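
Review note on the special tokens used in these tests: `0xE005`, `0xE006` and `0xE007`, like the built-in `CLS` through `RESERVED` values, sit in the Unicode Private Use Area, so they cannot collide with assigned characters. A small sketch using only the standard library confirms this; the helper name `is_private_use` is hypothetical.

```python
import unicodedata


def is_private_use(codepoint: int) -> bool:
    """True if the code point has Unicode general category 'Co' (private use)."""
    return unicodedata.category(chr(codepoint)) == "Co"


# Built-in special code points (0xE000-0xE004) plus the ones added in the tests.
special_codepoints = [0xE000, 0xE001, 0xE002, 0xE003, 0xE004, 0xE005, 0xE006, 0xE007]
all_private = all(is_private_use(cp) for cp in special_codepoints)
```

`PAD` is the exception: it is code point 0, a control character rather than a private-use one.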