"...git@developer.sourcefind.cn:chenpangpang/open-webui.git" did not exist on "8b6f422d4596b360402ae389f81d1f9c6e18bc77"
Unverified commit 63645b3b authored by Sehoon Kim, committed by GitHub

I-BERT model support (#10153)



* IBertConfig, IBertTokenizer added

* IBert model names modified

* tokenizer bugfix

* embedding -> QuantEmbedding

* quant utils added

* quant_mode added to configuration

* QuantAct added, Embedding layer + QuantAct addition

* QuantAct added

* unused path removed, QKV quantized

* self attention layer all quantized, except softmax

* temporary commit

* all linear layers quantized

* quant_utils bugfix

* bugfix: requantization missing

* IntGELU added

* IntSoftmax added

* LayerNorm implemented

* LayerNorm implemented all

* names changed: roberta->ibert

* config does not inherit from RoBERTa

* No support for CausalLM

* static quantization added, quantize_model.py removed

* import modules uncommented

* copyrights fixed

* minor bugfix

* quant_modules, quant_utils merged as one file

* import * fixed

* unused runfile removed

* make style run

* configuration.py docstring fixed

* refactoring: comments removed, function name fixed

* unused dependency removed

* typo fixed

* comments(Copied from), assertion string added

* refactoring: super(..) -> super(), etc.

* refactoring

* refactoring

* make style

* refactoring

* cuda -> to(x.device)

* weight initialization removed

* QuantLinear set_param removed

* QuantEmbedding set_param removed

* IntLayerNorm set_param removed

* assert string added

* assertion error message fixed

* is_decoder removed

* enc-dec arguments/functions removed

* Converter removed

* quant_modules docstring fixed

* convert_slow_tokenizer rolled back

* quant_utils docstring fixed

* unused arguments, e.g. use_cache, removed from config

* weight initialization condition fixed

* x_min, x_max initialized with small values to avoid div-zero exceptions

* testing code for ibert

* test emb, linear, gelu, softmax added

* test ln and act added

* style reformatted

* force_dequant added

* error tests overridden

* make style

* Style + Docs

* force dequant tests added

* Fix fast tokenizer in init

* Fix doc

* Remove space

* docstring, IBertConfig, chunk_size

* test_modeling_ibert refactoring

* quant_modules.py refactoring

* e2e integration test added

* tokenizers removed

* IBertConfig added to tokenization_auto.py

* bugfix

* fix docs & test

* fix style num 2

* final fixes
Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent cb38ffcc
@@ -263,6 +263,8 @@ TensorFlow and/or Flax.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
@@ -405,6 +407,7 @@ TensorFlow and/or Flax.
model_doc/fsmt
model_doc/funnel
model_doc/herbert
model_doc/ibert
model_doc/layoutlm
model_doc/led
model_doc/longformer
..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.
I-BERT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The I-BERT model was proposed in `I-BERT: Integer-only BERT Quantization <https://arxiv.org/abs/2101.01321>`__ by
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer. It is a quantized version of RoBERTa that
runs inference up to four times faster.

The abstract from the paper is the following:

*Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language
Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for
efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this,
previous work on quantizing Transformer based models use floating-point arithmetic during inference, which cannot
efficiently utilize integer-only logical units such as the recent Turing Tensor Cores, or traditional integer-only ARM
processors. In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes
the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for
nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT
inference without any floating point calculation. We evaluate our approach on GLUE downstream tasks using
RoBERTa-Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to
the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4 - 4.0x for
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
been open-sourced.*

The original code can be found `here <https://github.com/kssteven418/I-BERT>`__.
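A minimal usage sketch, assuming the ``kssteven/ibert-roberta-base`` checkpoint listed in this model's pretrained
archive map is available on the model hub (I-BERT reuses the RoBERTa tokenizer):

.. code-block:: python

    from transformers import IBertForMaskedLM, RobertaTokenizer

    # Load the tokenizer and the full-precision I-BERT checkpoint.
    tokenizer = RobertaTokenizer.from_pretrained("kssteven/ibert-roberta-base")
    model = IBertForMaskedLM.from_pretrained("kssteven/ibert-roberta-base")

    inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits  # shape: (batch_size, sequence_length, vocab_size)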
IBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertConfig
    :members:


IBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertModel
    :members: forward


IBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForMaskedLM
    :members: forward


IBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForSequenceClassification
    :members: forward


IBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForMultipleChoice
    :members: forward


IBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForTokenClassification
    :members: forward


IBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForQuestionAnswering
    :members: forward
@@ -182,6 +182,7 @@ _import_structure = {
"models.funnel": ["FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP", "FunnelConfig", "FunnelTokenizer"],
"models.gpt2": ["GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP", "GPT2Config", "GPT2Tokenizer"],
"models.herbert": ["HerbertTokenizer"],
"models.ibert": ["IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "IBertConfig"],
"models.layoutlm": ["LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP", "LayoutLMConfig", "LayoutLMTokenizer"], "models.layoutlm": ["LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP", "LayoutLMConfig", "LayoutLMTokenizer"],
"models.led": ["LED_PRETRAINED_CONFIG_ARCHIVE_MAP", "LEDConfig", "LEDTokenizer"], "models.led": ["LED_PRETRAINED_CONFIG_ARCHIVE_MAP", "LEDConfig", "LEDTokenizer"],
"models.longformer": ["LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "LongformerConfig", "LongformerTokenizer"], "models.longformer": ["LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "LongformerConfig", "LongformerTokenizer"],
...@@ -613,6 +614,20 @@ if is_torch_available(): ...@@ -613,6 +614,20 @@ if is_torch_available():
"load_tf_weights_in_gpt2", "load_tf_weights_in_gpt2",
] ]
) )
_import_structure["models.ibert"].extend(
[
"IBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
"IBertForMaskedLM",
"IBertForMultipleChoice",
"IBertForQuestionAnswering",
"IBertForSequenceClassification",
"IBertForTokenClassification",
"IBertLayer",
"IBertModel",
"IBertPreTrainedModel",
"load_tf_weights_in_ibert",
]
)
_import_structure["models.layoutlm"].extend( _import_structure["models.layoutlm"].extend(
[ [
"LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST", "LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST",
...@@ -1328,6 +1343,7 @@ if TYPE_CHECKING: ...@@ -1328,6 +1343,7 @@ if TYPE_CHECKING:
from .models.funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig, FunnelTokenizer from .models.funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig, FunnelTokenizer
from .models.gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config, GPT2Tokenizer from .models.gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config, GPT2Tokenizer
from .models.herbert import HerbertTokenizer from .models.herbert import HerbertTokenizer
from .models.ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig
from .models.layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig, LayoutLMTokenizer
from .models.led import LED_PRETRAINED_CONFIG_ARCHIVE_MAP, LEDConfig, LEDTokenizer
from .models.longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig, LongformerTokenizer
@@ -1710,6 +1726,15 @@ if TYPE_CHECKING:
GPT2PreTrainedModel,
load_tf_weights_in_gpt2,
)
from .models.ibert import (
IBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
IBertForMaskedLM,
IBertForMultipleChoice,
IBertForQuestionAnswering,
IBertForSequenceClassification,
IBertForTokenClassification,
IBertModel,
)
from .models.layoutlm import (
LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST,
LayoutLMForMaskedLM,
@@ -40,6 +40,7 @@ from ..flaubert.configuration_flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE
from ..fsmt.configuration_fsmt import FSMT_PRETRAINED_CONFIG_ARCHIVE_MAP, FSMTConfig
from ..funnel.configuration_funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig
from ..gpt2.configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
from ..ibert.configuration_ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig
from ..layoutlm.configuration_layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig
from ..led.configuration_led import LED_PRETRAINED_CONFIG_ARCHIVE_MAP, LEDConfig
from ..longformer.configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig
@@ -110,6 +111,7 @@ ALL_PRETRAINED_CONFIG_ARCHIVE_MAP = dict(
PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP,
IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
]
for key, value, in pretrained_map.items()
)
@@ -123,6 +125,7 @@ CONFIG_MAPPING = OrderedDict(
("led", LEDConfig),
("blenderbot-small", BlenderbotSmallConfig),
("retribert", RetriBertConfig),
("ibert", IBertConfig),
("mt5", MT5Config), ("mt5", MT5Config),
("t5", T5Config), ("t5", T5Config),
("mobilebert", MobileBertConfig), ("mobilebert", MobileBertConfig),
...@@ -173,6 +176,7 @@ MODEL_NAMES_MAPPING = OrderedDict( ...@@ -173,6 +176,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("led", "LED"), ("led", "LED"),
("blenderbot-small", "BlenderbotSmall"), ("blenderbot-small", "BlenderbotSmall"),
("retribert", "RetriBERT"), ("retribert", "RetriBERT"),
("ibert", "I-BERT"),
("t5", "T5"), ("t5", "T5"),
("mobilebert", "MobileBERT"), ("mobilebert", "MobileBERT"),
("distilbert", "DistilBERT"), ("distilbert", "DistilBERT"),
@@ -129,6 +129,14 @@ from ..funnel.modeling_funnel import (
FunnelModel,
)
from ..gpt2.modeling_gpt2 import GPT2ForSequenceClassification, GPT2LMHeadModel, GPT2Model
from ..ibert.modeling_ibert import (
IBertForMaskedLM,
IBertForMultipleChoice,
IBertForQuestionAnswering,
IBertForSequenceClassification,
IBertForTokenClassification,
IBertModel,
)
from ..layoutlm.modeling_layoutlm import (
LayoutLMForMaskedLM,
LayoutLMForSequenceClassification,
@@ -270,6 +278,7 @@ from .configuration_auto import (
FSMTConfig,
FunnelConfig,
GPT2Config,
IBertConfig,
LayoutLMConfig,
LEDConfig,
LongformerConfig,
@@ -347,6 +356,7 @@ MODEL_MAPPING = OrderedDict(
(MPNetConfig, MPNetModel),
(TapasConfig, TapasModel),
(MarianConfig, MarianModel),
(IBertConfig, IBertModel),
]
)
@@ -379,6 +389,7 @@ MODEL_FOR_PRETRAINING_MAPPING = OrderedDict(
(FunnelConfig, FunnelForPreTraining),
(MPNetConfig, MPNetForMaskedLM),
(TapasConfig, TapasForMaskedLM),
(IBertConfig, IBertForMaskedLM),
]
)
@@ -418,6 +429,7 @@ MODEL_WITH_LM_HEAD_MAPPING = OrderedDict(
(TapasConfig, TapasForMaskedLM),
(DebertaConfig, DebertaForMaskedLM),
(DebertaV2Config, DebertaV2ForMaskedLM),
(IBertConfig, IBertForMaskedLM),
]
)
@@ -476,6 +488,7 @@ MODEL_FOR_MASKED_LM_MAPPING = OrderedDict(
(TapasConfig, TapasForMaskedLM),
(DebertaConfig, DebertaForMaskedLM),
(DebertaV2Config, DebertaV2ForMaskedLM),
(IBertConfig, IBertForMaskedLM),
]
)
@@ -529,6 +542,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
(TransfoXLConfig, TransfoXLForSequenceClassification),
(MPNetConfig, MPNetForSequenceClassification),
(TapasConfig, TapasForSequenceClassification),
(IBertConfig, IBertForSequenceClassification),
]
)
@@ -558,6 +572,7 @@ MODEL_FOR_QUESTION_ANSWERING_MAPPING = OrderedDict(
(MPNetConfig, MPNetForQuestionAnswering),
(DebertaConfig, DebertaForQuestionAnswering),
(DebertaV2Config, DebertaV2ForQuestionAnswering),
(IBertConfig, IBertForQuestionAnswering),
]
)
@@ -591,6 +606,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING = OrderedDict(
(MPNetConfig, MPNetForTokenClassification),
(DebertaConfig, DebertaForTokenClassification),
(DebertaV2Config, DebertaV2ForTokenClassification),
(IBertConfig, IBertForTokenClassification),
]
)
@@ -613,6 +629,7 @@ MODEL_FOR_MULTIPLE_CHOICE_MAPPING = OrderedDict(
(FlaubertConfig, FlaubertForMultipleChoice),
(FunnelConfig, FunnelForMultipleChoice),
(MPNetConfig, MPNetForMultipleChoice),
(IBertConfig, IBertForMultipleChoice),
]
)
@@ -75,6 +75,7 @@ from .configuration_auto import (
FSMTConfig,
FunnelConfig,
GPT2Config,
IBertConfig,
LayoutLMConfig,
LEDConfig,
LongformerConfig,
@@ -244,6 +245,7 @@ TOKENIZER_MAPPING = OrderedDict(
(TapasConfig, (TapasTokenizer, None)),
(LEDConfig, (LEDTokenizer, LEDTokenizerFast)),
(ConvBertConfig, (ConvBertTokenizer, ConvBertTokenizerFast)),
(IBertConfig, (RobertaTokenizer, RobertaTokenizerFast)),
(Wav2Vec2Config, (Wav2Vec2CTCTokenizer, None)),
]
)
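Because IBertConfig is registered in the auto mappings above, the generic auto classes can resolve an I-BERT checkpoint without any model-specific imports. A rough sketch, assuming the kssteven/ibert-roberta-base checkpoint is available:

from transformers import AutoConfig, AutoModel, AutoTokenizer

# model_type="ibert" selects IBertConfig from CONFIG_MAPPING, the tokenizer mapping
# falls back to the RoBERTa tokenizer, and MODEL_MAPPING resolves to IBertModel.
config = AutoConfig.from_pretrained("kssteven/ibert-roberta-base")
tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = AutoModel.from_pretrained("kssteven/ibert-roberta-base")

print(type(config).__name__, type(tokenizer).__name__, type(model).__name__)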
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...file_utils import _BaseLazyModule, is_tokenizers_available, is_torch_available


_import_structure = {
    "configuration_ibert": ["IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "IBertConfig"],
}

if is_torch_available():
    _import_structure["modeling_ibert"] = [
        "IBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
        "IBertForMaskedLM",
        "IBertForMultipleChoice",
        "IBertForQuestionAnswering",
        "IBertForSequenceClassification",
        "IBertForTokenClassification",
        "IBertModel",
    ]

if TYPE_CHECKING:
    from .configuration_ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig

    if is_torch_available():
        from .modeling_ibert import (
            IBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
            IBertForMaskedLM,
            IBertForMultipleChoice,
            IBertForQuestionAnswering,
            IBertForSequenceClassification,
            IBertForTokenClassification,
            IBertModel,
        )

else:
    import importlib
    import os
    import sys

    class _LazyModule(_BaseLazyModule):
        """
        Module class that surfaces all objects but only performs associated imports when the objects are requested.
        """

        __file__ = globals()["__file__"]
        __path__ = [os.path.dirname(__file__)]

        def _get_module(self, module_name: str):
            return importlib.import_module("." + module_name, self.__name__)

    sys.modules[__name__] = _LazyModule(__name__, _import_structure)
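With this structure, importing the package is cheap: the configuration and modeling submodules are only imported the first time one of their attributes is looked up on the lazy module. A rough illustration, assuming PyTorch is installed:

# Importing the package only builds _import_structure and installs the lazy module;
# no configuration or modeling code runs yet.
import transformers.models.ibert  # noqa: F401

# Attribute lookup on the lazy module triggers _get_module(), which imports the
# real submodule (configuration_ibert or modeling_ibert) on demand.
from transformers.models.ibert import IBertConfig, IBertModel

print(IBertConfig.__name__, IBertModel.__name__)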
# coding=utf-8
# Copyright 2021 The I-BERT Authors (Sehoon Kim, Amir Gholami, Zhewei Yao,
# Michael Mahoney, Kurt Keutzer - UC Berkeley) and The HuggingFace Inc. team.
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" I-BERT configuration """
from ...configuration_utils import PretrainedConfig
from ...utils import logging


logger = logging.get_logger(__name__)

IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
    "kssteven/ibert-roberta-base": "https://huggingface.co/kssteven/ibert-roberta-base/resolve/main/config.json",
    "kssteven/ibert-roberta-large": "https://huggingface.co/kssteven/ibert-roberta-large/resolve/main/config.json",
    "kssteven/ibert-roberta-large-mnli": "https://huggingface.co/kssteven/ibert-roberta-large-mnli/resolve/main/config.json",
}
class IBertConfig(PretrainedConfig):
    """
    This is the configuration class to store the configuration of a :class:`~transformers.IBertModel`. It is used to
    instantiate an I-BERT model according to the specified arguments, defining the model architecture.

    Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
    outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.

    Args:
        vocab_size (:obj:`int`, `optional`, defaults to 30522):
            Vocabulary size of the I-BERT model. Defines the number of different tokens that can be represented by the
            :obj:`inputs_ids` passed when calling :class:`~transformers.IBertModel`
        hidden_size (:obj:`int`, `optional`, defaults to 768):
            Dimensionality of the encoder layers and the pooler layer.
        num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (:obj:`int`, `optional`, defaults to 12):
            Number of attention heads for each attention layer in the Transformer encoder.
        intermediate_size (:obj:`int`, `optional`, defaults to 3072):
            Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
        hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string,
            :obj:`"gelu"`, :obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported.
        hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
            The dropout ratio for the attention probabilities.
        max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        type_vocab_size (:obj:`int`, `optional`, defaults to 2):
            The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.IBertModel`
        initializer_range (:obj:`float`, `optional`, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
            The epsilon used by the layer normalization layers.
        position_embedding_type (:obj:`str`, `optional`, defaults to :obj:`"absolute"`):
            Type of position embedding. Choose one of :obj:`"absolute"`, :obj:`"relative_key"`,
            :obj:`"relative_key_query"`. For positional embeddings use :obj:`"absolute"`. For more information on
            :obj:`"relative_key"`, please refer to `Self-Attention with Relative Position Representations (Shaw et al.)
            <https://arxiv.org/abs/1803.02155>`__. For more information on :obj:`"relative_key_query"`, please refer to
            `Method 4` in `Improve Transformer Models with Better Relative Position Embeddings (Huang et al.)
            <https://arxiv.org/abs/2009.13658>`__.
        quant_mode (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to quantize the model or not.
        force_dequant (:obj:`str`, `optional`, defaults to :obj:`"none"`):
            Force dequantize specific nonlinear layers. Dequantized layers are then executed with full precision.
            :obj:`"none"`, :obj:`"gelu"`, :obj:`"softmax"`, :obj:`"layernorm"` and :obj:`"nonlinear"` are supported. By
            default, it is set to :obj:`"none"`, which does not dequantize any layers. Please specify :obj:`"gelu"`,
            :obj:`"softmax"`, or :obj:`"layernorm"` to dequantize GELU, Softmax, or LayerNorm, respectively.
            :obj:`"nonlinear"` will dequantize all nonlinear layers, i.e., GELU, Softmax, and LayerNorm.
    """
    model_type = "ibert"

    def __init__(
        self,
        vocab_size=30522,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        pad_token_id=1,
        bos_token_id=0,
        eos_token_id=2,
        position_embedding_type="absolute",
        quant_mode=False,
        force_dequant="none",
        **kwargs
    ):
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.hidden_act = hidden_act
        self.intermediate_size = intermediate_size
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.type_vocab_size = type_vocab_size
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.position_embedding_type = position_embedding_type
        self.quant_mode = quant_mode
        self.force_dequant = force_dequant
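As an illustrative sketch (assuming PyTorch and the kssteven/ibert-roberta-base checkpoint are available), quant_mode switches the model to integer-only inference and force_dequant keeps a chosen nonlinearity in full precision:

from transformers import IBertConfig, IBertModel

# Quantization-enabled configuration; run only the Softmax in floating point.
config = IBertConfig(quant_mode=True, force_dequant="softmax")
model = IBertModel(config)

# The same options can also be applied on top of a pretrained checkpoint.
model = IBertModel.from_pretrained("kssteven/ibert-roberta-base", quant_mode=True)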
@@ -1349,6 +1349,63 @@ def load_tf_weights_in_gpt2(*args, **kwargs):
requires_pytorch(load_tf_weights_in_gpt2)
IBERT_PRETRAINED_MODEL_ARCHIVE_LIST = None


class IBertForMaskedLM:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertForMultipleChoice:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertForQuestionAnswering:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertForSequenceClassification:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertForTokenClassification:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertModel:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)
LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST = None