"...git@developer.sourcefind.cn:chenpangpang/open-webui.git" did not exist on "8b6f422d4596b360402ae389f81d1f9c6e18bc77"
Unverified commit 63645b3b authored by Sehoon Kim, committed by GitHub

I-BERT model support (#10153)



* IBertConfig, IBertTokenizer added

* IBert model names modified

* tokenizer bugfix

* embedding -> QuantEmbedding

* quant utils added

* quant_mode added to configuration

* QuantAct added, Embedding layer + QuantAct addition

* QuantAct added

* unused path removed, QKV quantized

* self attention layer all quantized, except softmax

* temporary commit

* all linear layers quantized

* quant_utils bugfix

* bugfix: requantization missing

* IntGELU added

* IntSoftmax added

* LayerNorm implemented

* LayerNorm implemented all

* names changed: roberta->ibert

* config does not inherit from RoBERTa

* No support for CausalLM

* static quantization added, quantize_model.py removed

* import modules uncommented

* copyrights fixed

* minor bugfix

* quant_modules, quant_utils merged as one file

* import * fixed

* unused runfile removed

* make style run

* configuration.py docstring fixed

* refactoring: comments removed, function name fixed

* unused dependency removed

* typo fixed

* comments(Copied from), assertion string added

* refactoring: super(..) -> super(), etc.

* refactoring

* refactoring

* make style

* refactoring

* cuda -> to(x.device)

* weight initialization removed

* QuantLinear set_param removed

* QuantEmbedding set_param removed

* IntLayerNorm set_param removed

* assert string added

* assertion error message fixed

* is_decoder removed

* enc-dec arguments/functions removed

* Converter removed

* quant_modules docstring fixed

* convert_slow_tokenizer rolled back

* quant_utils docstring fixed

* unused arguments, e.g. use_cache, removed from config

* weight initialization condition fixed

* x_min, x_max initialized with small values to avoid div-zero exceptions

* testing code for ibert

* test emb, linear, gelu, softmax added

* test ln and act added

* style reformatted

* force_dequant added

* error tests overridden

* make style

* Style + Docs

* force dequant tests added

* Fix fast tokenizer in init

* Fix doc

* Remove space

* docstring, IBertConfig, chunk_size

* test_modeling_ibert refactoring

* quant_modules.py refactoring

* e2e integration test added

* tokenizers removed

* IBertConfig added to tokenization_auto.py

* bugfix

* fix docs & test

* fix style num 2

* final fixes
Co-authored-by: Sehoon Kim <sehoonkim@berkeley.edu>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent cb38ffcc
@@ -263,6 +263,8 @@ TensorFlow and/or Flax.
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
@@ -405,6 +407,7 @@ TensorFlow and/or Flax.
model_doc/fsmt
model_doc/funnel
model_doc/herbert
model_doc/ibert
model_doc/layoutlm
model_doc/led
model_doc/longformer
..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.
I-BERT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The I-BERT model was proposed in `I-BERT: Integer-only BERT Quantization <https://arxiv.org/abs/2101.01321>`__ by
Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer. It is a quantized version of RoBERTa that
runs inference up to four times faster.

The abstract from the paper is the following:

*Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language
Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for
efficient inference at the edge, and even at the data center. While quantization can be a viable solution for this,
previous work on quantizing Transformer based models use floating-point arithmetic during inference, which cannot
efficiently utilize integer-only logical units such as the recent Turing Tensor Cores, or traditional integer-only ARM
processors. In this work, we propose I-BERT, a novel quantization scheme for Transformer based models that quantizes
the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for
nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT
inference without any floating point calculation. We evaluate our approach on GLUE downstream tasks using
RoBERTa-Base/Large. We show that for both cases, I-BERT achieves similar (and slightly higher) accuracy as compared to
the full-precision baseline. Furthermore, our preliminary implementation of I-BERT shows a speedup of 2.4 - 4.0x for
INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
been open-sourced.*

The original code can be found `here <https://github.com/kssteven418/I-BERT>`__.
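A minimal usage sketch, assuming the ``kssteven/ibert-roberta-base`` checkpoint listed in this model's pretrained
archive map is available on the model hub (I-BERT reuses the RoBERTa tokenizer):

.. code-block:: python

    from transformers import IBertForMaskedLM, RobertaTokenizer

    # Load the tokenizer and the full-precision I-BERT checkpoint.
    tokenizer = RobertaTokenizer.from_pretrained("kssteven/ibert-roberta-base")
    model = IBertForMaskedLM.from_pretrained("kssteven/ibert-roberta-base")

    inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits  # shape: (batch_size, sequence_length, vocab_size)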
IBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertConfig
    :members:


IBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertModel
    :members: forward


IBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForMaskedLM
    :members: forward


IBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForSequenceClassification
    :members: forward


IBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForMultipleChoice
    :members: forward


IBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForTokenClassification
    :members: forward


IBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.IBertForQuestionAnswering
    :members: forward
@@ -182,6 +182,7 @@ _import_structure = {
"models.funnel": ["FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP", "FunnelConfig", "FunnelTokenizer"],
"models.gpt2": ["GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP", "GPT2Config", "GPT2Tokenizer"],
"models.herbert": ["HerbertTokenizer"],
"models.ibert": ["IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "IBertConfig"],
"models.layoutlm": ["LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP", "LayoutLMConfig", "LayoutLMTokenizer"], "models.layoutlm": ["LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP", "LayoutLMConfig", "LayoutLMTokenizer"],
"models.led": ["LED_PRETRAINED_CONFIG_ARCHIVE_MAP", "LEDConfig", "LEDTokenizer"], "models.led": ["LED_PRETRAINED_CONFIG_ARCHIVE_MAP", "LEDConfig", "LEDTokenizer"],
"models.longformer": ["LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "LongformerConfig", "LongformerTokenizer"], "models.longformer": ["LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "LongformerConfig", "LongformerTokenizer"],
...@@ -613,6 +614,20 @@ if is_torch_available(): ...@@ -613,6 +614,20 @@ if is_torch_available():
"load_tf_weights_in_gpt2", "load_tf_weights_in_gpt2",
] ]
) )
_import_structure["models.ibert"].extend(
[
"IBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
"IBertForMaskedLM",
"IBertForMultipleChoice",
"IBertForQuestionAnswering",
"IBertForSequenceClassification",
"IBertForTokenClassification",
"IBertLayer",
"IBertModel",
"IBertPreTrainedModel",
"load_tf_weights_in_ibert",
]
)
_import_structure["models.layoutlm"].extend( _import_structure["models.layoutlm"].extend(
[ [
"LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST", "LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST",
...@@ -1328,6 +1343,7 @@ if TYPE_CHECKING: ...@@ -1328,6 +1343,7 @@ if TYPE_CHECKING:
from .models.funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig, FunnelTokenizer from .models.funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig, FunnelTokenizer
from .models.gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config, GPT2Tokenizer from .models.gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config, GPT2Tokenizer
from .models.herbert import HerbertTokenizer from .models.herbert import HerbertTokenizer
from .models.ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig
from .models.layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig, LayoutLMTokenizer
from .models.led import LED_PRETRAINED_CONFIG_ARCHIVE_MAP, LEDConfig, LEDTokenizer
from .models.longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig, LongformerTokenizer
@@ -1710,6 +1726,15 @@ if TYPE_CHECKING:
GPT2PreTrainedModel,
load_tf_weights_in_gpt2,
)
from .models.ibert import (
IBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
IBertForMaskedLM,
IBertForMultipleChoice,
IBertForQuestionAnswering,
IBertForSequenceClassification,
IBertForTokenClassification,
IBertModel,
)
from .models.layoutlm import (
LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST,
LayoutLMForMaskedLM,
@@ -40,6 +40,7 @@ from ..flaubert.configuration_flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE
from ..fsmt.configuration_fsmt import FSMT_PRETRAINED_CONFIG_ARCHIVE_MAP, FSMTConfig
from ..funnel.configuration_funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig
from ..gpt2.configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
from ..ibert.configuration_ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig
from ..layoutlm.configuration_layoutlm import LAYOUTLM_PRETRAINED_CONFIG_ARCHIVE_MAP, LayoutLMConfig
from ..led.configuration_led import LED_PRETRAINED_CONFIG_ARCHIVE_MAP, LEDConfig
from ..longformer.configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig
@@ -110,6 +111,7 @@ ALL_PRETRAINED_CONFIG_ARCHIVE_MAP = dict(
PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
MPNET_PRETRAINED_CONFIG_ARCHIVE_MAP,
TAPAS_PRETRAINED_CONFIG_ARCHIVE_MAP,
IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
]
for key, value, in pretrained_map.items()
)
@@ -123,6 +125,7 @@ CONFIG_MAPPING = OrderedDict(
("led", LEDConfig),
("blenderbot-small", BlenderbotSmallConfig),
("retribert", RetriBertConfig),
("ibert", IBertConfig),
("mt5", MT5Config), ("mt5", MT5Config),
("t5", T5Config), ("t5", T5Config),
("mobilebert", MobileBertConfig), ("mobilebert", MobileBertConfig),
...@@ -173,6 +176,7 @@ MODEL_NAMES_MAPPING = OrderedDict( ...@@ -173,6 +176,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("led", "LED"), ("led", "LED"),
("blenderbot-small", "BlenderbotSmall"), ("blenderbot-small", "BlenderbotSmall"),
("retribert", "RetriBERT"), ("retribert", "RetriBERT"),
("ibert", "I-BERT"),
("t5", "T5"), ("t5", "T5"),
("mobilebert", "MobileBERT"), ("mobilebert", "MobileBERT"),
("distilbert", "DistilBERT"), ("distilbert", "DistilBERT"),
@@ -129,6 +129,14 @@ from ..funnel.modeling_funnel import (
FunnelModel,
)
from ..gpt2.modeling_gpt2 import GPT2ForSequenceClassification, GPT2LMHeadModel, GPT2Model
from ..ibert.modeling_ibert import (
IBertForMaskedLM,
IBertForMultipleChoice,
IBertForQuestionAnswering,
IBertForSequenceClassification,
IBertForTokenClassification,
IBertModel,
)
from ..layoutlm.modeling_layoutlm import (
LayoutLMForMaskedLM,
LayoutLMForSequenceClassification,
@@ -270,6 +278,7 @@ from .configuration_auto import (
FSMTConfig,
FunnelConfig,
GPT2Config,
IBertConfig,
LayoutLMConfig,
LEDConfig,
LongformerConfig,
@@ -347,6 +356,7 @@ MODEL_MAPPING = OrderedDict(
(MPNetConfig, MPNetModel),
(TapasConfig, TapasModel),
(MarianConfig, MarianModel),
(IBertConfig, IBertModel),
]
)
@@ -379,6 +389,7 @@ MODEL_FOR_PRETRAINING_MAPPING = OrderedDict(
(FunnelConfig, FunnelForPreTraining),
(MPNetConfig, MPNetForMaskedLM),
(TapasConfig, TapasForMaskedLM),
(IBertConfig, IBertForMaskedLM),
]
)
@@ -418,6 +429,7 @@ MODEL_WITH_LM_HEAD_MAPPING = OrderedDict(
(TapasConfig, TapasForMaskedLM),
(DebertaConfig, DebertaForMaskedLM),
(DebertaV2Config, DebertaV2ForMaskedLM),
(IBertConfig, IBertForMaskedLM),
]
)
@@ -476,6 +488,7 @@ MODEL_FOR_MASKED_LM_MAPPING = OrderedDict(
(TapasConfig, TapasForMaskedLM),
(DebertaConfig, DebertaForMaskedLM),
(DebertaV2Config, DebertaV2ForMaskedLM),
(IBertConfig, IBertForMaskedLM),
]
)
@@ -529,6 +542,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
(TransfoXLConfig, TransfoXLForSequenceClassification),
(MPNetConfig, MPNetForSequenceClassification),
(TapasConfig, TapasForSequenceClassification),
(IBertConfig, IBertForSequenceClassification),
]
)
@@ -558,6 +572,7 @@ MODEL_FOR_QUESTION_ANSWERING_MAPPING = OrderedDict(
(MPNetConfig, MPNetForQuestionAnswering),
(DebertaConfig, DebertaForQuestionAnswering),
(DebertaV2Config, DebertaV2ForQuestionAnswering),
(IBertConfig, IBertForQuestionAnswering),
]
)
@@ -591,6 +606,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING = OrderedDict(
(MPNetConfig, MPNetForTokenClassification),
(DebertaConfig, DebertaForTokenClassification),
(DebertaV2Config, DebertaV2ForTokenClassification),
(IBertConfig, IBertForTokenClassification),
]
)
@@ -613,6 +629,7 @@ MODEL_FOR_MULTIPLE_CHOICE_MAPPING = OrderedDict(
(FlaubertConfig, FlaubertForMultipleChoice),
(FunnelConfig, FunnelForMultipleChoice),
(MPNetConfig, MPNetForMultipleChoice),
(IBertConfig, IBertForMultipleChoice),
]
)
@@ -75,6 +75,7 @@ from .configuration_auto import (
FSMTConfig,
FunnelConfig,
GPT2Config,
IBertConfig,
LayoutLMConfig,
LEDConfig,
LongformerConfig,
@@ -244,6 +245,7 @@ TOKENIZER_MAPPING = OrderedDict(
(TapasConfig, (TapasTokenizer, None)),
(LEDConfig, (LEDTokenizer, LEDTokenizerFast)),
(ConvBertConfig, (ConvBertTokenizer, ConvBertTokenizerFast)),
(IBertConfig, (RobertaTokenizer, RobertaTokenizerFast)),
(Wav2Vec2Config, (Wav2Vec2CTCTokenizer, None)),
]
)
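Because IBertConfig is registered in the auto mappings above, the generic auto classes can resolve an I-BERT checkpoint without any model-specific imports. A rough sketch, assuming the kssteven/ibert-roberta-base checkpoint is available:

from transformers import AutoConfig, AutoModel, AutoTokenizer

# model_type="ibert" selects IBertConfig from CONFIG_MAPPING, the tokenizer mapping
# falls back to the RoBERTa tokenizer, and MODEL_MAPPING resolves to IBertModel.
config = AutoConfig.from_pretrained("kssteven/ibert-roberta-base")
tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = AutoModel.from_pretrained("kssteven/ibert-roberta-base")

print(type(config).__name__, type(tokenizer).__name__, type(model).__name__)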
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
# Copyright 2020 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...file_utils import _BaseLazyModule, is_tokenizers_available, is_torch_available


_import_structure = {
    "configuration_ibert": ["IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "IBertConfig"],
}

if is_torch_available():
    _import_structure["modeling_ibert"] = [
        "IBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
        "IBertForMaskedLM",
        "IBertForMultipleChoice",
        "IBertForQuestionAnswering",
        "IBertForSequenceClassification",
        "IBertForTokenClassification",
        "IBertModel",
    ]

if TYPE_CHECKING:
    from .configuration_ibert import IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, IBertConfig

    if is_torch_available():
        from .modeling_ibert import (
            IBERT_PRETRAINED_MODEL_ARCHIVE_LIST,
            IBertForMaskedLM,
            IBertForMultipleChoice,
            IBertForQuestionAnswering,
            IBertForSequenceClassification,
            IBertForTokenClassification,
            IBertModel,
        )

else:
    import importlib
    import os
    import sys

    class _LazyModule(_BaseLazyModule):
        """
        Module class that surfaces all objects but only performs associated imports when the objects are requested.
        """

        __file__ = globals()["__file__"]
        __path__ = [os.path.dirname(__file__)]

        def _get_module(self, module_name: str):
            return importlib.import_module("." + module_name, self.__name__)

    sys.modules[__name__] = _LazyModule(__name__, _import_structure)
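With this structure, importing the package is cheap: the configuration and modeling submodules are only imported the first time one of their attributes is looked up on the lazy module. A rough illustration, assuming PyTorch is installed:

# Importing the package only builds _import_structure and installs the lazy module;
# no configuration or modeling code runs yet.
import transformers.models.ibert  # noqa: F401

# Attribute lookup on the lazy module triggers _get_module(), which imports the
# real submodule (configuration_ibert or modeling_ibert) on demand.
from transformers.models.ibert import IBertConfig, IBertModel

print(IBertConfig.__name__, IBertModel.__name__)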
# coding=utf-8
# Copyright 2021 The I-BERT Authors (Sehoon Kim, Amir Gholami, Zhewei Yao,
# Michael Mahoney, Kurt Keutzer - UC Berkeley) and The HuggingFace Inc. team.
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" I-BERT configuration """
from ...configuration_utils import PretrainedConfig
from ...utils import logging


logger = logging.get_logger(__name__)

IBERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
    "kssteven/ibert-roberta-base": "https://huggingface.co/kssteven/ibert-roberta-base/resolve/main/config.json",
    "kssteven/ibert-roberta-large": "https://huggingface.co/kssteven/ibert-roberta-large/resolve/main/config.json",
    "kssteven/ibert-roberta-large-mnli": "https://huggingface.co/kssteven/ibert-roberta-large-mnli/resolve/main/config.json",
}
class IBertConfig(PretrainedConfig):
    """
    This is the configuration class to store the configuration of a :class:`~transformers.IBertModel`. It is used to
    instantiate an I-BERT model according to the specified arguments, defining the model architecture.

    Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
    outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.

    Args:
        vocab_size (:obj:`int`, `optional`, defaults to 30522):
            Vocabulary size of the I-BERT model. Defines the number of different tokens that can be represented by the
            :obj:`inputs_ids` passed when calling :class:`~transformers.IBertModel`
        hidden_size (:obj:`int`, `optional`, defaults to 768):
            Dimensionality of the encoder layers and the pooler layer.
        num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
            Number of hidden layers in the Transformer encoder.
        num_attention_heads (:obj:`int`, `optional`, defaults to 12):
            Number of attention heads for each attention layer in the Transformer encoder.
        intermediate_size (:obj:`int`, `optional`, defaults to 3072):
            Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
        hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
            The non-linear activation function (function or string) in the encoder and pooler. If string,
            :obj:`"gelu"`, :obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported.
        hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
        attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
            The dropout ratio for the attention probabilities.
        max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
            The maximum sequence length that this model might ever be used with. Typically set this to something large
            just in case (e.g., 512 or 1024 or 2048).
        type_vocab_size (:obj:`int`, `optional`, defaults to 2):
            The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.IBertModel`
        initializer_range (:obj:`float`, `optional`, defaults to 0.02):
            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
        layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
            The epsilon used by the layer normalization layers.
        position_embedding_type (:obj:`str`, `optional`, defaults to :obj:`"absolute"`):
            Type of position embedding. Choose one of :obj:`"absolute"`, :obj:`"relative_key"`,
            :obj:`"relative_key_query"`. For positional embeddings use :obj:`"absolute"`. For more information on
            :obj:`"relative_key"`, please refer to `Self-Attention with Relative Position Representations (Shaw et al.)
            <https://arxiv.org/abs/1803.02155>`__. For more information on :obj:`"relative_key_query"`, please refer to
            `Method 4` in `Improve Transformer Models with Better Relative Position Embeddings (Huang et al.)
            <https://arxiv.org/abs/2009.13658>`__.
        quant_mode (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to quantize the model or not.
        force_dequant (:obj:`str`, `optional`, defaults to :obj:`"none"`):
            Force dequantize specific nonlinear layers. Dequantized layers are then executed with full precision.
            :obj:`"none"`, :obj:`"gelu"`, :obj:`"softmax"`, :obj:`"layernorm"` and :obj:`"nonlinear"` are supported. By
            default, it is set to :obj:`"none"`, which does not dequantize any layers. Please specify :obj:`"gelu"`,
            :obj:`"softmax"`, or :obj:`"layernorm"` to dequantize GELU, Softmax, or LayerNorm, respectively.
            :obj:`"nonlinear"` will dequantize all nonlinear layers, i.e., GELU, Softmax, and LayerNorm.
    """
    model_type = "ibert"

    def __init__(
        self,
        vocab_size=30522,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        intermediate_size=3072,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        pad_token_id=1,
        bos_token_id=0,
        eos_token_id=2,
        position_embedding_type="absolute",
        quant_mode=False,
        force_dequant="none",
        **kwargs
    ):
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.hidden_act = hidden_act
        self.intermediate_size = intermediate_size
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.type_vocab_size = type_vocab_size
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.position_embedding_type = position_embedding_type
        self.quant_mode = quant_mode
        self.force_dequant = force_dequant
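As an illustrative sketch (assuming PyTorch and the kssteven/ibert-roberta-base checkpoint are available), quant_mode switches the model to integer-only inference and force_dequant keeps a chosen nonlinearity in full precision:

from transformers import IBertConfig, IBertModel

# Quantization-enabled configuration; run only the Softmax in floating point.
config = IBertConfig(quant_mode=True, force_dequant="softmax")
model = IBertModel(config)

# The same options can also be applied on top of a pretrained checkpoint.
model = IBertModel.from_pretrained("kssteven/ibert-roberta-base", quant_mode=True)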
@@ -1349,6 +1349,63 @@ def load_tf_weights_in_gpt2(*args, **kwargs):
requires_pytorch(load_tf_weights_in_gpt2)
IBERT_PRETRAINED_MODEL_ARCHIVE_LIST = None


class IBertForMaskedLM:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertForMultipleChoice:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertForQuestionAnswering:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertForSequenceClassification:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertForTokenClassification:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)


class IBertModel:
    def __init__(self, *args, **kwargs):
        requires_pytorch(self)

    @classmethod
    def from_pretrained(self, *args, **kwargs):
        requires_pytorch(self)
LAYOUTLM_PRETRAINED_MODEL_ARCHIVE_LIST = None