Unverified commit de8548eb authored by Christoffer Koo Øhrstrøm, committed by GitHub

[LayoutLMv3] Add TensorFlow implementation (#18678)


Co-authored-by: Esben Toke Christensen <esben.christensen@visma.com>
Co-authored-by: Lasse Reedtz <lasse.reedtz@visma.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
parent 7320d95d
@@ -38,7 +38,7 @@ The documentation is organized in five parts:
- **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
- **TUTORIALS** are a great place to begin if you are new to our library. This section will help you gain the basic skills you need to start using 🤗 Transformers.
- **HOW-TO GUIDES** will show you how to achieve a specific goal, such as fine-tuning a pretrained model for language modeling or creating a custom model head.
- **CONCEPTUAL GUIDES** provide more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes each class and function, grouped in:
- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
@@ -245,7 +245,7 @@ Flax), PyTorch, and/or TensorFlow.
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LayoutLMv3 | ✅ | ✅ | ✅ | ✅ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| LeViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
@@ -26,18 +26,18 @@ Tips:
- In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
- images need to be resized and normalized, with channels in regular RGB format. LayoutLMv2, on the other hand, normalizes the images internally and expects the channels in BGR format.
- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`], which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model; see the sketch after this list.
- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
- Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
- Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).
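As a minimal sketch of the processor in action (assuming the `microsoft/layoutlmv3-base` checkpoint, a local `document.png`, and Tesseract/`pytesseract` installed for the default OCR step):

```python
from PIL import Image
from transformers import LayoutLMv3Processor

processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")

# Channels stay in regular RGB order; resizing and normalization happen inside the processor.
image = Image.open("document.png").convert("RGB")

# With apply_ocr=True (the default), words and bounding boxes come from Tesseract.
encoding = processor(image, return_tensors="pt")
print(encoding.keys())  # input_ids, attention_mask, bbox, pixel_values
```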
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/layoutlmv3_architecture.png" alt="drawing" width="600"/>
<small> LayoutLMv3 architecture. Taken from the <a href="https://arxiv.org/abs/2204.08387">original paper</a>. </small>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [chriskoo](https://huggingface.co/chriskoo), [tokec](https://huggingface.co/tokec), and [lre](https://huggingface.co/lre). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/layoutlmv3).
## LayoutLMv3Config
@@ -84,3 +84,23 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi
[[autodoc]] LayoutLMv3ForQuestionAnswering
- forward
## TFLayoutLMv3Model
[[autodoc]] TFLayoutLMv3Model
- call
## TFLayoutLMv3ForSequenceClassification
[[autodoc]] TFLayoutLMv3ForSequenceClassification
- call
## TFLayoutLMv3ForTokenClassification
[[autodoc]] TFLayoutLMv3ForTokenClassification
- call
## TFLayoutLMv3ForQuestionAnswering
[[autodoc]] TFLayoutLMv3ForQuestionAnswering
- call
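A hedged sketch of the new TensorFlow classes in use (assuming the `microsoft/layoutlmv3-base` checkpoint; if only PyTorch weights are published for it, `from_pt=True` can be passed to `from_pretrained`):

```python
from PIL import Image
from transformers import LayoutLMv3Processor, TFLayoutLMv3Model

processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")
model = TFLayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")

# Same preprocessing as for the PyTorch model, but with TF tensors.
image = Image.open("document.png").convert("RGB")
encoding = processor(image, return_tensors="tf")

outputs = model(**encoding)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```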
@@ -221,7 +221,7 @@ tokenizer (chiamato "slow"). Un tokenizer "fast" supportato dalla libreria 🤗
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LayoutLMv3 | ✅ | ✅ | ✅ | ✅ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
@@ -288,4 +288,4 @@ tokenizer (chiamato "slow"). Un tokenizer "fast" supportato dalla libreria 🤗
| YOLOS | ❌ | ❌ | ✅ | ❌ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
<!-- End table-->
@@ -2343,6 +2343,16 @@ else:
"TFLayoutLMPreTrainedModel",
]
)
_import_structure["models.layoutlmv3"].extend(
[
"TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST",
"TFLayoutLMv3ForQuestionAnswering",
"TFLayoutLMv3ForSequenceClassification",
"TFLayoutLMv3ForTokenClassification",
"TFLayoutLMv3Model",
"TFLayoutLMv3PreTrainedModel",
]
)
_import_structure["models.led"].extend(["TFLEDForConditionalGeneration", "TFLEDModel", "TFLEDPreTrainedModel"])
_import_structure["models.longformer"].extend(
[
@@ -4801,6 +4811,14 @@ if TYPE_CHECKING:
            TFHubertModel,
            TFHubertPreTrainedModel,
        )
        from .models.layoutlmv3 import (
            TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST,
            TFLayoutLMv3ForQuestionAnswering,
            TFLayoutLMv3ForSequenceClassification,
            TFLayoutLMv3ForTokenClassification,
            TFLayoutLMv3Model,
            TFLayoutLMv3PreTrainedModel,
        )
        from .models.led import TFLEDForConditionalGeneration, TFLEDModel, TFLEDPreTrainedModel
        from .models.longformer import (
            TF_LONGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
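An illustrative check of what this registration buys (a sketch, not part of the commit): the top-level import always succeeds, and the name resolves to either the real model class or a TF dummy placeholder depending on the environment:

```python
# Illustrative only: verify the new top-level export resolves correctly.
from transformers import TFLayoutLMv3Model
from transformers.utils import is_tf_available

if is_tf_available():
    print("Real class:", TFLayoutLMv3Model.__module__)
else:
    # Without TensorFlow installed, this name is a dummy that raises a
    # descriptive ImportError when instantiated (see dummy_tf_objects below).
    print("TensorFlow missing; TFLayoutLMv3Model is a placeholder")
```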
@@ -52,6 +52,7 @@ TF_MODEL_MAPPING_NAMES = OrderedDict(
("gptj", "TFGPTJModel"),
("hubert", "TFHubertModel"),
("layoutlm", "TFLayoutLMModel"),
("layoutlmv3", "TFLayoutLMv3Model"),
("led", "TFLEDModel"),
("longformer", "TFLongformerModel"),
("lxmert", "TFLxmertModel"),
@@ -268,6 +269,7 @@ TF_MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
("gpt2", "TFGPT2ForSequenceClassification"),
("gptj", "TFGPTJForSequenceClassification"),
("layoutlm", "TFLayoutLMForSequenceClassification"),
("layoutlmv3", "TFLayoutLMv3ForSequenceClassification"),
("longformer", "TFLongformerForSequenceClassification"),
("mobilebert", "TFMobileBertForSequenceClassification"),
("mpnet", "TFMPNetForSequenceClassification"),
@@ -297,6 +299,7 @@ TF_MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
("flaubert", "TFFlaubertForQuestionAnsweringSimple"),
("funnel", "TFFunnelForQuestionAnswering"),
("gptj", "TFGPTJForQuestionAnswering"),
("layoutlmv3", "TFLayoutLMv3ForQuestionAnswering"),
("longformer", "TFLongformerForQuestionAnswering"),
("mobilebert", "TFMobileBertForQuestionAnswering"),
("mpnet", "TFMPNetForQuestionAnswering"),
@@ -316,7 +319,6 @@ TF_MODEL_FOR_TABLE_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
    ]
)

TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
    [
        # Model for Token Classification mapping
@@ -331,6 +333,7 @@ TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
("flaubert", "TFFlaubertForTokenClassification"),
("funnel", "TFFunnelForTokenClassification"),
("layoutlm", "TFLayoutLMForTokenClassification"),
("layoutlmv3", "TFLayoutLMv3ForTokenClassification"),
("longformer", "TFLongformerForTokenClassification"),
("mobilebert", "TFMobileBertForTokenClassification"),
("mpnet", "TFMPNetForTokenClassification"),
@@ -373,7 +376,6 @@ TF_MODEL_FOR_NEXT_SENTENCE_PREDICTION_MAPPING_NAMES = OrderedDict(
    ]
)

TF_MODEL_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_MAPPING_NAMES)
TF_MODEL_FOR_PRETRAINING_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_FOR_PRETRAINING_MAPPING_NAMES)
TF_MODEL_WITH_LM_HEAD_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_WITH_LM_HEAD_MAPPING_NAMES)
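With the `layoutlmv3` entries registered, the TF auto classes resolve the model type automatically. A hedged sketch of the effect (`num_labels=7` is an illustrative value, not from the commit):

```python
from transformers import AutoConfig, TFAutoModelForTokenClassification

# The mapping added above routes model_type == "layoutlmv3" to the TF class.
config = AutoConfig.from_pretrained("microsoft/layoutlmv3-base", num_labels=7)
model = TFAutoModelForTokenClassification.from_config(config)
print(type(model).__name__)  # TFLayoutLMv3ForTokenClassification
```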
@@ -21,6 +21,7 @@ from typing import TYPE_CHECKING
from ...utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_tf_available,
    is_tokenizers_available,
    is_torch_available,
    is_vision_available,
@@ -60,6 +61,21 @@ else:
"LayoutLMv3PreTrainedModel",
]
try:
if not is_tf_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_tf_layoutlmv3"] = [
"TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST",
"TFLayoutLMv3ForQuestionAnswering",
"TFLayoutLMv3ForSequenceClassification",
"TFLayoutLMv3ForTokenClassification",
"TFLayoutLMv3Model",
"TFLayoutLMv3PreTrainedModel",
]
try:
if not is_vision_available():
raise OptionalDependencyNotAvailable()
@@ -101,6 +117,21 @@ if TYPE_CHECKING:
        LayoutLMv3PreTrainedModel,
    )

    try:
        if not is_tf_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_tf_layoutlmv3 import (
            TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST,
            TFLayoutLMv3ForQuestionAnswering,
            TFLayoutLMv3ForSequenceClassification,
            TFLayoutLMv3ForTokenClassification,
            TFLayoutLMv3Model,
            TFLayoutLMv3PreTrainedModel,
        )

    try:
        if not is_vision_available():
            raise OptionalDependencyNotAvailable()
@@ -1316,6 +1316,44 @@ class TFHubertPreTrainedModel(metaclass=DummyObject):
requires_backends(self, ["tf"])
TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST = None
class TFLayoutLMv3ForQuestionAnswering(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFLayoutLMv3ForSequenceClassification(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFLayoutLMv3ForTokenClassification(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFLayoutLMv3Model(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFLayoutLMv3PreTrainedModel(metaclass=DummyObject):
_backends = ["tf"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["tf"])
class TFLEDForConditionalGeneration(metaclass=DummyObject):
_backends = ["tf"]
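For readers unfamiliar with the pattern: a simplified, self-contained sketch of how these dummies behave (the real `DummyObject` and `requires_backends` live in `src/transformers/utils/import_utils.py`; this rendition is illustrative only):

```python
# Simplified rendition of the dummy-object mechanism, for illustration only.
def requires_backends(obj, backends):
    name = getattr(obj, "__name__", obj.__class__.__name__)
    raise ImportError(f"{name} requires the following backend(s): {', '.join(backends)}")

class DummyObject(type):
    """Metaclass: class-level attribute access also fails with the same hint."""
    def __getattr__(cls, key):
        requires_backends(cls, cls._backends)

class TFLayoutLMv3Model(metaclass=DummyObject):
    _backends = ["tf"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["tf"])

# Both instantiation and classmethod-style access raise a descriptive error:
# TFLayoutLMv3Model()                     -> ImportError: ... requires ... tf
# TFLayoutLMv3Model.from_pretrained("x")  -> ImportError: ... requires ... tf
```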
@@ -38,6 +38,7 @@ src/transformers/models/gptj/modeling_gptj.py
src/transformers/models/hubert/modeling_hubert.py
src/transformers/models/layoutlmv2/modeling_layoutlmv2.py
src/transformers/models/layoutlmv3/modeling_layoutlmv3.py
src/transformers/models/layoutlmv3/modeling_tf_layoutlmv3.py
src/transformers/models/longformer/modeling_longformer.py
src/transformers/models/longformer/modeling_tf_longformer.py
src/transformers/models/longt5/modeling_longt5.py