Unverified Commit de8548eb authored by Christoffer Koo Øhrstrøm, committed by GitHub

[LayoutLMv3] Add TensorFlow implementation (#18678)


Co-authored-by: Esben Toke Christensen <esben.christensen@visma.com>
Co-authored-by: Lasse Reedtz <lasse.reedtz@visma.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
parent 7320d95d
@@ -38,7 +38,7 @@ The documentation is organized in five parts:
- **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
- **TUTORIALS** are a great place to begin if you are new to our library. This section will help you gain the basic skills you need to start using 🤗 Transformers.
- **HOW-TO GUIDES** will show you how to achieve a specific goal, like fine-tuning a pretrained model for language modeling or creating a custom model head.
- **CONCEPTUAL GUIDES** provides more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes each class and function, grouped in:
  - **MAIN CLASSES** for the main classes exposing the important APIs of the library.
@@ -245,7 +245,7 @@ Flax), PyTorch, and/or TensorFlow.
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LayoutLMv3 | ✅ | ✅ | ✅ | ✅ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| LeViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
...
@@ -26,18 +26,18 @@ Tips:
- In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
  - images need to be resized and normalized with channels in regular RGB format. LayoutLMv2, on the other hand, normalizes the images internally and expects the channels in BGR format.
  - text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.

  Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`], which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality), to prepare all data for the model (see the sketch after this list).
- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
- Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
- Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).
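A minimal, hedged sketch of that preprocessing (assumptions: the public `microsoft/layoutlmv3-base` checkpoint; `document.png` is a placeholder path):

```python
from PIL import Image
from transformers import LayoutLMv3Processor

# Wraps LayoutLMv3FeatureExtractor (image modality) and
# LayoutLMv3TokenizerFast (text modality) behind a single __call__.
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")

# Channels stay in regular RGB order; no BGR conversion as in LayoutLMv2.
image = Image.open("document.png").convert("RGB")

# With the default apply_ocr=True, words and normalized bounding boxes are
# extracted via Tesseract; pass words= and boxes= yourself to skip OCR.
encoding = processor(image, return_tensors="pt")
print(encoding.keys())  # input_ids, attention_mask, bbox, pixel_values
```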
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/layoutlmv3_architecture.png"
alt="drawing" width="600"/>

<small> LayoutLMv3 architecture. Taken from the <a href="https://arxiv.org/abs/2204.08387">original paper</a>. </small>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [chriskoo](https://huggingface.co/chriskoo), [tokec](https://huggingface.co/tokec), and [lre](https://huggingface.co/lre). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/layoutlmv3).
## LayoutLMv3Config

@@ -84,3 +84,23 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr). The origi
[[autodoc]] LayoutLMv3ForQuestionAnswering
    - forward
## TFLayoutLMv3Model

[[autodoc]] TFLayoutLMv3Model
    - call

## TFLayoutLMv3ForSequenceClassification

[[autodoc]] TFLayoutLMv3ForSequenceClassification
    - call

## TFLayoutLMv3ForTokenClassification

[[autodoc]] TFLayoutLMv3ForTokenClassification
    - call

## TFLayoutLMv3ForQuestionAnswering

[[autodoc]] TFLayoutLMv3ForQuestionAnswering
    - call
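A minimal usage sketch for the TF classes above (assumptions: the public `microsoft/layoutlmv3-base` checkpoint and a placeholder `document.png`; `from_pt=True` converts the PyTorch weights in case no TF weights are hosted):

```python
from PIL import Image
from transformers import LayoutLMv3Processor, TFLayoutLMv3Model

processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")
model = TFLayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base", from_pt=True)

image = Image.open("document.png").convert("RGB")
encoding = processor(image, return_tensors="tf")

# TF models are invoked through `call`; the inputs mirror the PyTorch version
# (input_ids, attention_mask, bbox, pixel_values).
outputs = model(**encoding)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```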
@@ -221,7 +221,7 @@ tokenizer (called "slow"). A "fast" tokenizer supported by the 🤗 library
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LayoutLMv3 | ✅ | ✅ | ✅ | ✅ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
@@ -288,4 +288,4 @@ tokenizer (called "slow"). A "fast" tokenizer supported by the 🤗 library
| YOLOS | ❌ | ❌ | ✅ | ❌ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
<!-- End table-->
\ No newline at end of file
@@ -2343,6 +2343,16 @@ else:
            "TFLayoutLMPreTrainedModel",
        ]
    )
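    # Lazily registered: these names resolve through _LazyModule, so the TF
    # module is imported only when one of the classes is first accessed.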
_import_structure["models.layoutlmv3"].extend(
[
"TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST",
"TFLayoutLMv3ForQuestionAnswering",
"TFLayoutLMv3ForSequenceClassification",
"TFLayoutLMv3ForTokenClassification",
"TFLayoutLMv3Model",
"TFLayoutLMv3PreTrainedModel",
]
)
_import_structure["models.led"].extend(["TFLEDForConditionalGeneration", "TFLEDModel", "TFLEDPreTrainedModel"]) _import_structure["models.led"].extend(["TFLEDForConditionalGeneration", "TFLEDModel", "TFLEDPreTrainedModel"])
_import_structure["models.longformer"].extend( _import_structure["models.longformer"].extend(
[ [
@@ -4801,6 +4811,14 @@ if TYPE_CHECKING:
        TFHubertModel,
        TFHubertPreTrainedModel,
    )
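    # Static counterparts of the lazy entries above; this branch runs only
    # under TYPE_CHECKING, for type checkers and IDEs.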
    from .models.layoutlmv3 import (
        TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST,
        TFLayoutLMv3ForQuestionAnswering,
        TFLayoutLMv3ForSequenceClassification,
        TFLayoutLMv3ForTokenClassification,
        TFLayoutLMv3Model,
        TFLayoutLMv3PreTrainedModel,
    )
    from .models.led import TFLEDForConditionalGeneration, TFLEDModel, TFLEDPreTrainedModel
    from .models.longformer import (
        TF_LONGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
...
@@ -52,6 +52,7 @@ TF_MODEL_MAPPING_NAMES = OrderedDict(
        ("gptj", "TFGPTJModel"),
        ("hubert", "TFHubertModel"),
        ("layoutlm", "TFLayoutLMModel"),
        ("layoutlmv3", "TFLayoutLMv3Model"),
        ("led", "TFLEDModel"),
        ("longformer", "TFLongformerModel"),
        ("lxmert", "TFLxmertModel"),
@@ -268,6 +269,7 @@ TF_MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
        ("gpt2", "TFGPT2ForSequenceClassification"),
        ("gptj", "TFGPTJForSequenceClassification"),
        ("layoutlm", "TFLayoutLMForSequenceClassification"),
        ("layoutlmv3", "TFLayoutLMv3ForSequenceClassification"),
        ("longformer", "TFLongformerForSequenceClassification"),
        ("mobilebert", "TFMobileBertForSequenceClassification"),
        ("mpnet", "TFMPNetForSequenceClassification"),
@@ -297,6 +299,7 @@ TF_MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
        ("flaubert", "TFFlaubertForQuestionAnsweringSimple"),
        ("funnel", "TFFunnelForQuestionAnswering"),
        ("gptj", "TFGPTJForQuestionAnswering"),
        ("layoutlmv3", "TFLayoutLMv3ForQuestionAnswering"),
        ("longformer", "TFLongformerForQuestionAnswering"),
        ("mobilebert", "TFMobileBertForQuestionAnswering"),
        ("mpnet", "TFMPNetForQuestionAnswering"),
@@ -316,7 +319,6 @@ TF_MODEL_FOR_TABLE_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
    ]
)

TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
    [
        # Model for Token Classification mapping
@@ -331,6 +333,7 @@ TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
        ("flaubert", "TFFlaubertForTokenClassification"),
        ("funnel", "TFFunnelForTokenClassification"),
        ("layoutlm", "TFLayoutLMForTokenClassification"),
        ("layoutlmv3", "TFLayoutLMv3ForTokenClassification"),
        ("longformer", "TFLongformerForTokenClassification"),
        ("mobilebert", "TFMobileBertForTokenClassification"),
        ("mpnet", "TFMPNetForTokenClassification"),
@@ -373,7 +376,6 @@ TF_MODEL_FOR_NEXT_SENTENCE_PREDICTION_MAPPING_NAMES = OrderedDict(
    ]
)

TF_MODEL_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_MAPPING_NAMES)
TF_MODEL_FOR_PRETRAINING_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_FOR_PRETRAINING_MAPPING_NAMES)
TF_MODEL_WITH_LM_HEAD_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TF_MODEL_WITH_LM_HEAD_MAPPING_NAMES)
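# Hedged usage sketch for the mappings above: the TF auto classes can now
# resolve LayoutLMv3 checkpoints (assumes the public PyTorch checkpoint, with
# from_pt=True converting the weights if no TF weights are hosted):
#
#     from transformers import TFAutoModelForTokenClassification
#     model = TFAutoModelForTokenClassification.from_pretrained(
#         "microsoft/layoutlmv3-base", from_pt=True, num_labels=7
#     )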
...
@@ -21,6 +21,7 @@ from typing import TYPE_CHECKING
from ...utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_tf_available,
    is_tokenizers_available,
    is_torch_available,
    is_vision_available,
@@ -60,6 +61,21 @@ else:
        "LayoutLMv3PreTrainedModel",
    ]
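# Register the TF classes only when TensorFlow is importable; without it, the
# top-level transformers package serves dummy objects in their place.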
try:
    if not is_tf_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["modeling_tf_layoutlmv3"] = [
        "TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST",
        "TFLayoutLMv3ForQuestionAnswering",
        "TFLayoutLMv3ForSequenceClassification",
        "TFLayoutLMv3ForTokenClassification",
        "TFLayoutLMv3Model",
        "TFLayoutLMv3PreTrainedModel",
    ]
try:
    if not is_vision_available():
        raise OptionalDependencyNotAvailable()
@@ -101,6 +117,21 @@ if TYPE_CHECKING:
        LayoutLMv3PreTrainedModel,
    )
    try:
        if not is_tf_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_tf_layoutlmv3 import (
            TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST,
            TFLayoutLMv3ForQuestionAnswering,
            TFLayoutLMv3ForSequenceClassification,
            TFLayoutLMv3ForTokenClassification,
            TFLayoutLMv3Model,
            TFLayoutLMv3PreTrainedModel,
        )
    try:
        if not is_vision_available():
            raise OptionalDependencyNotAvailable()
...
@@ -1316,6 +1316,44 @@ class TFHubertPreTrainedModel(metaclass=DummyObject):
        requires_backends(self, ["tf"])

TF_LAYOUTLMV3_PRETRAINED_MODEL_ARCHIVE_LIST = None


class TFLayoutLMv3ForQuestionAnswering(metaclass=DummyObject):
    _backends = ["tf"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["tf"])


class TFLayoutLMv3ForSequenceClassification(metaclass=DummyObject):
    _backends = ["tf"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["tf"])


class TFLayoutLMv3ForTokenClassification(metaclass=DummyObject):
    _backends = ["tf"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["tf"])


class TFLayoutLMv3Model(metaclass=DummyObject):
    _backends = ["tf"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["tf"])


class TFLayoutLMv3PreTrainedModel(metaclass=DummyObject):
    _backends = ["tf"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["tf"])

class TFLEDForConditionalGeneration(metaclass=DummyObject):
    _backends = ["tf"]
...
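The dummy classes above keep `from transformers import TFLayoutLMv3Model` working in environments without TensorFlow. A minimal sketch of the resulting behavior (assuming TF is absent):

```python
# Importing always succeeds; instantiation fails fast with install guidance.
from transformers import TFLayoutLMv3Model

try:
    TFLayoutLMv3Model()  # requires_backends raises before any real work happens
except ImportError as err:
    print(err)  # explains that the "tf" backend (TensorFlow) is required
```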
@@ -38,6 +38,7 @@ src/transformers/models/gptj/modeling_gptj.py
src/transformers/models/hubert/modeling_hubert.py
src/transformers/models/layoutlmv2/modeling_layoutlmv2.py
src/transformers/models/layoutlmv3/modeling_layoutlmv3.py
src/transformers/models/layoutlmv3/modeling_tf_layoutlmv3.py
src/transformers/models/longformer/modeling_longformer.py
src/transformers/models/longformer/modeling_tf_longformer.py
src/transformers/models/longt5/modeling_longt5.py
...