.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

LayoutLM
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LayoutLM model was proposed in the paper `LayoutLM: Pre-training of Text and Layout for Document Image
Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and
Ming Zhou. It is a simple but effective method for jointly pre-training text and layout representations for document
image understanding and information extraction tasks, such as form understanding and receipt understanding.

The abstract from the paper is the following:

*Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the
widespread use of pretraining models for NLP applications, they almost exclusively focus on text-level manipulation,
while neglecting layout and style information that is vital for document image understanding. In this paper, we propose
the LayoutLM to jointly model interactions between text and layout information across scanned document images,
which is beneficial for a great number of real-world document image understanding tasks such as information extraction
from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into
LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single
framework for document-level pretraining. It achieves new state-of-the-art results in several downstream tasks,
including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image
classification (from 93.07 to 94.42).*

Tips:

- LayoutLM takes an extra input called :obj:`bbox`, which holds the bounding boxes of the input tokens.
- The values in :obj:`bbox` must be on a 0-1000 scale, which means you should normalize the bounding boxes before
  passing them to the model.
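
The normalization mentioned in the tips could be sketched as follows. This is a minimal illustration, not part of the
library: the helper name ``normalize_bbox``, the ``(x0, y0, x1, y1)`` pixel-coordinate layout and the example page size
are all assumptions made for the example.

```python
def normalize_bbox(bbox, page_width, page_height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 range.

    Hypothetical helper: the box layout and the use of the page size
    as the normalization reference are assumptions for this sketch.
    """
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Illustrative values only: one word box on a 612 x 792 pixel page.
normalized = normalize_bbox((61, 79, 306, 396), page_width=612, page_height=792)
print(normalized)
```

Each token would then get one such normalized box, so that :obj:`bbox` lines up with the input tokens.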

The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.


LayoutLMConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMConfig
    :members:


LayoutLMTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMTokenizer
    :members:


LayoutLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMModel
    :members:


LayoutLMForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMForMaskedLM
    :members:


LayoutLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMForTokenClassification
    :members: