.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Transformer XL
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Transformer-XL model was proposed in `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
<https://arxiv.org/abs/1901.02860>`__ by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le and Ruslan
Salakhutdinov. It is a causal (uni-directional) transformer with relative (sinusoidal) positional embeddings that can
reuse previously computed hidden states to attend to a longer context (memory). The model also uses adaptive softmax
for both its inputs and its outputs, which are tied.
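
As an illustration of the memory mechanism described above, here is a minimal pure-Python sketch. It is not the
library's implementation: the helpers ``update_memory`` and ``run_segments`` are hypothetical, the sketch models a
single layer, and it tracks absolute token positions instead of hidden-state tensors so it is easy to see which earlier
tokens each query can attend to. It assumes, as in the paper, that the cache keeps the most recent ``mem_len`` states
and that each query attends causally to the cached states plus the current segment.

.. code-block:: python

    from typing import List, TypeVar

    T = TypeVar("T")


    def update_memory(memory: List[T], hidden: List[T], mem_len: int) -> List[T]:
        """Append the current segment to the cache and keep only the last ``mem_len`` entries."""
        if mem_len <= 0:
            return []
        # In a real model the cached hidden states are detached, so no gradient flows into them.
        return (memory + hidden)[-mem_len:]


    def run_segments(num_tokens: int, seg_len: int, mem_len: int) -> List[List[int]]:
        """Return, for every token position, the positions its query can attend to."""
        memory: List[int] = []
        contexts: List[List[int]] = []
        for start in range(0, num_tokens, seg_len):
            segment = list(range(start, min(start + seg_len, num_tokens)))
            for j in range(len(segment)):
                # Causal attention: every cached position plus the current segment up to position j.
                contexts.append(memory + segment[: j + 1])
            memory = update_memory(memory, segment, mem_len)
        return contexts


    contexts = run_segments(num_tokens=10, seg_len=4, mem_len=4)
    print(contexts[4])  # [0, 1, 2, 3, 4]: the first token of the second segment still sees the first one
    print(contexts[9])  # [4, 5, 6, 7, 8, 9]

With fixed, independent segments of length 4, position 4 could only attend to itself; with the cache it also sees the
four previous positions. In the full model every layer keeps its own cache of the states from the layer below, so,
according to the paper, the longest reachable dependency grows with the number of layers.
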

The abstract from the paper is the following:

*Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the
setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency
beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a
novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the
context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450%
longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+
times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of
bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn
Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably
coherent, novel text articles with thousands of tokens.*

Tips:

- Transformer-XL uses relative sinusoidal positional embeddings. Padding can be done on the left or on the right. The
  original implementation trains on SQuAD with padding on the left; therefore, padding defaults to the left.
- Transformer-XL is one of the few models that have no sequence length limit.
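
The relative sinusoidal embeddings mentioned in the first tip can be sketched in pure Python. This is a hedged
approximation modeled on the original implementation as we understand it: one embedding per relative distance,
distances ordered from ``klen - 1`` down to ``0``, and sines concatenated with cosines. The function name
``relative_sinusoid`` is hypothetical, and the real layer works on tensors and supports extras such as clamping large
distances.

.. code-block:: python

    import math
    from typing import List


    def relative_sinusoid(klen: int, d_model: int) -> List[List[float]]:
        """Return one ``d_model``-sized embedding per relative distance, from ``klen - 1`` down to 0."""
        if d_model % 2 != 0:
            raise ValueError("d_model must be even")
        # Inverse frequencies 1 / 10000^(2k / d_model), as in the standard sinusoidal encoding.
        inv_freq = [1.0 / (10000 ** (i / d_model)) for i in range(0, d_model, 2)]
        table = []
        for distance in range(klen - 1, -1, -1):
            angles = [distance * f for f in inv_freq]
            # First half holds the sines, second half the cosines.
            table.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
        return table


    emb = relative_sinusoid(klen=5, d_model=8)
    print(len(emb), len(emb[0]))  # 5 8
    print(emb[-1])  # distance 0: [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

Because each row depends only on the distance, ``relative_sinusoid(5, 8)[2]`` and ``relative_sinusoid(3, 8)[0]``
(both distance 2) are identical. The paper motivates relative encodings this way: cached states from a previous segment
can be reused without the position clashes that absolute encodings would cause.
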

This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
<https://github.com/kimiyoung/transformer-xl>`__.

**Note**:

- Transformer-XL does **not** work with ``torch.nn.DataParallel`` due to a bug in PyTorch; see `issue #36035
  <https://github.com/pytorch/pytorch/issues/36035>`__.


TransfoXLConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLConfig
    :members:


TransfoXLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLTokenizer
    :members: save_vocabulary


TransfoXL specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.transfo_xl.modeling_transfo_xl.TransfoXLModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_transfo_xl.TransfoXLLMHeadModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_tf_transfo_xl.TFTransfoXLModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_tf_transfo_xl.TFTransfoXLLMHeadModelOutput
    :members:


TransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLModel
    :members: forward


TransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLLMHeadModel
    :members: forward


TransfoXLForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLForSequenceClassification
    :members: forward


TFTransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLModel
    :members: call


TFTransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLLMHeadModel
    :members: call


TFTransfoXLForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLForSequenceClassification
    :members: call


Internal Layers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AdaptiveEmbedding

.. autoclass:: transformers.TFAdaptiveEmbedding
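
``AdaptiveEmbedding`` implements the adaptive input representation mentioned in the overview. As we understand it, the
vocabulary is split at a list of ``cutoffs``; cluster ``i`` gets its own embedding table of width
``d_embed // div_val**i``, each cluster is projected to a shared size ``d_proj``, and the result is scaled by
``sqrt(d_proj)``. The sketch below mimics that lookup in pure Python. ``ToyAdaptiveEmbedding``, its random weights and
the example sizes are hypothetical, and unlike the library it always builds per-cluster tables, even when
``div_val == 1``.

.. code-block:: python

    import math
    import random
    from bisect import bisect_right
    from typing import List, Sequence


    class ToyAdaptiveEmbedding:
        """Pure-Python sketch of an adaptive input embedding (hypothetical, for illustration only)."""

        def __init__(
            self, n_token: int, d_embed: int, d_proj: int, cutoffs: Sequence[int], div_val: int, seed: int = 0
        ):
            self.n_token = n_token
            # Cluster boundaries, e.g. [0, 4, 8, 12] for cutoffs [4, 8] and n_token 12.
            self.bounds = [0] + list(cutoffs) + [n_token]
            self.scale = math.sqrt(d_proj)
            rng = random.Random(seed)
            self.tables: List[List[List[float]]] = []
            self.projs: List[List[List[float]]] = []
            for i in range(len(self.bounds) - 1):
                size = self.bounds[i + 1] - self.bounds[i]
                d_i = d_embed // (div_val ** i)  # later (rarer) clusters get narrower embeddings
                self.tables.append([[rng.gauss(0.0, 0.02) for _ in range(d_i)] for _ in range(size)])
                # Projection matrix of shape (d_proj, d_i) mapping back to the shared size.
                self.projs.append([[rng.gauss(0.0, 0.02) for _ in range(d_i)] for _ in range(d_proj)])

        def cluster(self, token_id: int) -> int:
            """Index of the vocabulary cluster that contains ``token_id``."""
            if not 0 <= token_id < self.n_token:
                raise ValueError(f"token id {token_id} out of range")
            return bisect_right(self.bounds, token_id) - 1

        def embed(self, token_id: int) -> List[float]:
            i = self.cluster(token_id)
            vec = self.tables[i][token_id - self.bounds[i]]
            # Matrix-vector product with the cluster's projection, then scale by sqrt(d_proj).
            return [self.scale * sum(p * v for p, v in zip(row, vec)) for row in self.projs[i]]


    emb = ToyAdaptiveEmbedding(n_token=12, d_embed=8, d_proj=6, cutoffs=[4, 8], div_val=2)
    print([len(table[0]) for table in emb.tables])  # [8, 4, 2]
    print(len(emb.embed(0)), len(emb.embed(11)))  # 6 6

Assuming the vocabulary is sorted by frequency, frequent tokens keep full-width embeddings while rare ones use far
fewer parameters, yet every token comes out with the same size ``d_proj``.
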