.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Transformer XL
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Transformer-XL model was proposed in `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
<https://arxiv.org/abs/1901.02860>`__ by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le and
Ruslan Salakhutdinov. It is a causal (uni-directional) transformer with relative (sinusoidal) positional embeddings
that can reuse previously computed hidden states as memory to attend to a longer context. The model also uses adaptive
softmax inputs and outputs (tied).
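
As an illustration of the memory reuse described above, the following minimal sketch uses plain Python lists in place
of hidden-state tensors. The helper names ``update_memory`` and ``process_in_segments`` and the ``segment_len``
parameter are hypothetical and not part of the library; ``mem_len`` is borrowed from the configuration as an
assumption. The actual model keeps one memory per layer and does not backpropagate through it, which this sketch
leaves out.

```python
from typing import List, Tuple


def update_memory(prev_mem: List[int], segment_states: List[int], mem_len: int) -> List[int]:
    """Keep only the most recent `mem_len` states across segments (hypothetical helper)."""
    if mem_len <= 0:
        return []
    combined = prev_mem + segment_states
    return combined[-mem_len:]


def process_in_segments(tokens: List[int], segment_len: int, mem_len: int) -> List[Tuple[List[int], List[int]]]:
    """Split a long sequence into segments and record the memory visible to each one."""
    mem: List[int] = []
    contexts = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        # Positions in this segment can attend to the cached memory plus
        # the preceding positions of the segment itself.
        contexts.append((list(mem), segment))
        # Token ids stand in for the hidden states a real model would cache.
        mem = update_memory(mem, segment, mem_len)
    return contexts


contexts = process_in_segments(list(range(10)), segment_len=4, mem_len=3)
for memory, segment in contexts:
    print("memory:", memory, "segment:", segment)
```

Because the memory is carried from one segment to the next, the effective context grows beyond a single segment
without recomputing earlier positions.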

The abstract from the paper is the following:

*Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the
setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency
beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a
novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the
context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450%
longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+
times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of
bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn
Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably
coherent, novel text articles with thousands of tokens.*

Tips:

- Transformer-XL uses relative sinusoidal positional embeddings. Padding can be done on the left or on the right. The
  original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
- Transformer-XL is one of the few models that has no sequence length limit.
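
The relative sinusoidal embeddings mentioned in the tips can be sketched with the standard library alone. This is an
illustrative approximation, not the library's code: the function name ``relative_positional_embedding`` and the
arguments ``klen`` and ``d_model`` are hypothetical, ``d_model`` is assumed to be even, and the layout (all sines
followed by all cosines, distances counted down to zero) is an assumption modelled on the original implementation.

```python
import math
from typing import List


def relative_positional_embedding(klen: int, d_model: int) -> List[List[float]]:
    """Return one sinusoidal vector per relative distance klen-1, ..., 1, 0."""
    # One inverse frequency per pair of embedding dimensions.
    inv_freq = [1.0 / (10000 ** (i / d_model)) for i in range(0, d_model, 2)]
    rows = []
    for distance in range(klen - 1, -1, -1):
        angles = [distance * freq for freq in inv_freq]
        # Concatenate the sine half and the cosine half of the vector.
        rows.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
    return rows


embeddings = relative_positional_embedding(klen=5, d_model=4)
print(len(embeddings), len(embeddings[0]))
print(embeddings[-1])  # vector for distance 0
```

Since the vectors depend only on the distance between two positions, the same table can be reused for any segment,
which fits with the absence of a fixed sequence length limit.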

This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
<https://github.com/kimiyoung/transformer-xl>`__.


TransfoXLConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLConfig
    :members:


TransfoXLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLTokenizer
    :members: save_vocabulary


TransfoXL specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.transfo_xl.modeling_transfo_xl.TransfoXLModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_transfo_xl.TransfoXLLMHeadModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_tf_transfo_xl.TFTransfoXLModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_tf_transfo_xl.TFTransfoXLLMHeadModelOutput
    :members:


TransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLModel
    :members: forward


TransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLLMHeadModel
    :members: forward


TransfoXLForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLForSequenceClassification
    :members: forward


TFTransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLModel
    :members: call


TFTransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLLMHeadModel
    :members: call


TFTransfoXLForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLForSequenceClassification
    :members: call


Internal Layers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AdaptiveEmbedding

.. autoclass:: transformers.TFAdaptiveEmbedding
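
An adaptive embedding splits a frequency-sorted vocabulary into clusters and gives rarer clusters smaller embedding
sizes. The sketch below only illustrates that bookkeeping and is not the library's implementation: the function name
``adaptive_embedding_clusters`` is hypothetical, the argument names ``cutoffs``, ``d_embed`` and ``div_val`` are
assumed to mirror the configuration fields, and the rule ``d_embed // div_val ** i`` for cluster ``i`` is an
assumption modelled on the original code.

```python
from typing import Dict, List, Tuple, Union

Cluster = Dict[str, Union[Tuple[int, int], int]]


def adaptive_embedding_clusters(vocab_size: int, cutoffs: List[int], d_embed: int, div_val: int) -> List[Cluster]:
    """Describe each vocabulary cluster and the embedding size it would use."""
    # Cluster boundaries: [0, cutoff_1, ..., cutoff_n, vocab_size].
    ends = [0] + list(cutoffs) + [vocab_size]
    clusters: List[Cluster] = []
    for i in range(len(ends) - 1):
        clusters.append(
            {
                "token_range": (ends[i], ends[i + 1]),
                # Later (rarer) clusters shrink by a factor of div_val each.
                "embed_dim": d_embed // (div_val ** i),
            }
        )
    return clusters


for cluster in adaptive_embedding_clusters(vocab_size=1000, cutoffs=[100, 400], d_embed=64, div_val=4):
    print(cluster)
```

With ``div_val`` set to 1, every cluster keeps the full ``d_embed`` size, so the split no longer reduces the number
of embedding parameters.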