.. 
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Transformer XL
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Transformer-XL model was proposed in `Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
<https://arxiv.org/abs/1901.02860>`__ by Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan
Salakhutdinov. It's a causal (uni-directional) transformer with relative positioning (sinusoidal) embeddings which can
reuse previously computed hidden states to attend to a longer context (memory). The model also uses adaptive softmax
inputs and outputs, which are tied.
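
The memory reuse described above can be sketched in plain Python. This is a minimal illustration under stated
assumptions, not the library's implementation: ``update_memory`` and ``process_segments`` are hypothetical helper
names, ``mem_len`` is an assumed parameter for the number of cached states, and list items stand in for hidden-state
vectors.

```python
from typing import List, Tuple


def update_memory(memory: List[int], hidden: List[int], mem_len: int) -> List[int]:
    """Keep the most recent `mem_len` states for reuse by the next segment."""
    if mem_len <= 0:
        return []
    combined = memory + hidden
    return combined[-mem_len:]


def process_segments(
    tokens: List[int], seg_len: int, mem_len: int
) -> Tuple[List[int], List[int]]:
    """Walk over `tokens` segment by segment, carrying a memory across segments.

    Returns the context size visible to each segment (cached states plus the
    segment itself) and the final memory.
    """
    memory: List[int] = []
    context_sizes: List[int] = []
    for start in range(0, len(tokens), seg_len):
        segment = tokens[start:start + seg_len]
        # A segment can attend to its own positions and to the cached ones.
        context_sizes.append(len(memory) + len(segment))
        # Stand-in for real hidden states: the tokens themselves.
        memory = update_memory(memory, segment, mem_len)
    return context_sizes, memory


context_sizes, final_memory = process_segments(list(range(10)), seg_len=4, mem_len=6)
print(context_sizes)
print(final_memory)
```

In this sketch the first segment has no cached context, while later segments see up to ``mem_len`` cached positions
in addition to their own.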

The abstract from the paper is the following:

*Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the
setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency
beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a
novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the
context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450%
longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+
times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of
bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn
Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably
coherent, novel text articles with thousands of tokens.*

Tips:

- Transformer-XL uses relative sinusoidal positional embeddings. Padding can be done on the left or on the right. The
  original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
- Transformer-XL is one of the few models that has no sequence length limit.
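
The two tips above are related: relative sinusoidal embeddings depend only on the distance between positions, so they
can be computed for any context length. The following is a rough sketch, assuming the common layout in which sine and
cosine halves are concatenated and an even ``d_model``; ``relative_positional_embedding`` is a hypothetical helper
name and not part of the library.

```python
import math
from typing import List


def relative_positional_embedding(klen: int, d_model: int) -> List[List[float]]:
    """Return one sinusoidal vector per relative distance klen-1, ..., 1, 0."""
    # One inverse frequency per pair of embedding dimensions.
    inv_freq = [1.0 / (10000 ** (i / d_model)) for i in range(0, d_model, 2)]
    rows = []
    for distance in range(klen - 1, -1, -1):
        angles = [distance * freq for freq in inv_freq]
        # Assumed layout: all sine components followed by all cosine components.
        rows.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
    return rows


short = relative_positional_embedding(klen=3, d_model=4)
longer = relative_positional_embedding(klen=5, d_model=4)
# The vectors for distances 2, 1, 0 do not change when the context grows.
print(short == longer[-3:])
```

Because nothing in this computation is tied to a learned table of fixed size, the same function serves a context of
any length.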

The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`__.

TransfoXLConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLConfig
    :members:


TransfoXLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLTokenizer
    :members: save_vocabulary


TransfoXL specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.transfo_xl.modeling_transfo_xl.TransfoXLModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_transfo_xl.TransfoXLLMHeadModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_tf_transfo_xl.TFTransfoXLModelOutput
    :members:

.. autoclass:: transformers.models.transfo_xl.modeling_tf_transfo_xl.TFTransfoXLLMHeadModelOutput
    :members:


TransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLModel
    :members: forward


TransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLLMHeadModel
    :members: forward

TransfoXLForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TransfoXLForSequenceClassification
    :members: forward

TFTransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLModel
    :members: call


TFTransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFTransfoXLLMHeadModel
    :members: call