Project: chenpangpang/transformers

Commit 8fe2c9d9, authored Jul 09, 2019 by LysandreJik
Refactored Docstrings of BERT, GPT2, GPT, TransfoXL, XLM and XLNet.
parent ed6c8d37
Changes: 13 changed files with 934 additions and 773 deletions (+934, -773).
docs/source/cli.rst                           +5    -5
docs/source/model_doc/bert.rst                +11   -11
docs/source/model_doc/gpt.rst                 +6    -6
docs/source/model_doc/gpt2.rst                +5    -5
docs/source/model_doc/transformerxl.rst       +4    -4
docs/source/model_doc/xlm.rst                 +32   -1
docs/source/usage.rst                         +4    -4
pytorch_transformers/modeling_bert.py         +91   -63
pytorch_transformers/modeling_gpt2.py         +184  -122
pytorch_transformers/modeling_openai.py       +198  -137
pytorch_transformers/modeling_transfo_xl.py   +127  -128
pytorch_transformers/modeling_xlm.py          +257  -277
pytorch_transformers/modeling_xlnet.py        +10   -10
docs/source/cli.rst

@@ -20,7 +20,7 @@ Here is an example of the conversion process for a pre-trained ``BERT-Base Uncas
     export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12

-    pytorch_pretrained_bert bert \
+    pytorch_transformers bert \
       $BERT_BASE_DIR/bert_model.ckpt \
       $BERT_BASE_DIR/bert_config.json \
       $BERT_BASE_DIR/pytorch_model.bin

@@ -36,7 +36,7 @@ Here is an example of the conversion process for a pre-trained OpenAI GPT model,
     export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights

-    pytorch_pretrained_bert gpt \
+    pytorch_transformers gpt \
       $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
       $PYTORCH_DUMP_OUTPUT \
       [OPENAI_GPT_CONFIG]

@@ -50,7 +50,7 @@ Here is an example of the conversion process for a pre-trained Transformer-XL mo
     export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint

-    pytorch_pretrained_bert transfo_xl \
+    pytorch_transformers transfo_xl \
       $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
       $PYTORCH_DUMP_OUTPUT \
       [TRANSFO_XL_CONFIG]

@@ -64,7 +64,7 @@ Here is an example of the conversion process for a pre-trained OpenAI's GPT-2 mo
     export GPT2_DIR=/path/to/gpt2/checkpoint

-    pytorch_pretrained_bert gpt2 \
+    pytorch_transformers gpt2 \
       $GPT2_DIR/model.ckpt \
       $PYTORCH_DUMP_OUTPUT \
       [GPT2_CONFIG]

@@ -79,7 +79,7 @@ Here is an example of the conversion process for a pre-trained XLNet model, fine
     export TRANSFO_XL_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
     export TRANSFO_XL_CONFIG_PATH=/path/to/xlnet/config

-    pytorch_pretrained_bert xlnet \
+    pytorch_transformers xlnet \
       $TRANSFO_XL_CHECKPOINT_PATH \
       $TRANSFO_XL_CONFIG_PATH \
       $PYTORCH_DUMP_OUTPUT \
docs/source/model_doc/bert.rst

@@ -4,75 +4,75 @@ BERT

 ``BertConfig``
 ~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertConfig
+.. autoclass:: pytorch_transformers.BertConfig
     :members:

 ``BertTokenizer``
 ~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertTokenizer
+.. autoclass:: pytorch_transformers.BertTokenizer
     :members:

 ``BertAdam``
 ~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertAdam
+.. autoclass:: pytorch_transformers.BertAdam
     :members:

 1. ``BertModel``
 ~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertModel
+.. autoclass:: pytorch_transformers.BertModel
     :members:

 2. ``BertForPreTraining``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertForPreTraining
+.. autoclass:: pytorch_transformers.BertForPreTraining
     :members:

 3. ``BertForMaskedLM``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertForMaskedLM
+.. autoclass:: pytorch_transformers.BertForMaskedLM
     :members:

 4. ``BertForNextSentencePrediction``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertForNextSentencePrediction
+.. autoclass:: pytorch_transformers.BertForNextSentencePrediction
     :members:

 5. ``BertForSequenceClassification``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertForSequenceClassification
+.. autoclass:: pytorch_transformers.BertForSequenceClassification
     :members:

 6. ``BertForMultipleChoice``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertForMultipleChoice
+.. autoclass:: pytorch_transformers.BertForMultipleChoice
     :members:

 7. ``BertForTokenClassification``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertForTokenClassification
+.. autoclass:: pytorch_transformers.BertForTokenClassification
     :members:

 8. ``BertForQuestionAnswering``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.BertForQuestionAnswering
+.. autoclass:: pytorch_transformers.BertForQuestionAnswering
     :members:
docs/source/model_doc/gpt.rst

@@ -4,40 +4,40 @@ OpenAI GPT

 ``OpenAIGPTConfig``
 ~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.OpenAIGPTConfig
+.. autoclass:: pytorch_transformers.OpenAIGPTConfig
     :members:

 ``OpenAIGPTTokenizer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.OpenAIGPTTokenizer
+.. autoclass:: pytorch_transformers.OpenAIGPTTokenizer
     :members:

 ``OpenAIAdam``
 ~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.OpenAIAdam
+.. autoclass:: pytorch_transformers.OpenAIAdam
     :members:

 9. ``OpenAIGPTModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.OpenAIGPTModel
+.. autoclass:: pytorch_transformers.OpenAIGPTModel
     :members:

 10. ``OpenAIGPTLMHeadModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.OpenAIGPTLMHeadModel
+.. autoclass:: pytorch_transformers.OpenAIGPTLMHeadModel
     :members:

 11. ``OpenAIGPTDoubleHeadsModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.OpenAIGPTDoubleHeadsModel
+.. autoclass:: pytorch_transformers.OpenAIGPTDoubleHeadsModel
     :members:
docs/source/model_doc/gpt2.rst

@@ -4,33 +4,33 @@ OpenAI GPT2

 ``GPT2Config``
 ~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.GPT2Config
+.. autoclass:: pytorch_transformers.GPT2Config
     :members:

 ``GPT2Tokenizer``
 ~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.GPT2Tokenizer
+.. autoclass:: pytorch_transformers.GPT2Tokenizer
     :members:

 14. ``GPT2Model``
 ~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.GPT2Model
+.. autoclass:: pytorch_transformers.GPT2Model
     :members:

 15. ``GPT2LMHeadModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.GPT2LMHeadModel
+.. autoclass:: pytorch_transformers.GPT2LMHeadModel
     :members:

 16. ``GPT2DoubleHeadsModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.GPT2DoubleHeadsModel
+.. autoclass:: pytorch_transformers.GPT2DoubleHeadsModel
     :members:
docs/source/model_doc/transformerxl.rst

@@ -5,26 +5,26 @@ Transformer XL

 ``TransfoXLConfig``
 ~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.TransfoXLConfig
+.. autoclass:: pytorch_transformers.TransfoXLConfig
     :members:

 ``TransfoXLTokenizer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.TransfoXLTokenizer
+.. autoclass:: pytorch_transformers.TransfoXLTokenizer
     :members:

 12. ``TransfoXLModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.TransfoXLModel
+.. autoclass:: pytorch_transformers.TransfoXLModel
     :members:

 13. ``TransfoXLLMHeadModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pytorch_pretrained_bert.TransfoXLLMHeadModel
+.. autoclass:: pytorch_transformers.TransfoXLLMHeadModel
     :members:
docs/source/model_doc/xlm.rst  (new content, +32 -1)

 XLM
 ----------------------------------------------------

 ``XLMConfig``
 ~~~~~~~~~~~~~~~~~~~~~
 I don't really know what to put here, I'll leave it up to you to decide @Thom

 .. autoclass:: pytorch_transformers.TransfoXLConfig
 \ No newline at end of file
     :members:

 17. ``XLMModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: pytorch_transformers.XLMModel
     :members:

 18. ``XLMWithLMHeadModel``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: pytorch_transformers.XLMWithLMHeadModel
     :members:

 19. ``XLMForSequenceClassification``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: pytorch_transformers.XLMForSequenceClassification
     :members:

 20. ``XLMForQuestionAnswering``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: pytorch_transformers.XLMForQuestionAnswering
     :members:
docs/source/usage.rst

@@ -11,7 +11,7 @@ First let's prepare a tokenized input with ``BertTokenizer``
 .. code-block:: python

     import torch
-    from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
+    from pytorch_transformers import BertTokenizer, BertModel, BertForMaskedLM

     # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
     import logging

@@ -89,7 +89,7 @@ First let's prepare a tokenized input with ``OpenAIGPTTokenizer``
 .. code-block:: python

     import torch
-    from pytorch_pretrained_bert import OpenAIGPTTokenizer, OpenAIGPTModel, OpenAIGPTLMHeadModel
+    from pytorch_transformers import OpenAIGPTTokenizer, OpenAIGPTModel, OpenAIGPTLMHeadModel

     # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
     import logging

@@ -177,7 +177,7 @@ First let's prepare a tokenized input with ``TransfoXLTokenizer``
 .. code-block:: python

     import torch
-    from pytorch_pretrained_bert import TransfoXLTokenizer, TransfoXLModel, TransfoXLLMHeadModel
+    from pytorch_transformers import TransfoXLTokenizer, TransfoXLModel, TransfoXLLMHeadModel

     # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
     import logging

@@ -253,7 +253,7 @@ First let's prepare a tokenized input with ``GPT2Tokenizer``
 .. code-block:: python

     import torch
-    from pytorch_pretrained_bert import GPT2Tokenizer, GPT2Model, GPT2LMHeadModel
+    from pytorch_transformers import GPT2Tokenizer, GPT2Model, GPT2LMHeadModel

     # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
     import logging
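The four hunks above only swap the package name in the import line; the surrounding usage examples are unchanged. As a hedged, self-contained sketch (not part of the diff) of the renamed BERT import in action, loosely following usage.rst: the ``'bert-base-uncased'`` shortcut name and the sample sentence are assumptions for illustration.

```python
# Sketch only: same classes as before, now imported from pytorch_transformers.
import torch
from pytorch_transformers import BertTokenizer, BertModel

# OPTIONAL: activate the logger to see more of what's happening, as usage.rst suggests
import logging
logging.basicConfig(level=logging.INFO)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  # assumed shortcut name
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

tokens = tokenizer.tokenize("[CLS] Who was Jim Henson ? [SEP]")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    outputs = model(input_ids)
    last_hidden_state = outputs[0]  # [batch_size, sequence_length, hidden_size]
```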
pytorch_transformers/modeling_bert.py  (diff collapsed, not shown)

pytorch_transformers/modeling_gpt2.py  (diff collapsed, not shown)

pytorch_transformers/modeling_openai.py  (diff collapsed, not shown)

pytorch_transformers/modeling_transfo_xl.py
@@ -177,6 +177,38 @@ def load_tf_weights_in_transfo_xl(model, config, tf_path):

 class TransfoXLConfig(PretrainedConfig):
     """Configuration class to store the configuration of a `TransfoXLModel`.
+    Args:
+        vocab_size_or_config_json_file: Vocabulary size of `inputs_ids` in `TransfoXLModel` or a configuration json file.
+        cutoffs: cutoffs for the adaptive softmax
+        d_model: Dimensionality of the model's hidden states.
+        d_embed: Dimensionality of the embeddings
+        d_head: Dimensionality of the model's heads.
+        div_val: divident value for adapative input and softmax
+        pre_lnorm: apply LayerNorm to the input instead of the output
+        d_inner: Inner dimension in FF
+        n_layer: Number of hidden layers in the Transformer encoder.
+        n_head: Number of attention heads for each attention layer in
+            the Transformer encoder.
+        tgt_len: number of tokens to predict
+        ext_len: length of the extended context
+        mem_len: length of the retained previous heads
+        same_length: use the same attn length for all tokens
+        proj_share_all_but_first: True to share all but first projs, False not to share.
+        attn_type: attention type. 0 for Transformer-XL, 1 for Shaw et al, 2 for Vaswani et al, 3 for Al Rfou et al.
+        clamp_len: use the same pos embeddings after clamp_len
+        sample_softmax: number of samples in sampled softmax
+        adaptive: use adaptive softmax
+        tie_weight: tie the word embedding and softmax weights
+        dropout: The dropout probabilitiy for all fully connected
+            layers in the embeddings, encoder, and pooler.
+        dropatt: The dropout ratio for the attention probabilities.
+        untie_r: untie relative position biases
+        embd_pdrop: The dropout ratio for the embeddings.
+        init: parameter initializer to use
+        init_range: parameters initialized by U(-init_range, init_range).
+        proj_init_std: parameters initialized by N(0, init_std)
+        init_std: parameters initialized by N(0, init_std)
     """

     pretrained_config_archive_map = TRANSFO_XL_PRETRAINED_CONFIG_ARCHIVE_MAP
@@ -210,38 +242,6 @@ class TransfoXLConfig(PretrainedConfig):
                  init_std=0.02,
                  **kwargs):
         """Constructs TransfoXLConfig.
-        Args:
-            vocab_size_or_config_json_file: Vocabulary size of `inputs_ids` in `TransfoXLModel` or a configuration json file.
-            [... the same argument descriptions as in the class docstring above, removed from the __init__ docstring ...]
-            init_std: parameters initialized by N(0, init_std)
         """
         super(TransfoXLConfig, self).__init__(**kwargs)
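The two hunks above move the argument documentation from ``TransfoXLConfig.__init__`` up to the class docstring without changing the arguments themselves. As a hedged illustration (not part of the commit), here is how a configuration built from a few of those arguments might be instantiated; the numeric values are assumptions, not defaults taken from the diff.

```python
# Sketch only: construct a Transformer-XL configuration from a handful of the
# arguments documented above, then build a model from it.
from pytorch_transformers import TransfoXLConfig, TransfoXLModel

config = TransfoXLConfig(
    vocab_size_or_config_json_file=267735,  # vocabulary size of `inputs_ids` (assumed value)
    d_model=1024,                           # dimensionality of the hidden states
    d_embed=1024,                           # dimensionality of the embeddings
    n_layer=18,                             # number of hidden layers
    n_head=16,                              # attention heads per layer
    d_head=64,                              # dimensionality of each head
    mem_len=1600,                           # length of the retained memory
    tie_weight=True,                        # tie word embedding and softmax weights
)
model = TransfoXLModel(config)
```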
@@ -901,42 +901,20 @@ class TransfoXLPreTrainedModel(PreTrainedModel):

 class TransfoXLModel(TransfoXLPreTrainedModel):
     """Transformer XL model ("Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context").
-    Transformer XL use a relative positioning (with sinusiodal patterns) and adaptive softmax inputs which means that:
-    - you don't need to specify positioning embeddings indices
-    - the tokens in the vocabulary have to be sorted to decreasing frequency
-    Params:
+    Transformer XL uses relative positioning (with sinusiodal patterns) and adaptive softmax inputs which means that:
+    - you don't need to specify positioning embeddings indices.
+    - the tokens in the vocabulary have to be sorted in decreasing frequency.
+    Args:
         config: a TransfoXLConfig class instance with the configuration to build a new model
-    Inputs:
-        `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
-            with the token indices selected in the range [0, self.config.n_token[
-        `mems`: optional memomry of hidden states from previous forward passes
-            as a list (num layers) of hidden states at the entry of each layer
-            each hidden states has shape [self.config.mem_len, bsz, self.config.d_model]
-            Note that the first two dimensions are transposed in `mems` with regards to `input_ids` and `labels`
-    Outputs:
-        A tuple of (last_hidden_state, new_mems)
-        `last_hidden_state`: the encoded-hidden-states at the top of the model
-            as a torch.FloatTensor of size [batch_size, sequence_length, self.config.d_model]
-        `new_mems`: list (num layers) of updated mem states at the entry of each layer
-            each mem state is a torch.FloatTensor of size [self.config.mem_len, batch_size, self.config.d_model]
-            Note that the first two dimensions are transposed in `mems` with regards to `input_ids` and `labels`
-    Example usage:
-    ```python
-    # Already been converted into BPE token ids
-    input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
-    input_ids_next = torch.LongTensor([[53, 21, 1], [64, 23, 100]])
-    config = TransfoXLConfig()
-    model = TransfoXLModel(config)
-    last_hidden_state, new_mems = model(input_ids)
-    # Another time on input_ids_next using the memory:
-    last_hidden_state, new_mems = model(input_ids_next, new_mems)
-    ```
+    Example::
+        config = TransfoXLConfig()
+        model = TransfoXLModel(config)
     """

     def __init__(self, config):
         super(TransfoXLModel, self).__init__(config)
@@ -1200,18 +1178,40 @@ class TransfoXLModel(TransfoXLPreTrainedModel):
         return outputs  # last hidden state, new_mems, (all hidden states), (all attentions)

     def forward(self, input_ids, mems=None, head_mask=None):
-        """ Params:
-                input_ids :: [bsz, len]
-                mems :: optional mems from previous forwar passes (or init_mems)
-                    list (num layers) of mem states at the entry of each layer
-                        shape :: [self.config.mem_len, bsz, self.config.d_model]
-                    Note that the first two dimensions are transposed in `mems` with regards to `input_ids` and `labels`
-            Returns:
-                tuple (last_hidden, new_mems) where:
-                    new_mems: list (num layers) of mem states at the entry of each layer
-                        shape :: [self.config.mem_len, bsz, self.config.d_model]
-                    last_hidden: output of the last layer:
-                        shape :: [bsz, len, self.config.d_model]
+        """
+        Performs a model forward pass. **Can be called by calling the class directly, once it has been instantiated.**
+        Args:
+            `input_ids`: a ``torch.LongTensor`` of shape [batch_size, sequence_length]
+                with the token indices selected in the range [0, self.config.n_token[
+            `mems`: optional memory of hidden states from previous forward passes
+                as a list (num layers) of hidden states at the entry of each layer
+                each hidden states has shape [self.config.mem_len, bsz, self.config.d_model]
+                Note that the first two dimensions are transposed in `mems` with regards to `input_ids` and `labels`
+        Returns:
+            A tuple of ``(last_hidden_state, new_mems)``.
+            ``last_hidden_state``: the encoded-hidden-states at the top of the model
+                as a ``torch.FloatTensor`` of size [batch_size, sequence_length, self.config.d_model]
+            ``new_mems``: list (num layers) of updated mem states at the entry of each layer
+                each mem state is a ``torch.FloatTensor`` of size [self.config.mem_len, batch_size, self.config.d_model]
+                Note that the first two dimensions are transposed in ``mems`` with regards to ``input_ids`` and
+                ``labels``
+        Example::
+            # Already been converted into BPE token ids
+            input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
+            input_ids_next = torch.LongTensor([[53, 21, 1], [64, 23, 100]])
+            last_hidden_state, new_mems = model(input_ids)
+            # or
+            last_hidden_state, new_mems = model.forward(input_ids)
+            # Another time on input_ids_next using the memory:
+            last_hidden_state, new_mems = model(input_ids_next, new_mems)
         """
         # the original code for Transformer-XL used shapes [len, bsz] but we want a unified interface in the library
         # so we transpose here from shape [bsz, len] to shape [len, bsz]
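The refactored ``forward`` docstring above documents the memory interface: each call returns ``new_mems``, which can be fed back in as ``mems`` on the next segment. A hedged sketch of that pattern (not part of the commit; it reuses the token ids from the docstring's own example):

```python
# Sketch of the memory-reuse pattern described in the forward() docstring above.
import torch
from pytorch_transformers import TransfoXLConfig, TransfoXLModel

config = TransfoXLConfig()      # default configuration, as in the docstring example
model = TransfoXLModel(config)
model.eval()

# Two consecutive segments, already converted into BPE token ids (docstring example values)
input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
input_ids_next = torch.LongTensor([[53, 21, 1], [64, 23, 100]])

with torch.no_grad():
    last_hidden_state, new_mems = model(input_ids)
    # The second segment consumes the memory produced by the first one
    last_hidden_state, new_mems = model(input_ids_next, mems=new_mems)
```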
@@ -1227,52 +1227,24 @@ class TransfoXLModel(TransfoXLPreTrainedModel):

 class TransfoXLLMHeadModel(TransfoXLPreTrainedModel):
     """Transformer XL model ("Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context").
-    This model add an (adaptive) softmax head on top of the TransfoXLModel
-    Transformer XL use a relative positioning (with sinusiodal patterns) and adaptive softmax inputs which means that:
-    - you don't need to specify positioning embeddings indices
-    - the tokens in the vocabulary have to be sorted to decreasing frequency.
-    Call self.tie_weights() if you update/load the weights of the transformer to keep the weights tied.
-    Params:
-        config: a TransfoXLConfig class instance with the configuration to build a new model
-    Inputs:
-        `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
-            with the token indices selected in the range [0, self.config.n_token[
-        `labels`: an optional torch.LongTensor of shape [batch_size, sequence_length]
-            with the labels token indices selected in the range [0, self.config.n_token[
-        `mems`: an optional memory of hidden states from previous forward passes
-            as a list (num layers) of hidden states at the entry of each layer
-            each hidden states has shape [self.config.mem_len, bsz, self.config.d_model]
-            Note that the first two dimensions are transposed in `mems` with regards to `input_ids` and `labels`
-    Outputs:
-        A tuple of (last_hidden_state, new_mems)
-        `softmax_output`: output of the (adaptive) softmax:
-            if labels is None:
-                Negative log likelihood of shape [batch_size, sequence_length]
-            else:
-                log probabilities of tokens, shape [batch_size, sequence_length, n_tokens]
-        `new_mems`: list (num layers) of updated mem states at the entry of each layer
-            each mem state is a torch.FloatTensor of size [self.config.mem_len, batch_size, self.config.d_model]
-            Note that the first two dimensions are transposed in `mems` with regards to `input_ids` and `labels`
-    Example usage:
-    ```python
-    # Already been converted into BPE token ids
-    input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
-    input_ids_next = torch.LongTensor([[53, 21, 1], [64, 23, 100]])
-    config = TransfoXLConfig()
-    model = TransfoXLModel(config)
-    last_hidden_state, new_mems = model(input_ids)
-    # Another time on input_ids_next using the memory:
-    last_hidden_state, new_mems = model(input_ids_next, mems=new_mems)
-    ```
+    This model adds an (adaptive) softmax head on top of the ``TransfoXLModel``
+    Transformer XL uses a relative positioning (with sinusoidal patterns) and adaptive softmax inputs which means that:
+    - you don't need to specify positioning embeddings indices
+    - the tokens in the vocabulary have to be sorted in decreasing frequency.
+    Call ``self.tie_weights()`` if you update/load the weights of the transformer to keep the weights tied.
+    Args:
+        config: a ``TransfoXLConfig`` class instance with the configuration to build a new model
+    Example::
+        config = TransfoXLConfig()
+        model = TransfoXLModel(config)
     """

     def __init__(self, config):
         super(TransfoXLLMHeadModel, self).__init__(config)
@@ -1290,7 +1262,9 @@ class TransfoXLLMHeadModel(TransfoXLPreTrainedModel):
         self.tie_weights()

     def tie_weights(self):
-        """ Run this to be sure output and input (adaptive) softmax weights are tied """
+        """
+        Run this to be sure output and input (adaptive) softmax weights are tied
+        """
         # sampled softmax
         if self.sample_softmax > 0:
             if self.config.tie_weight:
@@ -1314,18 +1288,43 @@ class TransfoXLLMHeadModel(TransfoXLPreTrainedModel):
         return self.transformer.init_mems(data)

     def forward(self, input_ids, labels=None, mems=None, head_mask=None):
-        """ Params:
-                input_ids :: [bsz, len]
-                labels :: [bsz, len]
-            Returns:
-                tuple(softmax_output, new_mems) where:
-                    new_mems: list (num layers) of hidden states at the entry of each layer
-                        shape :: [mem_len, bsz, self.config.d_model] :: Warning: shapes are transposed here w. regards to input_ids
-                    softmax_output: output of the (adaptive) softmax:
-                        if labels is None:
-                            Negative log likelihood of shape :: [bsz, len]
-                        else:
-                            log probabilities of tokens, shape :: [bsz, len, n_tokens]
+        """
+        Performs a model forward pass. **Can be called by calling the class directly, once it has been instantiated.**
+        Args:
+            `input_ids`: a ``torch.LongTensor`` of shape [batch_size, sequence_length]
+                with the token indices selected in the range [0, self.config.n_token[
+            `labels`: an optional ``torch.LongTensor`` of shape [batch_size, sequence_length]
+                with the labels token indices selected in the range [0, self.config.n_token[
+            `mems`: an optional memory of hidden states from previous forward passes
+                as a list (num layers) of hidden states at the entry of each layer
+                each hidden states has shape [self.config.mem_len, bsz, self.config.d_model]
+                Note that the first two dimensions are transposed in `mems` with regards to `input_ids` and `labels`
+        Returns:
+            A tuple of (last_hidden_state, new_mems)
+            ``last_hidden_state``: output of the (adaptive) softmax. If ``labels`` is ``None``, it is the negative
+                log likelihood of shape [batch_size, sequence_length]. Otherwise, it is the log probabilities of
+                tokens of, shape [batch_size, sequence_length, n_tokens].
+            ``new_mems``: list (num layers) of updated mem states at the entry of each layer
+                each mem state is a ``torch.FloatTensor`` of size [self.config.mem_len, batch_size, self.config.d_model]
+                Note that the first two dimensions are transposed in ``mems`` with regards to ``input_ids`` and
+                ``labels``
+        Example::
+            # Already been converted into BPE token ids
+            input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
+            input_ids_next = torch.LongTensor([[53, 21, 1], [64, 23, 100]])
+            last_hidden_state, new_mems = model(input_ids)
+            # or
+            last_hidden_state, new_mems = model.forward(input_ids)
+            # Another time on input_ids_next using the memory:
+            last_hidden_state, new_mems = model(input_ids_next, mems=new_mems)
         """
         bsz = input_ids.size(0)
         tgt_len = input_ids.size(1)
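The refactored ``TransfoXLLMHeadModel.forward`` docstring above distinguishes the two forms the first output can take, depending on whether ``labels`` are supplied. A hedged sketch of both calls (not part of the commit; the semantics of each output are exactly as described in the docstring):

```python
# Sketch of the two calling modes documented in the TransfoXLLMHeadModel.forward docstring above.
import torch
from pytorch_transformers import TransfoXLConfig, TransfoXLLMHeadModel

config = TransfoXLConfig()
model = TransfoXLLMHeadModel(config)
model.eval()

input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])  # already BPE token ids

with torch.no_grad():
    # Without labels: the first element is the (adaptive) softmax output described above
    softmax_output, new_mems = model(input_ids)
    # With labels (targets equal to the inputs here, purely for illustration):
    softmax_output_with_labels, new_mems = model(input_ids, labels=input_ids)
```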
pytorch_transformers/modeling_xlm.py  (diff collapsed, not shown)

pytorch_transformers/modeling_xlnet.py
@@ -958,10 +958,10 @@ class XLNetLMHeadModel(XLNetPreTrainedModel):
         `encoded_layers`: controled by `output_all_encoded_layers` argument:
             - `output_all_encoded_layers=True`: outputs a list of the full sequences of encoded-hidden-states at the end
                 of each attention block (i.e. 12 full sequences for XLNet-base, 24 for XLNet-large), each
-                encoded-hidden-state is a torch.FloatTensor of size [batch_size, sequence_length, d_model],
+                encoded-hidden-state is a ``torch.FloatTensor`` of size [batch_size, sequence_length, d_model],
             - `output_all_encoded_layers=False`: outputs only the full sequence of hidden-states corresponding
                 to the last attention block of shape [batch_size, sequence_length, d_model],
-        `pooled_output`: a torch.FloatTensor of size [batch_size, d_model] which is the output of a
+        `pooled_output`: a ``torch.FloatTensor`` of size [batch_size, d_model] which is the output of a
             classifier pretrained on top of the hidden state associated to the first character of the
             input (`CLS`) to train on the Next-Sentence task (see XLNet's paper).

@@ -1087,7 +1087,7 @@ class XLNetForSequenceClassification(XLNetPreTrainedModel):
             1 for tokens with losses and 0 for tokens without losses.
             Only used during pretraining for two-stream attention.
             Set to None during finetuning.
-        `head_mask`: an optional torch.Tensor of shape [num_heads] or [num_layers, num_heads] with indices between 0 and 1.
+        `head_mask`: an optional ``torch.Tensor`` of shape [num_heads] or [num_layers, num_heads] with indices between 0 and 1.
             It's a mask to be used to nullify some heads of the transformer. 1.0 => head is fully masked, 0.0 => head is not masked.

@@ -1098,7 +1098,7 @@ class XLNetForSequenceClassification(XLNetPreTrainedModel):
             else:
                 CrossEntropy loss with the targets
         `new_mems`: list (num layers) of updated mem states at the entry of each layer
-            each mem state is a torch.FloatTensor of size [self.config.mem_len, batch_size, self.config.d_model]
+            each mem state is a ``torch.FloatTensor`` of size [self.config.mem_len, batch_size, self.config.d_model]
             Note that the first two dimensions are transposed in `mems` with regards to `input_ids` and `labels`
     Example usage:

@@ -1189,27 +1189,27 @@ class XLNetForQuestionAnswering(XLNetPreTrainedModel):
             This can be used to compute head importance metrics. Default: False
     Inputs:
-        `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
+        `input_ids`: a ``torch.LongTensor`` of shape [batch_size, sequence_length]
             with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts
             `run_bert_extract_features.py`, `run_bert_classifier.py` and `run_bert_squad.py`)
-        `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token
+        `token_type_ids`: an optional ``torch.LongTensor`` of shape [batch_size, sequence_length] with the token
             types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to
             a `sentence B` token (see XLNet paper for more details).
         `attention_mask`: [optional] float32 Tensor, SAME FUNCTION as `input_mask`
             but with 1 for real tokens and 0 for padding.
             Added for easy compatibility with the BERT model (which uses this negative masking).
             You can only uses one among `input_mask` and `attention_mask`
-        `input_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices
+        `input_mask`: an optional ``torch.LongTensor`` of shape [batch_size, sequence_length] with indices
             selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
             input sequence length in the current batch. It's the mask that we typically use for attention when
             a batch has varying length sentences.
-        `start_positions`: position of the first token for the labeled span: torch.LongTensor of shape [batch_size].
+        `start_positions`: position of the first token for the labeled span: ``torch.LongTensor`` of shape [batch_size].
             Positions are clamped to the length of the sequence and position outside of the sequence are not taken
             into account for computing the loss.
-        `end_positions`: position of the last token for the labeled span: torch.LongTensor of shape [batch_size].
+        `end_positions`: position of the last token for the labeled span: ``torch.LongTensor`` of shape [batch_size].
             Positions are clamped to the length of the sequence and position outside of the sequence are not taken
             into account for computing the loss.
-        `head_mask`: an optional torch.Tensor of shape [num_heads] or [num_layers, num_heads] with indices between 0 and 1.
+        `head_mask`: an optional ``torch.Tensor`` of shape [num_heads] or [num_layers, num_heads] with indices between 0 and 1.
             It's a mask to be used to nullify some heads of the transformer. 1.0 => head is fully masked, 0.0 => head is not masked.
     Outputs:
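Several of the XLNet docstrings touched above describe the ``head_mask`` input: a tensor of shape [num_heads] or [num_layers, num_heads] with values between 0 and 1 where, per the docstring, 1.0 marks a head as fully masked and 0.0 leaves it unmasked. A hedged sketch of constructing such a mask; the layer and head counts are illustrative assumptions:

```python
# Sketch only: build a head_mask with the shape and value convention described in the
# XLNet docstrings above (1.0 => head is fully masked, 0.0 => head is not masked).
import torch

num_layers, num_heads = 12, 12            # assumed sizes, e.g. a base-sized model

# Per-layer mask: silence the first attention head of every layer, keep the rest.
head_mask = torch.zeros(num_layers, num_heads)
head_mask[:, 0] = 1.0

# A 1-D mask of shape [num_heads] is the other documented form, applied to every layer.
uniform_head_mask = torch.zeros(num_heads)
```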