chenpangpang/transformers: commit d0383e4d
Authored Dec 06, 2019 by patrickvonplaten; parent 35ff345f

corrected documentation for past tensor shape for ctrl and gpt2 model

Showing 4 changed files with 10 additions and 10 deletions:
transformers/modeling_ctrl.py (+2 -2)
transformers/modeling_gpt2.py (+3 -3)
transformers/modeling_tf_ctrl.py (+2 -2)
transformers/modeling_tf_gpt2.py (+3 -3)
transformers/modeling_ctrl.py

@@ -252,7 +252,7 @@ class CTRLModel(CTRLPreTrainedModel):
 **last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
 Sequence of hidden-states at the last layer of the model.
 **past**:
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
 should not be passed as input ids as they have already been computed.

@@ -438,7 +438,7 @@ class CTRLLMHeadModel(CTRLPreTrainedModel):
 **prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
 **past**:
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
 should not be passed as input ids as they have already been computed.
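The corrected docstring says each layer's `past` entry stacks keys and values along a leading axis of size 2. The following is a minimal pure-Python sketch of the per-layer shapes this implies, assuming `embed_size_per_head = n_embd // n_head`. As example values it uses the published CTRL dimensions (48 layers, 16 heads, model width 1280). `expected_past_shapes` is a hypothetical helper for illustration, not part of `transformers`.

```python
def expected_past_shapes(n_layer, n_head, n_embd, batch_size, sequence_length):
    """Return one shape tuple per layer, following the corrected docstring:
    (2, batch_size, num_heads, sequence_length, embed_size_per_head).

    Hypothetical helper for illustration; not a transformers API.
    """
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    embed_size_per_head = n_embd // n_head
    # Axis 0 has size 2: index 0 holds the keys, index 1 holds the values.
    layer_shape = (2, batch_size, n_head, sequence_length, embed_size_per_head)
    return [layer_shape] * n_layer


# Example values: CTRL dimensions (48 layers, 16 heads, width 1280).
shapes = expected_past_shapes(n_layer=48, n_head=16, n_embd=1280,
                              batch_size=1, sequence_length=10)
print(len(shapes))  # 48
print(shapes[0])    # (2, 1, 16, 10, 80)
```

Because the head dimension is derived from `n_embd // n_head`, the same helper applies unchanged to other configurations such as GPT-2 small (12 heads, width 768, giving 64 per head).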
transformers/modeling_gpt2.py

@@ -329,7 +329,7 @@ class GPT2Model(GPT2PreTrainedModel):
 **last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
 Sequence of hidden-states at the last layer of the model.
 **past**:
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
 should not be passed as input ids as they have already been computed.

@@ -503,7 +503,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
 **prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
 **past**:
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
 should not be passed as input ids as they have already been computed.

@@ -596,7 +596,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
 **mc_prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, num_choices)``
 Prediction scores of the multiplechoice classification head (scores for each choice before SoftMax).
 **past**:
-list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of ``torch.FloatTensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
 should not be passed as input ids as they have already been computed.
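The shape being replaced, `(batch_size, num_heads, sequence_length, sequence_length)`, matches the attention weight matrix of a standard multi-head attention layer when no cache is used, not the cached keys and values. The sketch below contrasts the two during incremental decoding, assuming GPT-2 small dimensions (12 heads, width 768). Both helpers are hypothetical and only compute shape tuples. With a cache of `past_length` positions and `new_length` fresh tokens, the weights have one row per new token and one column per attended position, while the returned cache grows along its `sequence_length` axis.

```python
def attention_weights_shape(batch_size, n_head, new_length, past_length):
    """Attention probabilities: one row per new token, one column per attended position."""
    return (batch_size, n_head, new_length, past_length + new_length)


def present_shape(batch_size, n_head, n_embd, new_length, past_length):
    """Per-layer key/value cache after this step, following the corrected docstring."""
    return (2, batch_size, n_head, past_length + new_length, n_embd // n_head)


# GPT-2 small dimensions (12 heads, width 768): a 5-token prompt, then one new token.
print(attention_weights_shape(1, 12, new_length=5, past_length=0))  # (1, 12, 5, 5)
print(present_shape(1, 12, 768, new_length=5, past_length=0))       # (2, 1, 12, 5, 64)
print(attention_weights_shape(1, 12, new_length=1, past_length=5))  # (1, 12, 1, 6)
print(present_shape(1, 12, 768, new_length=1, past_length=5))       # (2, 1, 12, 6, 64)
```

Only on the prompt step, with no past, do the two sequence axes of the weights coincide; once a cache exists, the old documented shape no longer describes either tensor.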
transformers/modeling_tf_ctrl.py

@@ -400,7 +400,7 @@ class TFCTRLModel(TFCTRLPreTrainedModel):
 **last_hidden_state**: ``tf.Tensor`` of shape ``(batch_size, sequence_length, hidden_size)``
 Sequence of hidden-states at the last layer of the model.
 **past**:
-list of ``tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of ``tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)

@@ -462,7 +462,7 @@ class TFCTRLLMHeadModel(TFCTRLPreTrainedModel):
 **prediction_scores**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
 **past**:
-list of ``tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of ``tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
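Because every layer keeps keys and values for all past positions, the memory held by `past` follows directly from the documented shape. The element count does not depend on how the axes are ordered. The following back-of-the-envelope sketch assumes float32 storage (4 bytes per element) and uses the CTRL dimensions (48 layers, 16 heads, width 1280) with a single 256-token sequence. `past_cache_bytes` is a hypothetical helper, not a library function.

```python
def past_cache_bytes(n_layer, n_head, n_embd, batch_size, sequence_length,
                     bytes_per_element=4):
    """Total size of the `past` cache implied by the per-layer shape
    (2, batch_size, num_heads, sequence_length, embed_size_per_head).

    bytes_per_element=4 assumes float32. Hypothetical helper for illustration.
    """
    embed_size_per_head = n_embd // n_head
    elements_per_layer = 2 * batch_size * n_head * sequence_length * embed_size_per_head
    return n_layer * elements_per_layer * bytes_per_element


size = past_cache_bytes(n_layer=48, n_head=16, n_embd=1280,
                        batch_size=1, sequence_length=256)
print(size)            # 125829120
print(size / 2 ** 20)  # 120.0 (MiB)
```

The cache therefore grows linearly in batch size, sequence length, layer count, and model width, which is the cost traded for not recomputing keys and values at every decoding step.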
transformers/modeling_tf_gpt2.py

@@ -436,7 +436,7 @@ class TFGPT2Model(TFGPT2PreTrainedModel):
 **last_hidden_state**: ``tf.Tensor`` of shape ``(batch_size, sequence_length, hidden_size)``
 Sequence of hidden-states at the last layer of the model.
 **past**:
-list of ``tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of ``tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)

@@ -476,7 +476,7 @@ class TFGPT2LMHeadModel(TFGPT2PreTrainedModel):
 **prediction_scores**: `tf.Tensor`` of shape ``(batch_size, sequence_length, config.vocab_size)``
 Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
 **past**:
-list of `tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of `tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)

@@ -535,7 +535,7 @@ class TFGPT2DoubleHeadsModel(TFGPT2PreTrainedModel):
 **mc_prediction_scores**: `tf.Tensor`` of shape ``(batch_size, num_choices)``
 Prediction scores of the multiplechoice classification head (scores for each choice before SoftMax).
 **past**:
-list of `tf.Tensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
+list of `tf.Tensor`` (one for each layer) of shape ``(2, batch_size, num_heads, sequence_length, embed_size_per_head)``:
 that contains pre-computed hidden-states (key and values in the attention blocks).
 Can be used (see `past` input) to speed up sequential decoding.
 **hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
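The docstrings say `past` speeds up sequential decoding. The toy sketch below shows why the cached result is the same as recomputing from scratch. It uses a single attention head with no batch and no layers. The projection weights and the "token" vectors are hypothetical random values standing in for real embeddings. The sequence is processed twice: once by recomputing keys and values for the whole prefix at every step, and once with a `[keys, values]` cache that receives only the newest token per step. Real GPT-2 and CTRL keep such a cache per layer and per head.

```python
import math
import random


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product attention of a single query over lists of keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


def decode_without_cache(tokens, w_q, w_k, w_v):
    """Step t recomputes keys and values for every token 0..t (causal attention)."""
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(w_k, x) for x in prefix]
        values = [matvec(w_v, x) for x in prefix]
        outputs.append(attend(matvec(w_q, tokens[t]), keys, values))
    return outputs


def decode_with_cache(tokens, w_q, w_k, w_v):
    """Each step processes only the newest token; earlier keys/values come from the cache."""
    past = [[], []]  # past[0] holds keys, past[1] holds values (the leading axis of size 2)
    outputs = []
    for x in tokens:
        past[0].append(matvec(w_k, x))
        past[1].append(matvec(w_v, x))
        outputs.append(attend(matvec(w_q, x), past[0], past[1]))
    return outputs


random.seed(0)
dim = 4
w_q, w_k, w_v = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
                 for _ in range(3)]
tokens = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]

full = decode_without_cache(tokens, w_q, w_k, w_v)
cached = decode_with_cache(tokens, w_q, w_k, w_v)
print(all(math.isclose(a, b, abs_tol=1e-12)
          for row_full, row_cached in zip(full, cached)
          for a, b in zip(row_full, row_cached)))  # True
```

In this sketch, feeding an already-cached token a second time would append a duplicate key/value pair to `past`, so that position would be attended to twice. This mirrors the PyTorch docstrings' warning not to pass tokens whose past is already given.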