"git@developer.sourcefind.cn:gaoqiong/composable_kernel.git" did not exist on "2ee1c0a70a305b00770c51a6ad585895992b327c"
Unverified Commit 2152bfea authored by Thomas Wolf, committed by GitHub

Merge pull request #316 from joelgrus/gpt2docs

update documentation for gpt-2
parents a5b3a895 8722e9eb
@@ -773,7 +773,7 @@ This model *outputs*:
*Outputs*:
- if `lm_labels` is not `None`:
  Outputs the language modeling loss.
- else: a tuple of
  - `lm_logits`: the language modeling logits as a torch.FloatTensor of size [batch_size, sequence_length, total_tokens_embeddings] (or more generally [d_1, ..., d_n, total_tokens_embeddings] where d_1 ... d_n are the dimensions of input_ids)
  - `presents`: a list of pre-computed hidden-states (key and values in each attention block) as torch.FloatTensors. They can be reused to speed up sequential decoding (see the `run_gpt2.py` example and the short sketch below).
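The `presents` output described above can be fed back to the model on the next step so the attention keys and values of already-processed tokens are not recomputed. A minimal sketch of that loop, assuming the top-level `pytorch_pretrained_bert` package exports these classes and the tokenizer exposes an `encode` method as in `run_gpt2.py`; the prompt text and variable names are illustrative:

```python
import torch
from pytorch_pretrained_bert import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

input_ids = torch.tensor([tokenizer.encode("The language model")])
with torch.no_grad():
    # First pass: process the whole prompt and cache the key/value states.
    lm_logits, presents = model(input_ids)
    next_token = lm_logits[:, -1, :].argmax(dim=-1, keepdim=True)
    # Later passes: feed only the new token plus the cached states via `past`.
    lm_logits, presents = model(next_token, past=presents)
```

This mirrors the decoding loop in the `run_gpt2.py` example referenced above.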
@@ -368,18 +368,17 @@ class GPT2PreTrainedModel(nn.Module):
Params:
    pretrained_model_name_or_path: either:
        - a str with the name of a pre-trained model to load selected in the list of:
            . `gpt2`
        - a path or url to a pretrained model archive containing:
            . `gpt2_config.json` a configuration file for the model
            . `pytorch_model.bin` a PyTorch dump of a GPT2Model instance
        - a path or url to a pretrained model archive containing:
            . `gpt2_config.json` a configuration file for the model
            . a TensorFlow checkpoint with trained weights
    from_tf: should we load the weights from a locally saved TensorFlow checkpoint
    cache_dir: an optional path to a folder in which the pre-trained models will be cached.
    state_dict: an optional state dictionary (collections.OrderedDict object) to use instead of pre-trained models
    *inputs, **kwargs: additional input for the specific GPT class
"""
if pretrained_model_name_or_path in PRETRAINED_MODEL_ARCHIVE_MAP:
    archive_file = PRETRAINED_MODEL_ARCHIVE_MAP[pretrained_model_name_or_path]
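A short sketch of how `from_pretrained` might be called under the two loading paths listed in the Params above; the cache directory and the local archive path are placeholders, not values from this repository:

```python
from pytorch_pretrained_bert import GPT2Model

# Load by shortcut name, resolved through PRETRAINED_MODEL_ARCHIVE_MAP.
model = GPT2Model.from_pretrained('gpt2', cache_dir='/tmp/gpt2_cache')

# Or load from a local archive containing gpt2_config.json and pytorch_model.bin
# (the path below is a placeholder).
model = GPT2Model.from_pretrained('/path/to/gpt2_archive')
```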
@@ -493,11 +492,16 @@ class GPT2Model(GPT2PreTrainedModel):
(the previous two being the word and position embeddings).
The input, position and token_type embeddings are summed inside the Transformer before the first
self-attention block.
`past`: an optional list of torch.FloatTensor that contains pre-computed hidden-states
(key and values in the attention blocks) to speed up sequential decoding
(this is the `presents` output of the model, cf. below).
Outputs a tuple consisting of:
`hidden_states`: the encoded-hidden-states at the top of the model
as a torch.FloatTensor of size [batch_size, sequence_length, hidden_size]
(or more generally [d_1, ..., d_n, hidden_size] where d_1 ... d_n are the dimensions of input_ids)
`presents`: a list of pre-computed hidden-states (key and values in each attention block) as
torch.FloatTensors. They can be reused to speed up sequential decoding.
Example usage:
```python
@@ -507,7 +511,7 @@ class GPT2Model(GPT2PreTrainedModel):
config = modeling_gpt2.GPT2Config()
model = modeling_gpt2.GPT2Model(config)
hidden_states, presents = model(input_ids)
```
"""
@@ -571,13 +575,18 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
`lm_labels`: optional language modeling labels: torch.LongTensor of shape [batch_size, sequence_length]
with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked), the loss
is only computed for the labels set in [0, ..., vocab_size]
`past`: an optional list of torch.FloatTensor that contains pre-computed hidden-states
(key and values in the attention blocks) to speed up sequential decoding
(this is the `presents` output of the model, cf. below).
Outputs:
if `lm_labels` is not `None`:
Outputs the language modeling loss.
else: a tuple of
`lm_logits`: the language modeling logits as a torch.FloatTensor of size [batch_size, sequence_length, config.vocab_size]
(or more generally [d_1, ..., d_n, config.vocab_size] where d_1 ... d_n are the dimensions of input_ids)
`presents`: a list of pre-computed hidden-states (key and values in each attention block) as
torch.FloatTensors. They can be reused to speed up sequential decoding.
Example usage:
```python
@@ -587,7 +596,7 @@ class GPT2LMHeadModel(GPT2PreTrainedModel):
config = modeling_gpt2.GPT2Config()
model = modeling_gpt2.GPT2LMHeadModel(config)
lm_logits, presents = model(input_ids)
```
"""
@@ -635,6 +644,9 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
is only computed for the labels set in [0, ..., config.vocab_size]
`multiple_choice_labels`: optional multiple choice labels: torch.LongTensor of shape [batch_size]
with indices selected in [0, ..., num_choices].
`past`: an optional list of torch.FloatTensor that contains pre-computed hidden-states
(key and values in the attention blocks) to speed up sequential decoding
(this is the `presents` output of the model, cf. below).
Outputs:
if `lm_labels` and `multiple_choice_labels` are not `None`:
@@ -642,6 +654,8 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
else: a tuple with
`lm_logits`: the language modeling logits as a torch.FloatTensor of size [batch_size, num_choices, sequence_length, config.vocab_size]
`multiple_choice_logits`: the multiple choice logits as a torch.FloatTensor of size [batch_size, num_choices]
`presents`: a list of pre-computed hidden-states (key and values in each attention block) as
torch.FloatTensors. They can be reused to speed up sequential decoding.
Example usage:
```python
@@ -652,7 +666,7 @@ class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
config = modeling_gpt2.GPT2Config()
model = modeling_gpt2.GPT2DoubleHeadsModel(config)
lm_logits, multiple_choice_logits, presents = model(input_ids, mc_token_ids)
```
"""