"vscode:/vscode.git/clone" did not exist on "a59abedfb5301ede6923ef0312de2ae5fa34fc97"
Unverified Commit 003c4771 authored by Patrick von Platen, committed by GitHub

[GPT2, CTRL] Allow input of input_ids and past of variable length (#4581)

* revert convenience method

* clean docs a bit
parent 5ddd8d65
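In practice the change means `past` and `input_ids` no longer have to be combined one token at a time: any number of not-yet-cached tokens can be passed together with a cached `past`. A minimal sketch of the call pattern, assuming the tuple-style returns of the v2.x API (where `use_cache` defaults to `True` and `past` is the second output):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# First pass: the full prompt, no past yet.
prompt_ids = tokenizer.encode("My dog is cute", return_tensors="pt")
with torch.no_grad():
    logits, past = model(prompt_ids)[:2]  # past caches key/value states for every prompt token

# Second pass: only the new tokens; feeding several at once is what this commit enables.
new_ids = tokenizer.encode(" and very smart", return_tensors="pt")
with torch.no_grad():
    logits, past = model(new_ids, past=past)[:2]  # position ids are inferred from the past length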
@@ -208,9 +208,11 @@ CTRL_START_DOCSTRING = r"""
 CTRL_INPUTS_DOCSTRING = r"""
     Args:
-        input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
+        input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, input_ids_length)`):
+            :obj:`input_ids_length` = ``sequence_length`` if ``past`` is ``None`` else ``past[0].shape[-2]`` (``sequence_length`` of input past key value states).
             Indices of input sequence tokens in the vocabulary.
-            If `past` is used, optionally only the last `input_ids` have to be input (see `past`).
+            If `past` is used, only input_ids that do not have their past calculated should be passed as input_ids.
             Indices can be obtained using :class:`transformers.CTRLTokenizer`.
             See :func:`transformers.PreTrainedTokenizer.encode` and
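The `past[0].shape[-2]` in the formula above is the number of already-cached positions: each element of `past` stacks key and value states with the sequence dimension second to last, as spelled out for the TF double-heads model further down. A small shape sketch under that assumption (the sizes are illustrative, roughly CTRL-like):

import torch

batch_size, num_heads, past_len, head_dim = 1, 16, 4, 80
layer_past = torch.zeros(2, batch_size, num_heads, past_len, head_dim)  # stacked key and value

past_length = layer_past.shape[-2]  # == 4 tokens already cached
# With such a past, input_ids should carry only the positions after those
# four cached ones -- a single token or many, hence "input_ids_length".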
@@ -220,9 +222,7 @@ CTRL_INPUTS_DOCSTRING = r"""
         past (:obj:`List[torch.FloatTensor]` of length :obj:`config.n_layers`):
             Contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
             (see `past` output below). Can be used to speed up sequential decoding.
-            If `past` is used, the user can optionally input only the last `input_ids`
-            (those that don't have their past given to this model) of shape :obj:`(batch_size, 1)`
-            instead of all `input_ids` of shape :obj:`(batch_size, sequence_length)`.
+            The input_ids which have their past given to this model should not be passed as input ids as they have already been computed.
         attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
             Mask to avoid performing attention on padding token indices.
             Mask values selected in ``[0, 1]``:
@@ -233,7 +233,6 @@ CTRL_INPUTS_DOCSTRING = r"""
             Segment token indices to indicate first and second portions of the inputs.
             Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
             corresponds to a `sentence B` token
-            If `past` is used, optionally only the last `token_type_ids` have to be input (see `past`).
             `What are token type IDs? <../glossary.html#token-type-ids>`_
         position_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
@@ -246,7 +245,6 @@ CTRL_INPUTS_DOCSTRING = r"""
             Mask values selected in ``[0, 1]``:
             :obj:`1` indicates the head is **not masked**, :obj:`0` indicates the head is **masked**.
         input_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`, defaults to :obj:`None`):
-            Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
             This is useful if you want more control over how to convert `input_ids` indices into associated vectors
             than the model's internal embedding lookup matrix.
             If `past` is used, optionally only the last `input_embeds` have to be input (see `past`).
@@ -344,16 +342,6 @@ class CTRLModel(CTRLPreTrainedModel):
         """
-        # If using past key value states, only the last tokens
-        # should be given as an input
-        if past is not None:
-            if input_ids is not None:
-                input_ids = input_ids[:, -1:]
-            if inputs_embeds is not None:
-                inputs_embeds = inputs_embeds[:, -1:]
-            if token_type_ids is not None:
-                token_type_ids = token_type_ids[:, -1:]
         if input_ids is not None and inputs_embeds is not None:
             raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
         elif input_ids is not None:
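The block deleted above is the "convenience method" the commit message reverts: the model used to silently truncate its inputs to the last position whenever `past` was given, which made passing more than one new token impossible. Callers that relied on it now slice on their side; a hedged sketch of an equivalent caller-side helper (`trim_to_new_tokens` is a hypothetical name, not a library function):

def trim_to_new_tokens(input_ids, past):
    # Drop the positions already covered by `past`. Generalizes the deleted
    # in-model truncation: instead of always keeping only the last token,
    # keep every token whose key/value states are not cached yet.
    if past is None:
        return input_ids
    past_length = past[0].shape[-2]  # cached sequence length
    return input_ids[:, past_length:]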
@@ -286,9 +286,11 @@ GPT2_START_DOCSTRING = r"""
 GPT2_INPUTS_DOCSTRING = r"""
     Args:
-        input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
+        input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, input_ids_length)`):
+            :obj:`input_ids_length` = ``sequence_length`` if ``past`` is ``None`` else ``past[0].shape[-2]`` (``sequence_length`` of input past key value states).
             Indices of input sequence tokens in the vocabulary.
-            If `past` is used, optionally only the last `input_ids` have to be input (see `past`).
+            If `past` is used, only `input_ids` that do not have their past calculated should be passed as `input_ids`.
             Indices can be obtained using :class:`transformers.GPT2Tokenizer`.
             See :func:`transformers.PreTrainedTokenizer.encode` and
@@ -299,7 +301,7 @@ GPT2_INPUTS_DOCSTRING = r"""
         past (:obj:`List[torch.FloatTensor]` of length :obj:`config.n_layers`):
             Contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
             (see `past` output below). Can be used to speed up sequential decoding.
-            If `past` is used, the user can optionally input only the last `input_ids` (those that don't have their past given to this model) of shape :obj:`(batch_size, 1)` instead of all `input_ids` of shape :obj:`(batch_size, sequence_length)`.
+            The `input_ids` which have their past given to this model should not be passed as `input_ids` as they have already been computed.
         attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
             Mask to avoid performing attention on padding token indices.
             Mask values selected in ``[0, 1]``:
@@ -311,8 +313,6 @@ GPT2_INPUTS_DOCSTRING = r"""
             Segment token indices to indicate first and second portions of the inputs.
             Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
             corresponds to a `sentence B` token
-            If `past` is used, optionally only the last `token_type_ids` have to be input (see `past`).
             `What are token type IDs? <../glossary.html#token-type-ids>`_
         position_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
             Indices of positions of each input sequence tokens in the position embeddings.
@@ -324,7 +324,6 @@ GPT2_INPUTS_DOCSTRING = r"""
             Mask values selected in ``[0, 1]``:
             :obj:`1` indicates the head is **not masked**, :obj:`0` indicates the head is **masked**.
         input_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`, defaults to :obj:`None`):
-            Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
             This is useful if you want more control over how to convert `input_ids` indices into associated vectors
             than the model's internal embedding lookup matrix.
             If `past` is used, optionally only the last `input_embeds` have to be input (see `past`).
@@ -410,16 +409,6 @@ class GPT2Model(GPT2PreTrainedModel):
         """
-        # If using past key value states, only the last tokens
-        # should be given as an input
-        if past is not None:
-            if input_ids is not None:
-                input_ids = input_ids[:, -1:]
-            if inputs_embeds is not None:
-                inputs_embeds = inputs_embeds[:, -1:]
-            if token_type_ids is not None:
-                token_type_ids = token_type_ids[:, -1:]
         if input_ids is not None and inputs_embeds is not None:
             raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
         elif input_ids is not None:
@@ -411,9 +411,12 @@ CTRL_START_DOCSTRING = r"""
 CTRL_INPUTS_DOCSTRING = r"""
     Args:
-        input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length)`):
+        input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(batch_size, input_ids_length)`):
+            :obj:`input_ids_length` = ``sequence_length`` if ``past`` is ``None`` else ``past[0].shape[-2]`` (``sequence_length`` of input past key value states).
             Indices of input sequence tokens in the vocabulary.
-            If `past` is used, optionally only the last `input_ids` have to be input (see `past`).
+            If `past` is used, only input_ids that do not have their past calculated should be passed as input_ids (see `past`).
             Indices can be obtained using :class:`transformers.CTRLTokenizer`.
             See :func:`transformers.PreTrainedTokenizer.encode` and
@@ -423,9 +426,8 @@ CTRL_INPUTS_DOCSTRING = r"""
         past (:obj:`List[tf.Tensor]` of length :obj:`config.n_layers`):
             Contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
             (see `past` output below). Can be used to speed up sequential decoding.
-            If `past` is used, the user can optionally input only the last `input_ids`
-            (those that don't have their past given to this model) of shape :obj:`(batch_size, 1)`
-            instead of all `input_ids` of shape :obj:`(batch_size, sequence_length)`.
+            The token ids which have their past given to this model
+            should not be passed as input ids as they have already been computed.
         attention_mask (:obj:`tf.Tensor` or :obj:`Numpy array` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
             Mask to avoid performing attention on padding token indices.
             Mask values selected in ``[0, 1]``:
@@ -436,7 +438,6 @@ CTRL_INPUTS_DOCSTRING = r"""
             Segment token indices to indicate first and second portions of the inputs.
             Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
             corresponds to a `sentence B` token
-            If `past` is used, optionally only the last `token_type_ids` have to be input (see `past`).
             `What are token type IDs? <../glossary.html#token-type-ids>`_
         position_ids (:obj:`tf.Tensor` or :obj:`Numpy array` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
@@ -452,7 +453,6 @@ CTRL_INPUTS_DOCSTRING = r"""
             Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
             This is useful if you want more control over how to convert `input_ids` indices into associated vectors
             than the model's internal embedding lookup matrix.
-            If `past` is used, optionally only the last `input_embeds` have to be input (see `past`).
         use_cache (:obj:`bool`):
             If `use_cache` is True, `past` key value states are returned and
             can be used to speed up decoding (see `past`). Defaults to `True`.
@@ -284,16 +284,6 @@ class TFGPT2MainLayer(tf.keras.layers.Layer):
         else:
             input_ids = inputs
-        # If using past key value states, only the last tokens
-        # should be given as an input
-        if past is not None:
-            if input_ids is not None:
-                input_ids = input_ids[:, -1:]
-            if inputs_embeds is not None:
-                inputs_embeds = inputs_embeds[:, -1:]
-            if token_type_ids is not None:
-                token_type_ids = token_type_ids[:, -1:]
         if input_ids is not None and inputs_embeds is not None:
             raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
         elif input_ids is not None:
@@ -431,9 +421,11 @@ GPT2_START_DOCSTRING = r"""
 GPT2_INPUTS_DOCSTRING = r"""
     Args:
-        input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length)`):
+        input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(batch_size, input_ids_length)`):
+            :obj:`input_ids_length` = ``sequence_length`` if ``past`` is ``None`` else ``past[0].shape[-2]`` (``sequence_length`` of input past key value states).
             Indices of input sequence tokens in the vocabulary.
-            If `past` is used, optionally only the last `input_ids` have to be input (see `past`).
+            If `past` is used, only `input_ids` that do not have their past calculated should be passed as `input_ids`.
             Indices can be obtained using :class:`transformers.GPT2Tokenizer`.
             See :func:`transformers.PreTrainedTokenizer.encode` and
@@ -442,8 +434,9 @@ GPT2_INPUTS_DOCSTRING = r"""
         `What are input IDs? <../glossary.html#input-ids>`__
         past (:obj:`List[tf.Tensor]` of length :obj:`config.n_layers`):
             Contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model
-            (see `past` output below). Can be used to speed up sequential decoding. The token ids which have their past given to this model
-            should not be passed as input ids as they have already been computed.
+            (see `past` output below). Can be used to speed up sequential decoding.
+            The token ids which have their past given to this model
+            should not be passed as `input_ids` as they have already been computed.
         attention_mask (:obj:`tf.Tensor` or :obj:`Numpy array` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
             Mask to avoid performing attention on padding token indices.
             Mask values selected in ``[0, 1]``:
@@ -454,7 +447,6 @@ GPT2_INPUTS_DOCSTRING = r"""
             Segment token indices to indicate first and second portions of the inputs.
             Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1``
             corresponds to a `sentence B` token
-            If `past` is used, optionally only the last `token_type_ids` have to be input (see `past`).
             `What are token type IDs? <../glossary.html#token-type-ids>`_
         position_ids (:obj:`tf.Tensor` or :obj:`Numpy array` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
@@ -470,7 +462,6 @@ GPT2_INPUTS_DOCSTRING = r"""
             Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
             This is useful if you want more control over how to convert `input_ids` indices into associated vectors
             than the model's internal embedding lookup matrix.
-            If `past` is used, optionally only the last `input_embeds` have to be input (see `past`).
         training (:obj:`boolean`, `optional`, defaults to :obj:`False`):
             Whether to activate dropout modules (if set to :obj:`True`) during training or to de-activate them
             (if set to :obj:`False`) for evaluation.
@@ -639,7 +630,7 @@ class TFGPT2DoubleHeadsModel(TFGPT2PreTrainedModel):
         past (:obj:`List[tf.Tensor]` of length :obj:`config.n_layers` with each tensor of shape :obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`):
             Contains pre-computed hidden-states (key and values in the attention blocks).
             Can be used (see `past` input) to speed up sequential decoding. The token ids which have their past given to this model
-            should not be passed as input ids as they have already been computed.
+            should not be passed as `input_ids` as they have already been computed.
         hidden_states (:obj:`tuple(tf.Tensor)`, `optional`, returned when ``config.output_hidden_states=True``):
             Tuple of :obj:`tf.Tensor` (one for the output of the embeddings + one for the output of each layer)
             of shape :obj:`(batch_size, sequence_length, hidden_size)`.
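The same call pattern applies on the TF side; a minimal sketch, again assuming the v2.x tuple returns with `use_cache` defaulting to `True`:

import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = TFGPT2LMHeadModel.from_pretrained("gpt2")

# Full prompt first; the cached key/value states come back as the second output.
prompt_ids = tf.constant([tokenizer.encode("My dog is cute")])
logits, past = model(prompt_ids)[:2]

# Then any number of new tokens together with the cached past.
new_ids = tf.constant([tokenizer.encode(" and very smart")])
logits, past = model(new_ids, past=past)[:2]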