Unverified Commit 895ed8f4 authored by Sylvain Gugger, committed by GitHub

Generation doc (#6470)



* Generation doc

* MBartForConditionalGeneration (#6441)

* add MBartForConditionalGeneration

* style

* rebase and fixes

* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS

* fix docs

* don't ignore mbart

* doc

* fix mbart fairseq link

* put mbart before bart

* apply doc suggestions

* Use hash to clean the test dirs (#6475)

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix

* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)

* add cross attention layers for gpt2

* make gpt2 cross attention work

* finish bert2gpt2

* add explicit comments

* remove attention mask since not yet supported

* revert attn mask in pipeline

* Update src/transformers/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_encoder_decoder.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sort unique_no_split_tokens to make it deterministic (#6461)

* change unique_no_split_tokens's type to set

* use sorted list instead of set

* style

* Import accuracy_score (#6480)

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments

* Styling

* Generation doc

* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments

* Styling
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
parent b5ba758b
@@ -12,7 +12,9 @@ are common among all the models to:

- prune the attention heads of the model.

The other methods that are common to each model are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin`
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models) or
for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models) and
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models)

``PreTrainedModel``

@@ -46,4 +48,8 @@ The other methods that are common to each model are defined in :class:`~transfor

Generative models
~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.generation_utils.GenerationMixin
    :members:

.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
    :members:
\ No newline at end of file
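Because these mixins are folded into the base model classes, every model with a language modeling head exposes the generation API directly. A minimal sketch, assuming the ``transformers`` package of this era where :class:`~transformers.PreTrainedModel` mixes in :class:`~transformers.generation_utils.GenerationMixin`::

    from transformers import GPT2LMHeadModel
    from transformers.generation_utils import GenerationMixin

    # PyTorch models pick up generate() and its helpers through the mixin.
    assert issubclass(GPT2LMHeadModel, GenerationMixin)
    assert hasattr(GPT2LMHeadModel, "generate")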
@@ -91,7 +91,7 @@ class PretrainedConfig(object):
          keep for top-k-filtering that will be used by default in the :obj:`generate` method of the model.
        - **top_p** (:obj:`float`, `optional`, defaults to 1) -- Value that will be used by default in the
          :obj:`generate` method of the model for ``top_p``. If set to float < 1, only the most probable tokens
          with probabilities that add up to ``top_p`` or higher are kept for generation.
        - **repetition_penalty** (:obj:`float`, `optional`, defaults to 1) -- Parameter for repetition penalty
          that will be used by default in the :obj:`generate` method of the model. 1.0 means no penalty.
        - **length_penalty** (:obj:`float`, `optional`, defaults to 1) -- Exponential penalty to the length that
......
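These config attributes are only defaults: :obj:`generate` reads any argument it is not explicitly given from the attribute of the same name on the model's config. A hedged sketch of relying on those defaults (the ``gpt2`` checkpoint is the stock one; the chosen values are illustrative)::

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Set generation defaults on the config instead of passing them on every call.
    model.config.top_k = 25
    model.config.top_p = 0.9
    model.config.max_length = 40

    input_ids = tokenizer.encode("The generate method", return_tensors="pt")
    # do_sample=True activates sampling; top_k/top_p/max_length fall back to the config values above.
    output = model.generate(input_ids, do_sample=True)
    print(tokenizer.decode(output[0], skip_special_tokens=True))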
@@ -25,10 +25,15 @@ logger = logging.getLogger(__name__)

class TFGenerationMixin:
    """
    A class containing all of the functions supporting generation, to be used as a mixin in
    :class:`~transformers.TFPreTrainedModel`.
    """

    def prepare_inputs_for_generation(self, inputs, **kwargs):
        """
        Implement in subclasses of :class:`~transformers.TFPreTrainedModel` for custom behavior to prepare inputs in
        the generate method.
        """
        return {"inputs": inputs}

    def _use_cache(self, outputs, use_cache):
@@ -62,87 +67,83 @@ class TFGenerationMixin:
        decoder_start_token_id=None,
        use_cache=None,
    ):
        r"""
        Generates sequences for models with a language modeling head. The method currently supports greedy decoding,
        beam-search decoding, sampling with temperature, sampling with top-k or nucleus sampling.

        Adapted in part from `Facebook's XLM beam search code
        <https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529>`__.

        Apart from :obj:`input_ids` and :obj:`attention_mask`, all the arguments below will default to the value of
        the attribute of the same name inside the :class:`~transformers.PretrainedConfig` of the model. The default
        values indicated are the default values of those config attributes.

        Most of these parameters are explained in more detail in `this blog post
        <https://huggingface.co/blog/how-to-generate>`__.

        Parameters:

            input_ids (:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size, sequence_length)`, `optional`):
                The sequence used as a prompt for the generation. If :obj:`None` the method initializes it as an
                empty :obj:`tf.Tensor` of shape :obj:`(1,)`.
            max_length (:obj:`int`, `optional`, defaults to 20):
                The maximum length of the sequence to be generated.
            min_length (:obj:`int`, `optional`, defaults to 10):
                The minimum length of the sequence to be generated.
            do_sample (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not to use sampling; use greedy decoding otherwise.
            early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether to stop the beam search when at least ``num_beams`` sentences are finished per batch or not.
            num_beams (:obj:`int`, `optional`, defaults to 1):
                Number of beams for beam search. 1 means no beam search.
            temperature (:obj:`float`, `optional`, defaults to 1.0):
                The value used to modulate the next token probabilities.
            top_k (:obj:`int`, `optional`, defaults to 50):
                The number of highest probability vocabulary tokens to keep for top-k-filtering.
            top_p (:obj:`float`, `optional`, defaults to 1.0):
                If set to float < 1, only the most probable tokens with probabilities that add up to ``top_p`` or
                higher are kept for generation.
            repetition_penalty (:obj:`float`, `optional`, defaults to 1.0):
                The parameter for repetition penalty. 1.0 means no penalty. See `this paper
                <https://arxiv.org/pdf/1909.05858.pdf>`__ for more details.
            pad_token_id (:obj:`int`, `optional`):
                The id of the `padding` token.
            bos_token_id (:obj:`int`, `optional`):
                The id of the `beginning-of-sequence` token.
            eos_token_id (:obj:`int`, `optional`):
                The id of the `end-of-sequence` token.
            length_penalty (:obj:`float`, `optional`, defaults to 1.0):
                Exponential penalty to the length. 1.0 means no penalty.

                Set to values < 1.0 in order to encourage the model to generate shorter sequences, to a value > 1.0
                in order to encourage the model to produce longer sequences.
            no_repeat_ngram_size (:obj:`int`, `optional`, defaults to 0):
                If set to int > 0, all ngrams of that size can only occur once.
            bad_words_ids (:obj:`List[List[int]]`, `optional`):
                List of token ids that are not allowed to be generated. In order to get the tokens of the words that
                should not appear in the generated text, use :obj:`tokenizer.encode(bad_word, add_prefix_space=True)`.
            num_return_sequences (:obj:`int`, `optional`, defaults to 1):
                The number of independently computed returned sequences for each element in the batch.
            attention_mask (:obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size, sequence_length)`, `optional`):
                Mask to avoid performing attention on padding token indices. Mask values are in ``[0, 1]``, 1 for
                tokens that are not masked, and 0 for masked tokens.

                If not provided, will default to a tensor the same shape as :obj:`input_ids` that masks the pad token.

                `What are attention masks? <../glossary.html#attention-mask>`__
            decoder_start_token_id (:obj:`int`, `optional`):
                If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
            use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
                Whether or not the model should use the past key/values attentions (if applicable to the model) to
                speed up decoding.
            model_specific_kwargs:
                Additional model specific kwargs will be forwarded to the :obj:`forward` function of the model.

        Return:

            :obj:`tf.Tensor` of :obj:`dtype=tf.int32` and shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
                The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length`
                or shorter if all batches finished early due to the :obj:`eos_token_id`.

        Examples::
......
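The Examples section of this docstring is collapsed in the diff view. As an illustrative sketch only, not the example shipped in the file (the ``gpt2`` checkpoint, the prompt, and the argument values are assumptions), beam search with the TensorFlow GPT-2 model might look like this::

    from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = TFGPT2LMHeadModel.from_pretrained("gpt2")

    input_ids = tokenizer.encode("Today the weather is", return_tensors="tf")

    # Beam search with 5 beams, forbidding repeated bigrams, stopping beams early.
    outputs = model.generate(
        input_ids,
        max_length=40,
        num_beams=5,
        no_repeat_ngram_size=2,
        early_stopping=True,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))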
@@ -27,13 +27,22 @@ logger = logging.getLogger(__name__)

class GenerationMixin:
    """
    A class containing all of the functions supporting generation, to be used as a mixin in
    :class:`~transformers.PreTrainedModel`.
    """

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        """
        Implement in subclasses of :class:`~transformers.PreTrainedModel` for custom behavior to prepare inputs in
        the generate method.
        """
        return {"input_ids": input_ids}

    def adjust_logits_during_generation(self, logits, **kwargs):
        """
        Implement in subclasses of :class:`~transformers.PreTrainedModel` for custom behavior to adjust the logits
        in the generate method.
        """
        return logits

    def _use_cache(self, outputs, use_cache):
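A hedged sketch of how a subclass might override these two hooks; the class name, the handling of a ``past`` keyword, and the banned token id are hypothetical illustrations, not taken from the library::

    from transformers import GPT2LMHeadModel

    class MyGenerationModel(GPT2LMHeadModel):  # hypothetical subclass for illustration
        def prepare_inputs_for_generation(self, input_ids, **kwargs):
            # Once cached key/values exist, only the last generated token needs to be fed back in.
            past = kwargs.get("past")
            if past is not None:
                input_ids = input_ids[:, -1:]
            return {"input_ids": input_ids, "past": past, "use_cache": kwargs.get("use_cache")}

        def adjust_logits_during_generation(self, logits, **kwargs):
            # Toy adjustment: never let token id 0 be generated.
            logits[:, 0] = -float("inf")
            return logits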
@@ -45,7 +54,9 @@ class GenerationMixin:
        return True

    def enforce_repetition_penalty_(self, lprobs, batch_size, num_beams, prev_output_tokens, repetition_penalty):
        """
        Enforce the repetition penalty (from the `CTRL paper <https://arxiv.org/abs/1909.05858>`__).
        """
        for i in range(batch_size * num_beams):
            for previous_token in set(prev_output_tokens[i].tolist()):
                # if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
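A minimal standalone sketch of that penalty rule (the function name and shapes are illustrative, not the library API): scores of previously generated tokens are divided by the penalty when positive and multiplied by it when negative, so a repeated token always becomes less likely::

    import torch

    def apply_repetition_penalty(scores: torch.Tensor, prev_tokens: torch.Tensor, penalty: float) -> torch.Tensor:
        # scores: (batch_size, vocab_size); prev_tokens: (batch_size, generated_length)
        for i in range(scores.size(0)):
            for token in set(prev_tokens[i].tolist()):
                if scores[i, token] < 0:
                    scores[i, token] *= penalty   # more negative -> less likely
                else:
                    scores[i, token] /= penalty   # smaller positive -> less likely
        return scores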
@@ -123,89 +134,83 @@ class GenerationMixin:
        use_cache: Optional[bool] = None,
        **model_specific_kwargs
    ) -> torch.LongTensor:
        r"""
        Generates sequences for models with a language modeling head. The method currently supports greedy decoding,
        beam-search decoding, sampling with temperature, sampling with top-k or nucleus sampling.

        Adapted in part from `Facebook's XLM beam search code
        <https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529>`__.

        Apart from :obj:`input_ids` and :obj:`attention_mask`, all the arguments below will default to the value of
        the attribute of the same name inside the :class:`~transformers.PretrainedConfig` of the model. The default
        values indicated are the default values of those config attributes.

        Most of these parameters are explained in more detail in `this blog post
        <https://huggingface.co/blog/how-to-generate>`__.

        Parameters:

            input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
                The sequence used as a prompt for the generation. If :obj:`None` the method initializes it as an
                empty :obj:`torch.LongTensor` of shape :obj:`(1,)`.
            max_length (:obj:`int`, `optional`, defaults to 20):
                The maximum length of the sequence to be generated.
            min_length (:obj:`int`, `optional`, defaults to 10):
                The minimum length of the sequence to be generated.
            do_sample (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not to use sampling; use greedy decoding otherwise.
            early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether to stop the beam search when at least ``num_beams`` sentences are finished per batch or not.
            num_beams (:obj:`int`, `optional`, defaults to 1):
                Number of beams for beam search. 1 means no beam search.
            temperature (:obj:`float`, `optional`, defaults to 1.0):
                The value used to modulate the next token probabilities.
            top_k (:obj:`int`, `optional`, defaults to 50):
                The number of highest probability vocabulary tokens to keep for top-k-filtering.
            top_p (:obj:`float`, `optional`, defaults to 1.0):
                If set to float < 1, only the most probable tokens with probabilities that add up to ``top_p`` or
                higher are kept for generation.
            repetition_penalty (:obj:`float`, `optional`, defaults to 1.0):
                The parameter for repetition penalty. 1.0 means no penalty. See `this paper
                <https://arxiv.org/pdf/1909.05858.pdf>`__ for more details.
            pad_token_id (:obj:`int`, `optional`):
                The id of the `padding` token.
            bos_token_id (:obj:`int`, `optional`):
                The id of the `beginning-of-sequence` token.
            eos_token_id (:obj:`int`, `optional`):
                The id of the `end-of-sequence` token.
            length_penalty (:obj:`float`, `optional`, defaults to 1.0):
                Exponential penalty to the length. 1.0 means no penalty.

                Set to values < 1.0 in order to encourage the model to generate shorter sequences, to a value > 1.0
                in order to encourage the model to produce longer sequences.
            no_repeat_ngram_size (:obj:`int`, `optional`, defaults to 0):
                If set to int > 0, all ngrams of that size can only occur once.
            bad_words_ids (:obj:`List[List[int]]`, `optional`):
                List of token ids that are not allowed to be generated. In order to get the tokens of the words that
                should not appear in the generated text, use :obj:`tokenizer.encode(bad_word, add_prefix_space=True)`.
            num_return_sequences (:obj:`int`, `optional`, defaults to 1):
                The number of independently computed returned sequences for each element in the batch.
            attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
                Mask to avoid performing attention on padding token indices. Mask values are in ``[0, 1]``, 1 for
                tokens that are not masked, and 0 for masked tokens.

                If not provided, will default to a tensor the same shape as :obj:`input_ids` that masks the pad token.

                `What are attention masks? <../glossary.html#attention-mask>`__
            decoder_start_token_id (:obj:`int`, `optional`):
                If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
            use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
                Whether or not the model should use the past key/values attentions (if applicable to the model) to
                speed up decoding.
            model_specific_kwargs:
                Additional model specific kwargs will be forwarded to the :obj:`forward` function of the model.

        Return:

            :obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
                The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length`
                or shorter if all batches finished early due to the :obj:`eos_token_id`.

        Examples::
......
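The PyTorch Examples section is likewise collapsed in this view. A hedged usage sketch (the ``gpt2`` checkpoint, the prompt, and the sampling values are illustrative choices, not the example from the file)::

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    input_ids = tokenizer.encode("The generate method supports", return_tensors="pt")

    # Nucleus sampling with a lowered temperature, returning three candidate continuations.
    outputs = model.generate(
        input_ids,
        do_sample=True,
        max_length=50,
        top_p=0.92,
        top_k=0,
        temperature=0.7,
        num_return_sequences=3,
    )
    for sequence in outputs:
        print(tokenizer.decode(sequence, skip_special_tokens=True))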