Unverified Commit 0ccb6f5c authored by Sylvain Gugger, committed by GitHub

Clean RAG docs and template docs (#7348)

* Clean RAG docs and template docs

* Fix typo

* Better doc
parent 27174bd4
@@ -4,11 +4,14 @@ RAG
Overview
~~~~~~~~~~~~~~~~~~~~~

Retrieval-augmented generation ("RAG") models combine the powers of pretrained dense retrieval (DPR) and
sequence-to-sequence models. RAG models retrieve documents, pass them to a seq2seq model, then marginalize to generate
outputs. The retriever and seq2seq modules are initialized from pretrained models, and fine-tuned jointly, allowing
both retrieval and generation to adapt to downstream tasks.

It is based on the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
<https://arxiv.org/abs/2005.11401>`__ by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir
Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.

The abstract from the paper is the following:
@@ -47,7 +50,7 @@ RagTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.RagTokenizer
    :members: prepare_seq2seq_batch

Rag specific outputs
@@ -38,35 +38,39 @@ RAG_CONFIG_DOC = r"""
    retrieval_vector_size (:obj:`int`, `optional`, defaults to 768):
        Dimensionality of the document embeddings indexed by :class:`~transformers.RagRetriever`.
    retrieval_batch_size (:obj:`int`, `optional`, defaults to 8):
        Retrieval batch size, defined as the number of queries issued concurrently to the faiss index encapsulated
        by :class:`~transformers.RagRetriever`.
    dataset (:obj:`str`, `optional`, defaults to :obj:`"wiki_dpr"`):
        A dataset identifier of the indexed dataset on HuggingFace AWS bucket (list all available datasets and ids
        using :obj:`datasets.list_datasets()`).
    dataset_split (:obj:`str`, `optional`, defaults to :obj:`"train"`)
        Which split of the :obj:`dataset` to load.
    index_name (:obj:`str`, `optional`, defaults to :obj:`"compressed"`)
        The index name of the index associated with the :obj:`dataset`. One can choose between :obj:`"legacy"`,
        :obj:`"exact"` and :obj:`"compressed"`.
    index_path (:obj:`str`, `optional`)
        The path to the serialized faiss index on disk.
    passages_path (:obj:`str`, `optional`):
        A path to text passages compatible with the faiss index. Required if using
        :class:`~transformers.retrieval_rag.LegacyIndex`.
    use_dummy_dataset (:obj:`bool`, `optional`, defaults to ``False``)
        Whether to load a "dummy" variant of the dataset specified by :obj:`dataset`.
    label_smoothing (:obj:`float`, `optional`, defaults to 0.0):
        Only relevant if ``return_loss`` is set to :obj:`True`. Controls the ``epsilon`` parameter value for label
        smoothing in the loss calculation. If set to 0, no label smoothing is performed.
    do_marginalize (:obj:`bool`, `optional`, defaults to :obj:`False`):
        If :obj:`True`, the logits are marginalized over all documents by making use of
        ``torch.nn.functional.log_softmax``.
    reduce_loss (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether or not to reduce the NLL loss using the ``torch.Tensor.sum`` operation.
    do_deduplication (:obj:`bool`, `optional`, defaults to :obj:`True`):
        Whether or not to deduplicate the generations from different context documents for a given input. Has to be
        set to :obj:`False` if used while training with distributed backend.
    exclude_bos_score (:obj:`bool`, `optional`, defaults to :obj:`False`):
        Whether or not to disregard the BOS token when computing the loss.
    output_retrieved(:obj:`bool`, `optional`, defaults to :obj:`False`):
        If set to ``True``, :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`, :obj:`context_input_ids` and
        :obj:`context_attention_mask` are returned. See returned tensors for more detail.
"""
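For illustration, the retrieval-related options documented above can be adjusted on a loaded configuration. A minimal sketch, assuming the publicly released ``facebook/rag-token-nq`` checkpoint (the checkpoint name is an assumption, not part of this diff)::

    from transformers import RagConfig

    config = RagConfig.from_pretrained("facebook/rag-token-nq")  # checkpoint name is an assumption
    config.retrieval_batch_size = 8   # queries issued concurrently to the faiss index
    config.index_name = "exact"       # one of "legacy", "exact", "compressed"
    config.use_dummy_dataset = True   # load the small dummy variant of wiki_dpr
    config.do_deduplication = False   # must be False when training with a distributed backend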
@@ -45,66 +45,63 @@ class RetrievAugLMMarginOutput(ModelOutput):
            Prediction scores of the language modeling head. The score is possibly marginalized over all documents
            for each vocabulary token.
        doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
            Score between each retrieved document embedding (see :obj:`retrieved_doc_embeds`) and
            :obj:`question_encoder_last_hidden_state`.
        past_key_values (:obj:`List[torch.FloatTensor]`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):
            List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`, with each tensor of shape
            :obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`.

            Contains precomputed hidden states (key and values in the attention blocks) of the decoder that can be
            used (see ``past_key_values`` input) to speed up sequential decoding.
        retrieved_doc_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs, hidden_size)`, `optional`, returned when `output_retrieved=True`):
            Embedded documents retrieved by the retriever. Is used with ``question_encoder_last_hidden_state`` to
            compute the ``doc_scores``.
        retrieved_doc_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, config.n_docs)`, `optional`, returned when `output_retrieved=True`):
            The indexes of the embedded documents retrieved by the retriever.
        context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
            Input ids post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
            retriever.
        context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
            Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by
            the retriever.
        question_encoder_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
            Sequence of hidden states at the output of the last layer of the question encoder pooled output of the
            model.
        question_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
            layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden states of the question encoder at the output of each layer plus the initial embedding outputs.
        question_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

            Attention weights of the question encoder, after the attention softmax, used to compute the weighted
            average in the self-attention heads.
        generator_enc_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
            Sequence of hidden states at the output of the last layer of the generator encoder of the model.
        generator_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
            layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden states of the generator encoder at the output of each layer plus the initial embedding outputs.
        generator_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

            Attention weights of the generator encoder, after the attention softmax, used to compute the weighted
            average in the self-attention heads.
        generator_dec_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
            layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden states of the generator decoder at the output of each layer plus the initial embedding outputs.
        generator_dec_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

            Attention weights of the generator decoder, after the attention softmax, used to compute the weighted
            average in the self-attention heads.
    """

    loss: Optional[torch.FloatTensor] = None
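A hedged sketch of how these output fields surface in practice. The checkpoint names, the dummy index, and the ``labels`` key returned by ``prepare_seq2seq_batch`` are assumptions rather than facts established by this diff::

    from transformers import RagRetriever, RagSequenceForGeneration, RagTokenizer

    tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
    retriever = RagRetriever.from_pretrained(
        "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
    )
    model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

    batch = tokenizer.prepare_seq2seq_batch(
        src_texts=["who holds the record in 100m freestyle"],
        tgt_texts=["michael phelps"],
        return_tensors="pt",
    )
    outputs = model(
        input_ids=batch["input_ids"],
        labels=batch["labels"],        # assumes prepare_seq2seq_batch returns a "labels" key
        output_retrieved=True,
    )
    print(outputs.loss)                # NLL loss, marginalized over the retrieved documents
    print(outputs.doc_scores.shape)    # (batch_size, n_docs)
    print(outputs.retrieved_doc_ids)   # ids of the documents returned by the retriever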
@@ -133,14 +130,14 @@ class RetrievAugLMOutput(ModelOutput):
            Prediction scores of the language modeling head. The score is possibly marginalized over all documents
            for each vocabulary token.
        doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
            Score between each retrieved document embedding (see :obj:`retrieved_doc_embeds`) and
            :obj:`question_encoder_last_hidden_state`.
        past_key_values (:obj:`List[torch.FloatTensor]`, `optional`, returned when ``use_cache=True`` is passed or when ``config.use_cache=True``):
            List of :obj:`torch.FloatTensor` of length :obj:`config.n_layers`, with each tensor of shape
            :obj:`(2, batch_size, num_heads, sequence_length, embed_size_per_head)`.

            Contains precomputed hidden states (key and values in the attention blocks) of the decoder that can be
            used (see ``past_key_values`` input) to speed up sequential decoding.
        retrieved_doc_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs, hidden_size)`, `optional`, returned when `output_retrieved=True`):
            Embedded documents retrieved by the retriever. Is used with ``question_encoder_last_hidden_state`` to
            compute the ``doc_scores``.
@@ -150,48 +147,46 @@ class RetrievAugLMOutput(ModelOutput):
            Input ids post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
            retriever.
        context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
            Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by
            the retriever.
        question_encoder_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
            Sequence of hidden states at the output of the last layer of the question encoder pooled output of the
            model.
        question_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
            layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden states of the question encoder at the output of each layer plus the initial embedding outputs.
        question_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

            Attention weights of the question encoder, after the attention softmax, used to compute the weighted
            average in the self-attention heads.
        generator_enc_last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
            Sequence of hidden states at the output of the last layer of the generator encoder of the model.
        generator_enc_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
            layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden states of the generator encoder at the output of each layer plus the initial embedding outputs.
        generator_enc_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

            Attention weights of the generator encoder, after the attention softmax, used to compute the weighted
            average in the self-attention heads.
        generator_dec_hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings and one for the output of each
            layer) of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden states of the generator decoder at the output of each layer plus the initial embedding outputs.
        generator_dec_attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

            Attention weights of the generator decoder, after the attention softmax, used to compute the weighted
            average in the self-attention heads.
    """

    logits: torch.FloatTensor = None
@@ -213,10 +208,11 @@ class RetrievAugLMOutput(ModelOutput):
class RagPreTrainedModel(PreTrainedModel):
    r"""
    RAG models were released with the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    <https://arxiv.org/abs/2005.11401>`_ by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.

    RAG is a retrieval-augmented model and encapsulates three components: a question encoder, a dataset retriever
    and a generator. The encoder and generator are trainable while the retriever is just an indexed dataset.
    """
    config_class = RagConfig
@@ -232,40 +228,56 @@ class RagPreTrainedModel(PreTrainedModel):
        *model_args,
        **kwargs
    ) -> PreTrainedModel:
        r"""
        Instantiates a question encoder and a generator from one or two base classes of the library from pretrained
        model checkpoints.

        The model is set in evaluation mode by default using :obj:`model.eval()` (Dropout modules are deactivated).
        To train the model, you need to first set it back in training mode with :obj:`model.train()`.

        Params:
            question_encoder_pretrained_model_name_or_path (:obj:`str`, `optional`, defaults to :obj:`None`):
                Information necessary to initiate the question encoder. Can be either:

                    - A string with the `shortcut name` of a pretrained model to load from cache or download, e.g.,
                      ``bert-base-uncased``.
                    - A string with the `identifier name` of a pretrained model that was user-uploaded to our S3,
                      e.g., ``dbmdz/bert-base-german-cased``.
                    - A path to a `directory` containing model weights saved using
                      :func:`~transformers.PreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
                    - A path or url to a `tensorflow index checkpoint file` (e.g., ``./tf_model/model.ckpt.index``).
                      In this case, ``from_tf`` should be set to :obj:`True` and a configuration object should be
                      provided as ``config`` argument. This loading path is slower than converting the TensorFlow
                      checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch
                      model afterwards.
            generator_pretrained_model_name_or_path (:obj:`str`, `optional`, defaults to :obj:`None`):
                Information necessary to initiate the generator. Can be either:

                    - A string with the `shortcut name` of a pretrained model to load from cache or download, e.g.,
                      ``bert-base-uncased``.
                    - A string with the `identifier name` of a pretrained model that was user-uploaded to our S3,
                      e.g., ``dbmdz/bert-base-german-cased``.
                    - A path to a `directory` containing model weights saved using
                      :func:`~transformers.PreTrainedModel.save_pretrained`, e.g., ``./my_model_directory/``.
                    - A path or url to a `tensorflow index checkpoint file` (e.g., ``./tf_model/model.ckpt.index``).
                      In this case, ``from_tf`` should be set to :obj:`True` and a configuration object should be
                      provided as ``config`` argument. This loading path is slower than converting the TensorFlow
                      checkpoint in a PyTorch model using the provided conversion scripts and loading the PyTorch
                      model afterwards.
            model_args (remaining positional arguments, `optional`):
                All remaining positional arguments will be passed to the underlying model's ``__init__`` method.
            retriever (:class:`~transformers.RagRetriever`, `optional`):
                The retriever to use.
            kwargs (remaining dictionary of keyword arguments, `optional`):
                Can be used to update the configuration object (after it is loaded) and initiate the model
                (e.g., ``output_attentions=True``).

                    - To update the question_encoder configuration, use the prefix `question_encoder_` for each
                      configuration parameter.
                    - To update the generator configuration, use the prefix `generator_` for each configuration
                      parameter.
                    - To update the parent model configuration, do not use a prefix for each configuration parameter.

                Behaves differently depending on whether a :obj:`config` is provided or automatically loaded.

        Example::
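The example block itself is not shown in this hunk. As a hedged sketch only, and assuming the classmethod is the library's ``from_pretrained_question_encoder_generator`` combined with publicly released DPR and BART checkpoints (both assumptions), usage might look like::

    from transformers import RagModel

    model = RagModel.from_pretrained_question_encoder_generator(
        "facebook/dpr-question_encoder-single-nq-base",  # question encoder checkpoint (assumption)
        "facebook/bart-large",                           # generator checkpoint (assumption)
    )
    # Configuration overrides are routed with the documented prefixes:
    # `question_encoder_` for the question encoder, `generator_` for the generator,
    # and no prefix for the parent RagConfig.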
@@ -345,23 +357,33 @@ class RagPreTrainedModel(PreTrainedModel):
RAG_START_DOCSTRING = r"""

    RAG is a seq2seq model which encapsulates two core components: a question encoder and a generator. During a
    forward pass, we encode the input with the question encoder and pass it to the retriever to extract relevant
    context documents. The documents are then prepended to the input. Such contextualized inputs are passed to the
    generator.

    The question encoder can be any `autoencoding` model, preferably :class:`~transformers.DPRQuestionEncoder`, and
    the generator can be any `seq2seq` model, preferably :class:`~transformers.BartForConditionalGeneration`.

    The model can be initialized with a :class:`~transformers.RagRetriever` for end-to-end generation or used in
    combination with the outputs of a retriever in multiple steps (see examples for more details). The model is
    compatible with any `autoencoding` model as the ``question_encoder`` and any `seq2seq` model with language model
    head as the ``generator``. It has been tested with :class:`~transformers.DPRQuestionEncoder` as the
    ``question_encoder`` and :class:`~transformers.BartForConditionalGeneration` or
    :class:`~transformers.T5ForConditionalGeneration` as the ``generator``.

    This model inherits from :class:`~transformers.PreTrainedModel`. Check the superclass documentation for the
    generic methods the library implements for all its models (such as downloading or saving, resizing the input
    embeddings, pruning heads etc.)

    This model is also a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__
    subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to
    general usage and behavior.

    Args:
        config (:class:`~transformers.RagConfig`):
            Model configuration class with all the parameters of the model. Initializing with a config file does not
            load the weights associated with the model, only the configuration. Check out the
            :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
        question_encoder (:class:`transformers.PreTrainedModel`):
@@ -377,44 +399,65 @@ RAG_FORWARD_INPUTS_DOCSTRING = r"""
    Args:
        input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
            Indices of input sequence tokens in the vocabulary. :class:`~transformers.RagConfig`, used to initialize
            the model, specifies which generator to use; it also specifies a compatible generator tokenizer. Use that
            tokenizer class to obtain the indices.
        attention_mask (:obj:`torch.Tensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
            Mask to avoid performing attention on padding token indices. Mask values selected in ``[0, 1]``:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

            `What are attention masks? <../glossary.html#attention-mask>`__
        encoder_outputs (:obj:`tuple(tuple(torch.FloatTensor))`, `optional`):
            Tuple consists of (:obj:`generator_enc_last_hidden_state`, `optional`: :obj:`generator_enc_hidden_states`,
            `optional`: :obj:`generator_enc_attentions`). :obj:`generator_enc_last_hidden_state` of shape
            :obj:`(batch_size, n_docs * sequence_length, hidden_size)` is a sequence of hidden states at the output
            of the last layer of the generator's encoder.

            Used by the :class:`~transformers.RagModel` model during decoding.
        decoder_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`):
            Provide for generation tasks. :obj:`None` by default, construct as per instructions for the generator
            model you're using with your RAG instance.
        decoder_attention_mask (:obj:`torch.BoolTensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`):
            Default behavior: generate a tensor that ignores pad tokens in :obj:`decoder_input_ids`. Causal mask will
            also be used by default.
        past_key_values (:obj:`tuple(tuple(torch.FloatTensor))`):
            Tuple consists of two elements: :obj:`encoder_outputs` of the RAG model (see :obj:`encoder_outputs`) and
            :obj:`past_key_values` of the underlying generator. Can be used to speed up decoding.
            :obj:`past_key_values` are used in the :class:`~transformers.RagTokenForGeneration` model during
            decoding.
        doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
            Score between each retrieved document embedding (see :obj:`retrieved_doc_embeds`) and
            :obj:`question_encoder_last_hidden_state`. If the model is not initialized with a ``retriever``,
            :obj:`doc_scores` has to be provided to the forward pass. :obj:`doc_scores` can be computed via
            :obj:`question_encoder_last_hidden_state` and :obj:`retrieved_doc_embeds`, see examples for more
            information.
        context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
            Input IDs post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
            retriever. If the model is not initialized with a ``retriever``, :obj:`context_input_ids` has to be
            provided to the forward pass. :obj:`context_input_ids` are returned by
            :meth:`~transformers.RagRetriever.__call__`.
        context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
            Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by
            the retriever. If the model is not initialized with a ``retriever``, :obj:`context_attention_mask` has to
            be provided to the forward pass. :obj:`context_attention_mask` are returned by
            :meth:`~transformers.RagRetriever.__call__`.
        use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
            If set to :obj:`True`, ``past_key_values`` key value states are returned and can be used to speed up
            decoding (see ``past_key_values``).
        output_attentions (:obj:`bool`, `optional`):
            Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under
            returned tensors for more detail.
        output_hidden_states (:obj:`bool`, `optional`):
            Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors
            for more detail.
        output_retrieved(:obj:`bool`, `optional`):
            Whether or not to return the :obj:`retrieved_doc_embeds`, :obj:`retrieved_doc_ids`,
            :obj:`context_input_ids` and :obj:`context_attention_mask`. See returned tensors for more detail.
"""
@@ -780,28 +823,31 @@ class RagSequenceForGeneration(RagPreTrainedModel):
    ):
        """
        Implements RAG sequence "thorough" decoding. Read the :meth:`~transformers.PreTrainedModel.generate`
        documentation for more information on how to set other generate input parameters.

        Args:
            input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
                The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then
                :obj:`context_input_ids` has to be provided.
            context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
                Input IDs post-processed from the retrieved documents and the question encoder :obj:`input_ids` by
                the retriever.
            do_deduplication (:obj:`bool`, `optional`):
                Whether or not to deduplicate the generations from different context documents for a given input.
                Has to be set to :obj:`False` if used while training with distributed backend.
            num_return_sequences(:obj:`int`, `optional`, defaults to 1):
                The number of independently computed returned sequences for each element in the batch. Note that
                this is not the value we pass to the ``generator``'s :func:`~transformers.PreTrainedModel.generate`
                function, where we set ``num_return_sequences`` to :obj:`num_beams`.
            num_beams (:obj:`int`, `optional`, defaults to 1):
                Number of beams for beam search. 1 means no beam search.
            kwargs:
                Additional kwargs will be passed to :meth:`~transformers.PreTrainedModel.generate`.

        Return:
            :obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
                The generated sequences. The second dimension (sequence length) is either equal to :obj:`max_length`
                or shorter if all batches finished early due to the :obj:`eos_token_id`.
        """
@@ -1041,6 +1087,7 @@ class RagTokenForGeneration(RagPreTrainedModel):
                If :obj:`True`, the NLL loss is reduced using the ``torch.Tensor.sum`` operation.
            kwargs (:obj:`Dict[str, any]`, `optional`, defaults to ``{}``):
                Legacy dictionary, which is required so that the model can use the `generate()` function.

        Returns:

        Example::
...@@ -1156,23 +1203,35 @@ class RagTokenForGeneration(RagPreTrainedModel): ...@@ -1156,23 +1203,35 @@ class RagTokenForGeneration(RagPreTrainedModel):
Args: Args:
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then :obj:`context_input_ids` has to be provided. The sequence used as a prompt for the generation. If :obj:`input_ids` is not passed, then
:obj:`context_input_ids` has to be provided.
context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`): context_input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
Input ids post-processed from the retrieved documents and the question encoder input_ids by the retriever. Input IDs post-processed from the retrieved documents and the question encoder :obj:`input_ids` by the
If the model has is not initialized with a ``retriever`` :obj:`context_input_ids` has to be provided to the forward pass. :obj:`context_input_ids` are returned by :meth:`~transformers.RagRetriever.__call__` retriever.
If the model has is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided
to the forward pass. :obj:`context_input_ids` are returned by
:meth:`~transformers.RagRetriever.__call__`.
context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`): context_attention_mask (:obj:`torch.LongTensor` of shape :obj:`(batch_size * config.n_docs, config.max_combined_length)`, `optional`, returned when `output_retrieved=True`):
Attention mask post-processed from the retrieved documents and the question encoder input_ids by the retriever. Attention mask post-processed from the retrieved documents and the question encoder :obj:`input_ids` by
If the model has is not initialized with a ``retriever`` :obj:`context_attention_mask` has to be provided to the forward pass. :obj:`context_attention_mask` are returned by :meth:`~transformers.RagRetriever.__call__` the retriever.
If the model has is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided
to the forward pass. :obj:`context_input_ids` are returned by
:meth:`~transformers.RagRetriever.__call__`.
doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`): doc_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.n_docs)`):
Score between each retrieved document embeddigs (see :obj:`retrieved_doc_embeds`) and :obj:`question_encoder_last_hidden_state`. Score between each retrieved document embeddigs (see :obj:`retrieved_doc_embeds`) and
If the model has is not initialized with a ``retriever`` :obj:`doc_scores` has to be provided to the forward pass. :obj:`doc_scores` can be computed via :obj:`question_encoder_last_hidden_state` and :obj:`retrieved_doc_embeds`, see examples for more information. :obj:`question_encoder_last_hidden_state`.
If the model has is not initialized with a ``retriever``, :obj:`context_input_ids` has to be provided
to the forward pass. :obj:`context_input_ids` are returned by
:meth:`~transformers.RagRetriever.__call__`.
max_length (:obj:`int`, `optional`, defaults to 20): max_length (:obj:`int`, `optional`, defaults to 20):
The maximum length of the sequence to be generated. The maximum length of the sequence to be generated.
min_length (:obj:`int`, `optional`, defaults to 10): min_length (:obj:`int`, `optional`, defaults to 10):
The minimum length of the sequence to be generated. The minimum length of the sequence to be generated.
early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`): early_stopping (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether to stop the beam search when at least ``num_beams`` sentences are finished per batch or not. Whether or not to stop the beam search when at least ``num_beams`` sentences are finished per batch or not.
use_cache: (:obj:`bool`, `optional`, defaults to :obj:`True`): use_cache: (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should use the past last key/values attentions (if applicable to the model) to Whether or not the model should use the past last key/values attentions (if applicable to the model) to
speed up decoding. speed up decoding.
...@@ -1195,14 +1254,13 @@ class RagTokenForGeneration(RagPreTrainedModel): ...@@ -1195,14 +1254,13 @@ class RagTokenForGeneration(RagPreTrainedModel):
num_beams (:obj:`int`, `optional`, defaults to 1): num_beams (:obj:`int`, `optional`, defaults to 1):
Number of beams for beam search. 1 means no beam search. Number of beams for beam search. 1 means no beam search.
num_return_sequences(:obj:`int`, `optional`, defaults to 1): num_return_sequences(:obj:`int`, `optional`, defaults to 1):
The number of independently computed returned sequences for each element in the batch. Note that this is not the value The number of independently computed returned sequences for each element in the batch. Note that this
we pass to the ``generator``'s :func:`~transformers.PreTrainedModel.generate` function, where we set ``num_return_sequences`` is not the value we pass to the ``generator``'s :func:`~transformers.PreTrainedModel.generate`
to `num_beams`. function, where we set ``num_return_sequences`` to :obj:`num_beams`.
decoder_start_token_id (:obj:`int`, `optional`): decoder_start_token_id (:obj:`int`, `optional`):
If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token. If an encoder-decoder model starts decoding with a different token than `bos`, the id of that token.
Return: Return:
:obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`: :obj:`torch.LongTensor` of shape :obj:`(batch_size * num_return_sequences, sequence_length)`:
The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or The generated sequences. The second dimension (sequence_length) is either equal to :obj:`max_length` or
shorter if all batches finished early due to the :obj:`eos_token_id`. shorter if all batches finished early due to the :obj:`eos_token_id`.
......
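For reference, a minimal sketch of how these generation arguments are typically used with :class:`~transformers.RagTokenForGeneration`; the checkpoint name and the dummy-index retriever options are assumptions taken from the public RAG examples, not part of this diff::

    from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

    tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
    retriever = RagRetriever.from_pretrained(
        "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
    )
    model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

    inputs = tokenizer.prepare_seq2seq_batch(
        ["who holds the record in 100m freestyle"], return_tensors="pt"
    )
    generated = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=20,
        min_length=10,
        num_beams=4,
        num_return_sequences=2,  # must be <= num_beams
        early_stopping=True,
    )
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))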
...@@ -399,12 +399,14 @@ class RagRetriever: ...@@ -399,12 +399,14 @@ class RagRetriever:
The number of docs retrieved per query. The number of docs retrieved per query.
Return: Return:
retrieved_doc_embeds (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs, dim)` :obj:`Tuple[np.ndarray, np.ndarray, List[dict]]`:
The retrieval embeddings of the retrieved docs per query. A tuple with the following objects:
doc_ids (:obj:`np.ndarray` of shape :obj:`batch_size, n_docs`)
The ids of the documents in the index - **retrieved_doc_embeds** (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs, dim)`) -- The
doc_dicts (:obj:`List[dict]`): retrieval embeddings of the retrieved docs per query.
The retrieved_doc_embeds examples per query. - **doc_ids** (:obj:`np.ndarray` of shape :obj:`(batch_size, n_docs)`) -- The ids of the documents in the
index
- **doc_dicts** (:obj:`List[dict]`) -- The :obj:`retrieved_doc_embeds` examples per query.
""" """
doc_ids, retrieved_doc_embeds = self._main_retrieve(question_hidden_states, n_docs) doc_ids, retrieved_doc_embeds = self._main_retrieve(question_hidden_states, n_docs)
......
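As an illustration of the return value documented above, a sketch of calling :meth:`~transformers.RagRetriever.retrieve` directly; the checkpoint name and dummy-index options are assumptions::

    import numpy as np
    from transformers import RagRetriever

    retriever = RagRetriever.from_pretrained(
        "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
    )

    # Hidden states normally come from the question encoder; random values suffice to show the shapes.
    question_hidden_states = np.random.randn(2, 768).astype("float32")
    retrieved_doc_embeds, doc_ids, doc_dicts = retriever.retrieve(question_hidden_states, n_docs=5)

    print(retrieved_doc_embeds.shape)  # (2, 5, 768)
    print(doc_ids.shape)               # (2, 5)
    print(len(doc_dicts))              # 2, one dict of retrieved passages per query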
...@@ -17,7 +17,8 @@ import os ...@@ -17,7 +17,8 @@ import os
from typing import List, Optional from typing import List, Optional
from .configuration_rag import RagConfig from .configuration_rag import RagConfig
from .tokenization_utils_base import BatchEncoding from .file_utils import add_start_docstrings
from .tokenization_utils_base import PREPARE_SEQ2SEQ_BATCH_DOCSTRING, BatchEncoding
from .utils import logging from .utils import logging
...@@ -60,6 +61,7 @@ class RagTokenizer: ...@@ -60,6 +61,7 @@ class RagTokenizer:
def batch_decode(self, *args, **kwargs): def batch_decode(self, *args, **kwargs):
return self.generator.batch_decode(*args, **kwargs) return self.generator.batch_decode(*args, **kwargs)
@add_start_docstrings(PREPARE_SEQ2SEQ_BATCH_DOCSTRING)
def prepare_seq2seq_batch( def prepare_seq2seq_batch(
self, self,
src_texts: List[str], src_texts: List[str],
...@@ -71,66 +73,6 @@ class RagTokenizer: ...@@ -71,66 +73,6 @@ class RagTokenizer:
truncation=True, truncation=True,
**kwargs, **kwargs,
) -> BatchEncoding: ) -> BatchEncoding:
r"""
Prepare a batch that can be passed directly to an instance of :class:`~transformers.RagModel`.
Args:
src_texts: (:obj:`List[str]`):
List of documents to summarize or source language texts.
tgt_texts: (:obj:`List[str]`, `optional`):
List of summaries or target language texts.
max_length (:obj:`int`, `optional`):
Controls the maximum length for encoder inputs (documents to summarize or source language texts).
If left unset or set to :obj:`None`, this will use the predefined model maximum length if a maximum
length is required by one of the truncation/padding parameters. If the model has no specific maximum
input length (like XLNet) truncation/padding to a maximum length will be deactivated.
max_target_length (:obj:`int`, `optional`):
Controls the maximum length of decoder inputs (target language texts or summaries).
If left unset or set to :obj:`None`, this will use the max_length value.
padding (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.PaddingStrategy`, `optional`, defaults to :obj:`False`):
Activates and controls padding. Accepts the following values:
* :obj:`True` or :obj:`'longest'`: Pad to the longest sequence in the batch (or no padding if only a
single sequence is provided).
* :obj:`'max_length'`: Pad to a maximum length specified with the argument :obj:`max_length` or to the
maximum acceptable input length for the model if that argument is not provided.
* :obj:`False` or :obj:`'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
different lengths).
return_tensors (:obj:`str` or :class:`~transformers.tokenization_utils_base.TensorType`, `optional`, defaults to "pt"):
If set, will return tensors instead of list of python integers. Acceptable values are:
* :obj:`'tf'`: Return TensorFlow :obj:`tf.constant` objects.
* :obj:`'pt'`: Return PyTorch :obj:`torch.Tensor` objects.
* :obj:`'np'`: Return Numpy :obj:`np.ndarray` objects.
truncation (:obj:`bool`, :obj:`str` or :class:`~transformers.tokenization_utils_base.TruncationStrategy`, `optional`, defaults to :obj:`True`):
Activates and controls truncation. Accepts the following values:
* :obj:`True` or :obj:`'longest_first'`: Truncate to a maximum length specified with the argument
:obj:`max_length` or to the maximum acceptable input length for the model if that argument is not
provided. This will truncate token by token, removing a token from the longest sequence in the pair
if a pair of sequences (or a batch of pairs) is provided.
* :obj:`'only_first'`: Truncate to a maximum length specified with the argument :obj:`max_length` or to
the maximum acceptable input length for the model if that argument is not provided. This will only
truncate the first sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
* :obj:`'only_second'`: Truncate to a maximum length specified with the argument :obj:`max_length` or
to the maximum acceptable input length for the model if that argument is not provided. This will only
truncate the second sequence of a pair if a pair of sequences (or a batch of pairs) is provided.
* :obj:`False` or :obj:`'do_not_truncate'` (default): No truncation (i.e., can output batch with
sequence lengths greater than the model maximum admissible input size).
**kwargs:
Additional keyword arguments passed along to :obj:`self.__call__`.
Returns:
:class:`~transformers.BatchEncoding`: A :class:`~transformers.BatchEncoding` with the following fields:
- **input_ids** -- List of token ids to be fed to the encoder.
- **attention_mask** -- List of indices specifying which tokens should be attended to by the model.
- **labels** -- List of token ids for tgt_texts
The full set of keys ``[input_ids, attention_mask, labels]``
will only be returned if :obj:`tgt_texts` is passed. Otherwise, ``input_ids`` and ``attention_mask`` will be the only keys.
"""
if max_length is None: if max_length is None:
max_length = self.question_encoder.model_max_length max_length = self.question_encoder.model_max_length
model_inputs: BatchEncoding = self.question_encoder( model_inputs: BatchEncoding = self.question_encoder(
......
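A short usage sketch of the method documented above; the checkpoint name is an assumption::

    from transformers import RagTokenizer

    tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
    batch = tokenizer.prepare_seq2seq_batch(
        src_texts=["who holds the record in 100m freestyle"],
        tgt_texts=["michael phelps"],
        max_length=128,
        max_target_length=32,
        return_tensors="pt",
    )
    # "labels" is present only because tgt_texts was passed.
    print(list(batch.keys()))  # ['input_ids', 'attention_mask', 'labels']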
...@@ -31,10 +31,10 @@ XXX_PRETRAINED_CONFIG_ARCHIVE_MAP = { ...@@ -31,10 +31,10 @@ XXX_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class XxxConfig(PretrainedConfig): class XxxConfig(PretrainedConfig):
r""" r"""
This is the configuration class to store the configuration of a :class:`~transformers.XXXModel`. This is the configuration class to store the configuration of a :class:`~transformers.XxxModel` or a
It is used to instantiate a XXX model according to the specified arguments, defining the model :class:`~transformers.TFXxxModel`. It is used to instantiate a XXX model according to the specified
architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar
the XXX `xxx-base-uncased <https://huggingface.co/xxx/xxx-base-uncased>`__ architecture. configuration to that of the XXX `xxx-base-uncased <https://huggingface.co/xxx/xxx-base-uncased>`__ architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used
to control the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig` to control the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig`
...@@ -42,33 +42,35 @@ class XxxConfig(PretrainedConfig): ...@@ -42,33 +42,35 @@ class XxxConfig(PretrainedConfig):
Args: Args:
vocab_size (:obj:`int`, optional, defaults to 30522): vocab_size (:obj:`int`, `optional`, defaults to 30522):
Vocabulary size of the XXX model. Defines the different tokens that Vocabulary size of the XXX model. Defines the number of different tokens that can be represented by the
can be represented by the `inputs_ids` passed to the forward method of :class:`~transformers.XXXModel`. :obj:`inputs_ids` passed when calling :class:`~transformers.XxxModel` or
hidden_size (:obj:`int`, optional, defaults to 768): :class:`~transformers.TFXxxModel`.
hidden_size (:obj:`int`, `optional`, defaults to 768):
Dimensionality of the encoder layers and the pooler layer. Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, optional, defaults to 12): num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
Number of hidden layers in the Transformer encoder. Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, optional, defaults to 12): num_attention_heads (:obj:`int`, `optional`, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder. Number of attention heads for each attention layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`function`, optional, defaults to :obj:`"gelu"`): hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. The non-linear activation function (function or string) in the encoder and pooler.
If string, :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported. If string, :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, optional, defaults to 0.1): hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0.1): attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, optional, defaults to 512): max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
The maximum sequence length that this model might ever be used with. The maximum sequence length that this model might ever be used with.
Typically set this to something large just in case (e.g., 512 or 1024 or 2048). Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (:obj:`int`, optional, defaults to 2): type_vocab_size (:obj:`int`, `optional`, defaults to 2):
The vocabulary size of the `token_type_ids` passed into :class:`~transformers.BertModel`. The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.XxxModel` or
initializer_range (:obj:`float`, optional, defaults to 0.02): :class:`~transformers.TFXxxModel`.
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
The standard deviation of the :obj:`truncated_normal_initializer` for initializing all weight matrices. The standard deviation of the :obj:`truncated_normal_initializer` for initializing all weight matrices.
layer_norm_eps (:obj:`float`, optional, defaults to 1e-5): layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-5):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
gradient_checkpointing (:obj:`bool`, optional, defaults to :obj:`False`): gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
If :obj:`True`, use gradient checkpointing to save memory at the expense of a slower backward pass. If :obj:`True`, use gradient checkpointing to save memory at the expense of a slower backward pass.
kwargs: kwargs:
Additional arguments for common configurations, passed to :class:`~transformers.PretrainedConfig`. Additional arguments for common configurations, passed to :class:`~transformers.PretrainedConfig`.
......
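Since ``Xxx`` is only a template placeholder, a runnable sketch of this configuration pattern uses BERT as a concrete stand-in; the class names and values below are assumptions mirroring the arguments listed above::

    from transformers import BertConfig, BertModel

    config = BertConfig(
        vocab_size=30522,
        hidden_size=768,
        num_hidden_layers=12,
        num_attention_heads=12,
        hidden_act="gelu",
        hidden_dropout_prob=0.1,
        attention_probs_dropout_prob=0.1,
        max_position_embeddings=512,
        type_vocab_size=2,
        initializer_range=0.02,
    )
    model = BertModel(config)  # randomly initialized model with this architecture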
...@@ -257,13 +257,18 @@ class TFXxxPreTrainedModel(TFPreTrainedModel): ...@@ -257,13 +257,18 @@ class TFXxxPreTrainedModel(TFPreTrainedModel):
XXX_START_DOCSTRING = r""" XXX_START_DOCSTRING = r"""
The XXX model was proposed in The XXX model was proposed in
`XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding `XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
<https://arxiv.org/abs/1810.04805>`__ by.... <https://arxiv.org/abs/1810.04805>`__ by....
This model is a `tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ sub-class. This model inherits from :class:`~transformers.TFPreTrainedModel`. Check the superclass documentation for the
Use it as a regular TF 2.0 Keras Model and generic methods the library implements for all its models (such as downloading or saving, resizing the input
refer to the TF 2.0 documentation for all matter related to general usage and behavior. embeddings, pruning heads etc.)
This model is also a `tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ subclass.
Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general
usage and behavior.
.. note:: .. note::
...@@ -272,17 +277,17 @@ XXX_START_DOCSTRING = r""" ...@@ -272,17 +277,17 @@ XXX_START_DOCSTRING = r"""
- having all inputs as keyword arguments (like PyTorch models), or - having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional arguments. - having all inputs as a list, tuple or dict in the first positional arguments.
This second option is useful when using :obj:`tf.keras.Model.fit()` method which currently requires having This second option is useful when using :meth:`tf.keras.Model.fit` method which currently requires having
all the tensors in the first argument of the model call function: :obj:`model(inputs)`. all the tensors in the first argument of the model call function: :obj:`model(inputs)`.
If you choose this second option, there are three possibilities you can use to gather all the input Tensors If you choose this second option, there are three possibilities you can use to gather all the input Tensors
in the first positional argument : in the first positional argument :
- a single Tensor with input_ids only and nothing else: :obj:`model(input_ids)` - a single Tensor with :obj:`input_ids` only and nothing else: :obj:`model(input_ids)`
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring: - a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
:obj:`model([input_ids, attention_mask])` or :obj:`model([input_ids, attention_mask, token_type_ids])` :obj:`model([input_ids, attention_mask])` or :obj:`model([input_ids, attention_mask, token_type_ids])`
- a dictionary with one or several input Tensors associated to the input names given in the docstring: - a dictionary with one or several input Tensors associated to the input names given in the docstring:
:obj:`model({'input_ids': input_ids, 'token_type_ids': token_type_ids})` :obj:`model({"input_ids": input_ids, "token_type_ids": token_type_ids})`
Parameters: Parameters:
config (:class:`~transformers.XxxConfig`): Model configuration class with all the parameters of the model. config (:class:`~transformers.XxxConfig`): Model configuration class with all the parameters of the model.
...@@ -292,27 +297,31 @@ XXX_START_DOCSTRING = r""" ...@@ -292,27 +297,31 @@ XXX_START_DOCSTRING = r"""
XXX_INPUTS_DOCSTRING = r""" XXX_INPUTS_DOCSTRING = r"""
Args: Args:
input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`): input_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`):
Indices of input sequence tokens in the vocabulary. Indices of input sequence tokens in the vocabulary.
Indices can be obtained using :class:`transformers.XxxTokenizer`. Indices can be obtained using :class:`~transformers.BertTokenizer`.
See :func:`transformers.PreTrainedTokenizer.encode` and See :func:`transformers.PreTrainedTokenizer.__call__` and
:func:`transformers.PreTrainedTokenizer.__call__` for details. :func:`transformers.PreTrainedTokenizer.encode` for details.
`What are input IDs? <../glossary.html#input-ids>`__ `What are input IDs? <../glossary.html#input-ids>`__
attention_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`): attention_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
Mask to avoid performing attention on padding token indices. Mask to avoid performing attention on padding token indices.
Mask values selected in ``[0, 1]``: Mask values selected in ``[0, 1]``:
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
`What are attention masks? <../glossary.html#attention-mask>`__ `What are attention masks? <../glossary.html#attention-mask>`__
token_type_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`): token_type_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
Segment token indices to indicate first and second portions of the inputs. Segment token indices to indicate first and second portions of the inputs.
Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1`` Indices are selected in ``[0, 1]``:
corresponds to a `sentence B` token
- 0 corresponds to a `sentence A` token,
- 1 corresponds to a `sentence B` token.
`What are token type IDs? <../glossary.html#token-type-ids>`__ `What are token type IDs? <../glossary.html#token-type-ids>`__
position_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`{0}`, `optional`): position_ids (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`({0})`, `optional`):
Indices of positions of each input sequence tokens in the position embeddings. Indices of positions of each input sequence tokens in the position embeddings.
Selected in the range ``[0, config.max_position_embeddings - 1]``. Selected in the range ``[0, config.max_position_embeddings - 1]``.
...@@ -320,21 +329,25 @@ XXX_INPUTS_DOCSTRING = r""" ...@@ -320,21 +329,25 @@ XXX_INPUTS_DOCSTRING = r"""
head_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`): head_mask (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
Mask to nullify selected heads of the self-attention modules. Mask to nullify selected heads of the self-attention modules.
Mask values selected in ``[0, 1]``: Mask values selected in ``[0, 1]``:
:obj:`1` indicates the head is **not masked**, :obj:`0` indicates the head is **masked**.
inputs_embeds (:obj:`Numpy array` or :obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, embedding_dim)`, `optional`): - 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
inputs_embeds (:obj:`tf.Tensor` of shape :obj:`({0}, hidden_size)`, `optional`):
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation. Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
This is useful if you want more control over how to convert `input_ids` indices into associated vectors This is useful if you want more control over how to convert :obj:`input_ids` indices into associated
than the model's internal embedding lookup matrix. vectors than the model's internal embedding lookup matrix.
training (:obj:`boolean`, `optional`, defaults to :obj:`False`):
Whether to activate dropout modules (if set to :obj:`True`) during training or to de-activate them
(if set to :obj:`False`) for evaluation.
output_attentions (:obj:`bool`, `optional`): output_attentions (:obj:`bool`, `optional`):
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`): output_hidden_states (:obj:`bool`, `optional`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
more detail.
return_dict (:obj:`bool`, `optional`): return_dict (:obj:`bool`, `optional`):
If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
plain tuple. training (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to use the model in training mode (some modules like dropout modules have different
behaviors between training and evaluation).
""" """
...@@ -347,7 +360,7 @@ class TFXxxModel(TFXxxPreTrainedModel): ...@@ -347,7 +360,7 @@ class TFXxxModel(TFXxxPreTrainedModel):
super().__init__(config, *inputs, **kwargs) super().__init__(config, *inputs, **kwargs)
self.transformer = TFXxxMainLayer(config, name="transformer") self.transformer = TFXxxMainLayer(config, name="transformer")
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-cased", checkpoint="xxx-base-cased",
...@@ -370,7 +383,7 @@ class TFXxxForMaskedLM(TFXxxPreTrainedModel, TFMaskedLanguageModelingLoss): ...@@ -370,7 +383,7 @@ class TFXxxForMaskedLM(TFXxxPreTrainedModel, TFMaskedLanguageModelingLoss):
self.transformer = TFXxxMainLayer(config, name="transformer") self.transformer = TFXxxMainLayer(config, name="transformer")
self.mlm = TFXxxMLMHead(config, self.transformer.embeddings, name="mlm") self.mlm = TFXxxMLMHead(config, self.transformer.embeddings, name="mlm")
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-cased", checkpoint="xxx-base-cased",
...@@ -452,7 +465,7 @@ class TFXxxForSequenceClassification(TFXxxPreTrainedModel, TFSequenceClassificat ...@@ -452,7 +465,7 @@ class TFXxxForSequenceClassification(TFXxxPreTrainedModel, TFSequenceClassificat
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier" config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
) )
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-cased", checkpoint="xxx-base-cased",
...@@ -544,7 +557,7 @@ class TFXxxForMultipleChoice(TFXxxPreTrainedModel, TFMultipleChoiceLoss): ...@@ -544,7 +557,7 @@ class TFXxxForMultipleChoice(TFXxxPreTrainedModel, TFMultipleChoiceLoss):
""" """
return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS)} return {"input_ids": tf.constant(MULTIPLE_CHOICE_DUMMY_INPUTS)}
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, num_choices, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-cased", checkpoint="xxx-base-cased",
...@@ -568,8 +581,8 @@ class TFXxxForMultipleChoice(TFXxxPreTrainedModel, TFMultipleChoiceLoss): ...@@ -568,8 +581,8 @@ class TFXxxForMultipleChoice(TFXxxPreTrainedModel, TFMultipleChoiceLoss):
r""" r"""
labels (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`): labels (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
Labels for computing the multiple choice classification loss. Labels for computing the multiple choice classification loss.
Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension Indices should be in ``[0, ..., num_choices-1]`` where :obj:`num_choices` is the size of the second dimension
of the input tensors. (see `input_ids` above) of the input tensors. (See :obj:`input_ids` above)
""" """
if isinstance(inputs, (tuple, list)): if isinstance(inputs, (tuple, list)):
...@@ -667,7 +680,7 @@ class TFXxxForTokenClassification(TFXxxPreTrainedModel, TFTokenClassificationLos ...@@ -667,7 +680,7 @@ class TFXxxForTokenClassification(TFXxxPreTrainedModel, TFTokenClassificationLos
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier" config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
) )
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-cased", checkpoint="xxx-base-cased",
...@@ -734,8 +747,8 @@ class TFXxxForTokenClassification(TFXxxPreTrainedModel, TFTokenClassificationLos ...@@ -734,8 +747,8 @@ class TFXxxForTokenClassification(TFXxxPreTrainedModel, TFTokenClassificationLos
@add_start_docstrings( @add_start_docstrings(
"""XXX Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of """XXX Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
the hidden-states output to compute `span start logits` and `span end logits`). """, layer on top of the hidden-states output to compute `span start logits` and `span end logits`). """,
XXX_START_DOCSTRING, XXX_START_DOCSTRING,
) )
class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss): class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
...@@ -748,7 +761,7 @@ class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss): ...@@ -748,7 +761,7 @@ class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="qa_outputs" config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="qa_outputs"
) )
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-cased", checkpoint="xxx-base-cased",
...@@ -773,11 +786,11 @@ class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss): ...@@ -773,11 +786,11 @@ class TFXxxForQuestionAnswering(TFXxxPreTrainedModel, TFQuestionAnsweringLoss):
r""" r"""
start_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`): start_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
Labels for position (index) of the start of the labelled span for computing the token classification loss. Labels for position (index) of the start of the labelled span for computing the token classification loss.
Positions are clamped to the length of the sequence (`sequence_length`). Positions are clamped to the length of the sequence (:obj:`sequence_length`).
Positions outside of the sequence are not taken into account for computing the loss. Positions outside of the sequence are not taken into account for computing the loss.
end_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`): end_positions (:obj:`tf.Tensor` of shape :obj:`(batch_size,)`, `optional`):
Labels for position (index) of the end of the labelled span for computing the token classification loss. Labels for position (index) of the end of the labelled span for computing the token classification loss.
Positions are clamped to the length of the sequence (`sequence_length`). Positions are clamped to the length of the sequence (:obj:`sequence_length`).
Position outside of the sequence are not taken into account for computing the loss. Position outside of the sequence are not taken into account for computing the loss.
""" """
return_dict = return_dict if return_dict is not None else self.transformer.return_dict return_dict = return_dict if return_dict is not None else self.transformer.return_dict
......
...@@ -209,11 +209,16 @@ class XxxPreTrainedModel(PreTrainedModel): ...@@ -209,11 +209,16 @@ class XxxPreTrainedModel(PreTrainedModel):
module.bias.data.zero_() module.bias.data.zero_()
XXX_START_DOCSTRING = r""" The XXX model was proposed in XXX_START_DOCSTRING = r"""
`XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
The XXX model was proposed in `XXX: Pre-training of Deep Bidirectional Transformers for Language Understanding
<https://arxiv.org/abs/1810.04805>`__ by.... <https://arxiv.org/abs/1810.04805>`__ by....
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class. This model inherits from :class:`~transformers.PreTrainedModel`. Check the superclass documentation for the generic
methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
pruning heads etc.)
This model is also a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
usage and behavior. usage and behavior.
...@@ -225,27 +230,31 @@ XXX_START_DOCSTRING = r""" The XXX model was proposed in ...@@ -225,27 +230,31 @@ XXX_START_DOCSTRING = r""" The XXX model was proposed in
XXX_INPUTS_DOCSTRING = r""" XXX_INPUTS_DOCSTRING = r"""
Inputs: Inputs:
input_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`): input_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`):
Indices of input sequence tokens in the vocabulary. Indices of input sequence tokens in the vocabulary.
Indices can be obtained using :class:`transformers.XxxTokenizer`. Indices can be obtained using :class:`~transformers.XxxTokenizer`.
See :func:`transformers.PreTrainedTokenizer.encode` and See :meth:`transformers.PreTrainedTokenizer.encode` and
:func:`transformers.PreTrainedTokenizer.__call__` for details. :meth:`transformers.PreTrainedTokenizer.__call__` for details.
`What are input IDs? <../glossary.html#input-ids>`__ `What are input IDs? <../glossary.html#input-ids>`__
attention_mask (:obj:`torch.FloatTensor` of shape :obj:`{0}`, `optional`): attention_mask (:obj:`torch.FloatTensor` of shape :obj:`({0})`, `optional`):
Mask to avoid performing attention on padding token indices. Mask to avoid performing attention on padding token indices.
Mask values selected in ``[0, 1]``: Mask values selected in ``[0, 1]``:
``1`` for tokens that are NOT MASKED, ``0`` for MASKED tokens.
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
`What are attention masks? <../glossary.html#attention-mask>`__ `What are attention masks? <../glossary.html#attention-mask>`__
token_type_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`, `optional`): token_type_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
Segment token indices to indicate first and second portions of the inputs. Segment token indices to indicate first and second portions of the inputs.
Indices are selected in ``[0, 1]``: ``0`` corresponds to a `sentence A` token, ``1`` Indices are selected in ``[0, 1]``:
corresponds to a `sentence B` token
- 0 corresponds to a `sentence A` token,
- 1 corresponds to a `sentence B` token.
`What are token type IDs? <../glossary.html#token-type-ids>`_ `What are token type IDs? <../glossary.html#token-type-ids>`_
position_ids (:obj:`torch.LongTensor` of shape :obj:`{0}`, `optional`): position_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
Indices of positions of each input sequence tokens in the position embeddings. Indices of positions of each input sequence tokens in the position embeddings.
Selected in the range ``[0, config.max_position_embeddings - 1]``. Selected in the range ``[0, config.max_position_embeddings - 1]``.
...@@ -253,18 +262,22 @@ XXX_INPUTS_DOCSTRING = r""" ...@@ -253,18 +262,22 @@ XXX_INPUTS_DOCSTRING = r"""
head_mask (:obj:`torch.FloatTensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`): head_mask (:obj:`torch.FloatTensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
Mask to nullify selected heads of the self-attention modules. Mask to nullify selected heads of the self-attention modules.
Mask values selected in ``[0, 1]``: Mask values selected in ``[0, 1]``:
:obj:`1` indicates the head is **not masked**, :obj:`0` indicates the head is **masked**.
inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`): - 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`({0}, hidden_size)`, `optional`):
Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation. Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
This is useful if you want more control over how to convert `input_ids` indices into associated vectors This is useful if you want more control over how to convert :obj:`input_ids` indices into associated
than the model's internal embedding lookup matrix. vectors than the model's internal embedding lookup matrix.
output_attentions (:obj:`bool`, `optional`): output_attentions (:obj:`bool`, `optional`):
If set to ``True``, the attentions tensors of all attention layers are returned. See ``attentions`` under returned tensors for more detail. Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
tensors for more detail.
output_hidden_states (:obj:`bool`, `optional`): output_hidden_states (:obj:`bool`, `optional`):
If set to ``True``, the hidden states of all layers are returned. See ``hidden_states`` under returned tensors for more detail. Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
more detail.
return_dict (:obj:`bool`, `optional`): return_dict (:obj:`bool`, `optional`):
If set to ``True``, the model will return a :class:`~transformers.file_utils.ModelOutput` instead of a Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
plain tuple.
""" """
...@@ -296,7 +309,7 @@ class XxxModel(XxxPreTrainedModel): ...@@ -296,7 +309,7 @@ class XxxModel(XxxPreTrainedModel):
for layer, heads in heads_to_prune.items(): for layer, heads in heads_to_prune.items():
self.encoder.layer[layer].attention.prune_heads(heads) self.encoder.layer[layer].attention.prune_heads(heads)
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-uncased", checkpoint="xxx-base-uncased",
...@@ -378,7 +391,7 @@ class XxxForMaskedLM(XxxPreTrainedModel): ...@@ -378,7 +391,7 @@ class XxxForMaskedLM(XxxPreTrainedModel):
def get_output_embeddings(self): def get_output_embeddings(self):
return self.lm_head return self.lm_head
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-uncased", checkpoint="xxx-base-uncased",
...@@ -455,7 +468,7 @@ class XxxForSequenceClassification(XxxPreTrainedModel): ...@@ -455,7 +468,7 @@ class XxxForSequenceClassification(XxxPreTrainedModel):
self.init_weights() self.init_weights()
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-uncased", checkpoint="xxx-base-uncased",
...@@ -538,7 +551,7 @@ class XxxForMultipleChoice(XxxPreTrainedModel): ...@@ -538,7 +551,7 @@ class XxxForMultipleChoice(XxxPreTrainedModel):
self.init_weights() self.init_weights()
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, num_choices, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-uncased", checkpoint="xxx-base-uncased",
...@@ -561,8 +574,8 @@ class XxxForMultipleChoice(XxxPreTrainedModel): ...@@ -561,8 +574,8 @@ class XxxForMultipleChoice(XxxPreTrainedModel):
r""" r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
Labels for computing the multiple choice classification loss. Labels for computing the multiple choice classification loss.
Indices should be in ``[0, ..., num_choices-1]`` where `num_choices` is the size of the second dimension Indices should be in ``[0, ..., num_choices-1]`` where :obj:`num_choices` is the size of the second dimension
of the input tensors. (see `input_ids` above) of the input tensors. (See :obj:`input_ids` above)
""" """
return_dict = return_dict if return_dict is not None else self.config.use_return_dict return_dict = return_dict if return_dict is not None else self.config.use_return_dict
num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1] num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
...@@ -628,7 +641,7 @@ class XxxForTokenClassification(XxxPreTrainedModel): ...@@ -628,7 +641,7 @@ class XxxForTokenClassification(XxxPreTrainedModel):
self.init_weights() self.init_weights()
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-uncased", checkpoint="xxx-base-uncased",
...@@ -713,7 +726,7 @@ class XxxForQuestionAnswering(XxxPreTrainedModel): ...@@ -713,7 +726,7 @@ class XxxForQuestionAnswering(XxxPreTrainedModel):
self.init_weights() self.init_weights()
@add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("(batch_size, sequence_length)")) @add_start_docstrings_to_callable(XXX_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings( @add_code_sample_docstrings(
tokenizer_class=_TOKENIZER_FOR_DOC, tokenizer_class=_TOKENIZER_FOR_DOC,
checkpoint="xxx-base-uncased", checkpoint="xxx-base-uncased",
...@@ -737,11 +750,11 @@ class XxxForQuestionAnswering(XxxPreTrainedModel): ...@@ -737,11 +750,11 @@ class XxxForQuestionAnswering(XxxPreTrainedModel):
r""" r"""
start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
Labels for position (index) of the start of the labelled span for computing the token classification loss. Labels for position (index) of the start of the labelled span for computing the token classification loss.
Positions are clamped to the length of the sequence (`sequence_length`). Positions are clamped to the length of the sequence (:obj:`sequence_length`).
Positions outside of the sequence are not taken into account for computing the loss. Positions outside of the sequence are not taken into account for computing the loss.
end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
Labels for position (index) of the end of the labelled span for computing the token classification loss. Labels for position (index) of the end of the labelled span for computing the token classification loss.
Positions are clamped to the length of the sequence (`sequence_length`). Positions are clamped to the length of the sequence (:obj:`sequence_length`).
Position outside of the sequence are not taken into account for computing the loss. Position outside of the sequence are not taken into account for computing the loss.
""" """
return_dict = return_dict if return_dict is not None else self.config.use_return_dict return_dict = return_dict if return_dict is not None else self.config.use_return_dict
......
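A sketch of how the ``start_positions``/``end_positions`` labels above are used for training, with BERT standing in for the template classes; the checkpoint name and label indices below are assumptions for illustration only::

    import torch
    from transformers import BertTokenizer, BertForQuestionAnswering

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

    enc = tokenizer("Who wrote it?", "It was written by Jane.", return_tensors="pt")
    # Illustrative label positions; they are clamped to sequence_length, and positions outside
    # the sequence are not taken into account for the loss.
    start_positions = torch.tensor([10])
    end_positions = torch.tensor([10])
    outputs = model(**enc, start_positions=start_positions, end_positions=end_positions, return_dict=True)
    print(outputs.loss, outputs.start_logits.shape, outputs.end_logits.shape)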
...@@ -80,16 +80,16 @@ class XxxTokenizer(PreTrainedTokenizer): ...@@ -80,16 +80,16 @@ class XxxTokenizer(PreTrainedTokenizer):
r""" r"""
Constructs a XXX tokenizer. Based on XXX. Constructs a XXX tokenizer. Based on XXX.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the methods. Users This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods.
should refer to the superclass for more information regarding methods. Users should refer to this superclass for more information regarding those methods.
Args: Args:
vocab_file (:obj:`str`): vocab_file (:obj:`str`):
File containing the vocabulary. File containing the vocabulary.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`): do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether to lowercase the input when tokenizing. Whether or not to lowercase the input when tokenizing.
do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`): do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether to do basic tokenization before WordPiece. Whether or not to do basic tokenization before WordPiece.
never_split (:obj:`Iterable`, `optional`): never_split (:obj:`Iterable`, `optional`):
Collection of tokens which will never be split during tokenization. Only has an effect when Collection of tokens which will never be split during tokenization. Only has an effect when
:obj:`do_basic_tokenize=True` :obj:`do_basic_tokenize=True`
...@@ -194,19 +194,19 @@ class XxxTokenizer(PreTrainedTokenizer): ...@@ -194,19 +194,19 @@ class XxxTokenizer(PreTrainedTokenizer):
""" """
Build model inputs from a sequence or a pair of sequences for sequence classification tasks Build model inputs from a sequence or a pair of sequences for sequence classification tasks
by concatenating and adding special tokens. by concatenating and adding special tokens.
A BERT sequence has the following format: A XXX sequence has the following format:
- single sequence: ``[CLS] X [SEP]`` - single sequence: ``[CLS] X [SEP]``
- pair of sequences: ``[CLS] A [SEP] B [SEP]`` - pair of sequences: ``[CLS] A [SEP] B [SEP]``
Args: Args:
token_ids_0 (:obj:`List[int]`): token_ids_0 (:obj:`List[int]`):
List of IDs to which the special tokens will be added List of IDs to which the special tokens will be added.
token_ids_1 (:obj:`List[int]`, `optional`): token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs. Optional second list of IDs for sequence pairs.
Returns: Returns:
:obj:`List[int]`: list of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens. :obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
""" """
if token_ids_1 is None: if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id] return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
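A sketch of the special-token formats described above, using BertTokenizer as a stand-in for the template class; the checkpoint name is an assumption::

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    ids_a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("hello world"))
    ids_b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("how are you"))

    single = tokenizer.build_inputs_with_special_tokens(ids_a)        # [CLS] A [SEP]
    pair = tokenizer.build_inputs_with_special_tokens(ids_a, ids_b)   # [CLS] A [SEP] B [SEP]
    print(tokenizer.convert_ids_to_tokens(pair))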
...@@ -218,16 +218,16 @@ class XxxTokenizer(PreTrainedTokenizer): ...@@ -218,16 +218,16 @@ class XxxTokenizer(PreTrainedTokenizer):
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
) -> List[int]: ) -> List[int]:
""" """
Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer ``prepare_for_model`` method. special tokens using the tokenizer ``prepare_for_model`` method.
Args: Args:
token_ids_0 (:obj:`List[int]`): token_ids_0 (:obj:`List[int]`):
List of ids. List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`): token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs. Optional second list of IDs for sequence pairs.
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`): already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
Set to True if the token list is already formatted with special tokens for the model Whether or not the token list is already formatted with special tokens for the model.
Returns: Returns:
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token. :obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
...@@ -249,7 +249,7 @@ class XxxTokenizer(PreTrainedTokenizer): ...@@ -249,7 +249,7 @@ class XxxTokenizer(PreTrainedTokenizer):
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]: ) -> List[int]:
""" """
Creates a mask from the two sequences passed to be used in a sequence-pair classification task. Create a mask from the two sequences passed to be used in a sequence-pair classification task.
A BERT sequence pair mask has the following format: A XXX sequence pair mask has the following format:
:: ::
...@@ -257,11 +257,11 @@ class XxxTokenizer(PreTrainedTokenizer): ...@@ -257,11 +257,11 @@ class XxxTokenizer(PreTrainedTokenizer):
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence | second sequence | | first sequence | second sequence |
if token_ids_1 is None, only returns the first portion of the mask (0's). If :obj:`token_ids_1` is :obj:`None`, this method only returns the first portion of the mask (0s).
Args: Args:
token_ids_0 (:obj:`List[int]`): token_ids_0 (:obj:`List[int]`):
List of ids. List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`): token_ids_1 (:obj:`List[int]`, `optional`):
Optional second list of IDs for sequence pairs. Optional second list of IDs for sequence pairs.
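The token type mask illustrated above can be reproduced with the same BERT stand-in; all names below are assumptions for illustration only::

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    ids_a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("hello world"))
    ids_b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("how are you"))

    token_type_ids = tokenizer.create_token_type_ids_from_sequences(ids_a, ids_b)
    print(token_type_ids)  # [0, 0, 0, 0, 1, 1, 1, 1] -> 0s for "[CLS] A [SEP]", 1s for "B [SEP]"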
...@@ -277,7 +277,7 @@ class XxxTokenizer(PreTrainedTokenizer): ...@@ -277,7 +277,7 @@ class XxxTokenizer(PreTrainedTokenizer):
def save_vocabulary(self, vocab_path): def save_vocabulary(self, vocab_path):
""" """
Save the sentencepiece vocabulary (copy original file) and special tokens file to a directory. Save the vocabulary (copy original file) and special tokens file to a directory.
Args: Args:
vocab_path (:obj:`str`): vocab_path (:obj:`str`):
......