"...runtime/git@developer.sourcefind.cn:change/sglang.git" did not exist on "3ee40ff919db9c77f49173a38237a6879cc2390e"
Unverified commit 011cc0be authored by Sylvain Gugger, committed by GitHub

Fix all sphynx warnings (#5068)

parent af497b56
@@ -17,7 +17,6 @@ The ``.optimization`` module provides:
 ~~~~~~~~~~~~~~~~~~~
 .. autoclass:: transformers.AdamWeightDecay
-    :members:
 .. autofunction:: transformers.create_optimizer
......
@@ -7,7 +7,7 @@ Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction an
 There are two categories of pipeline abstractions to be aware about:
-- The :class:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines
+- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines
 - The other task-specific pipelines, such as :class:`~transformers.TokenClassificationPipeline`
   or :class:`~transformers.QuestionAnsweringPipeline`
@@ -17,8 +17,7 @@ The pipeline abstraction
 The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any
 other pipeline but requires an additional argument which is the `task`.
-.. autoclass:: transformers.pipeline
-    :members:
+.. autofunction:: transformers.pipeline
 The task specific pipelines
......
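The :func:`~transformers.pipeline` factory documented in this file is used roughly as follows (a minimal sketch; the default model downloaded for a task can vary between library versions):

    from transformers import pipeline

    # The `task` string decides which task-specific pipeline is instantiated
    # and which default model/tokenizer are fetched.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Clean docs are easier to read."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]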
@@ -30,35 +30,35 @@ Instantiating one of ``AutoModel``, ``AutoConfig`` and ``AutoTokenizer`` will di
 ``AutoModelForPreTraining``
-~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: transformers.AutoModelForPreTraining
     :members:
 ``AutoModelWithLMHead``
-~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: transformers.AutoModelWithLMHead
     :members:
 ``AutoModelForSequenceClassification``
-~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: transformers.AutoModelForSequenceClassification
     :members:
 ``AutoModelForQuestionAnswering``
-~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: transformers.AutoModelForQuestionAnswering
     :members:
 ``AutoModelForTokenClassification``
-~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: transformers.AutoModelForTokenClassification
     :members:
......
 Encoder Decoder Models
------------
+------------------------
 This class can wrap an encoder model, such as ``BertModel`` and a decoder modeling with a language modeling head, such as ``BertForMaskedLM`` into a encoder-decoder model.
@@ -10,7 +10,7 @@ An application of this architecture could be *summarization* using two pretraine
 ``EncoderDecoderConfig``
-~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~
 .. autoclass:: transformers.EncoderDecoderConfig
     :members:
......
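The encoder-decoder wrapper described in this file is typically built from two pretrained checkpoints. A minimal sketch, assuming two BERT checkpoints (any compatible encoder/decoder pair works):

    from transformers import EncoderDecoderModel

    # Encoder: a plain BertModel; decoder: BERT with an LM head, wired with cross-attention.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "bert-base-uncased", "bert-base-uncased"
    )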
@@ -4,7 +4,7 @@ Reformer
 file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
 Overview
-~~~~~
+~~~~~~~~~~
 The Reformer model was presented in `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451.pdf>`_ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
 Here the abstract:
@@ -13,7 +13,7 @@ Here the abstract:
 The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`_ .
 Axial Positional Encodings
-~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Axial Positional Encodings were first implemented in Google's `trax library <https://github.com/google/trax/blob/4d99ad4965bab1deba227539758d59f0df0fef48/trax/layers/research/position_encodings.py#L29>`_ and developed by the authors of this model's paper. In models that are treating very long input sequences, the conventional position id encodings store an embedings vector of size :math:`d` being the ``config.hidden_size`` for every position :math:`i, \ldots, n_s`, with :math:`n_s` being ``config.max_embedding_size``. *E.g.*, having a sequence length of :math:`n_s = 2^{19} \approx 0.5M` and a ``config.hidden_size`` of :math:`d = 2^{10} \approx 1000` would result in a position encoding matrix:
 .. math::
......
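To make the size argument in the quoted paragraph concrete, here is a back-of-the-envelope check (the factorization shapes below are illustrative assumptions, chosen so that n_s1 * n_s2 = n_s and d1 + d2 = d):

    n_s, d = 2 ** 19, 2 ** 10        # sequence length and hidden size from the example
    full = n_s * d                   # conventional position embeddings: ~5.4e8 values
    n_s1, n_s2 = 2 ** 9, 2 ** 10     # assumed factorization of the position axis
    d1, d2 = 2 ** 9, 2 ** 9          # assumed split of the hidden size
    axial = n_s1 * d1 + n_s2 * d2    # axial position embeddings: ~7.9e5 values
    print(full // axial)             # roughly a 680x reduction in stored parameters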
This diff is collapsed.
@@ -692,7 +692,8 @@ following array should be the output:
 ::
     [('[CLS]', 'O'), ('Hu', 'I-ORG'), ('##gging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('.', 'O'), ('is', 'O'), ('a', 'O'), ('company', 'O'), ('based', 'O'), ('in', 'O'), ('New', 'I-LOC'), ('York', 'I-LOC'), ('City', 'I-LOC'), ('.', 'O'), ('Its', 'O'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'), ('D', 'I-LOC'), ('##UM', 'I-LOC'), ('##BO', 'I-LOC'), (',', 'O'), ('therefore', 'O'), ('very', 'O'), ('##c', 'O'), ('##lose', 'O'), ('to', 'O'), ('the', 'O'), ('Manhattan', 'I-LOC'), ('Bridge', 'I-LOC'), ('.', 'O'), ('[SEP]', 'O')]
 Summarization
 ----------------------------------------------------
@@ -769,7 +770,8 @@ Here Google`s T5 model is used that was only pre-trained on a multi-task mixed d
     # T5 uses a max_length of 512 so we cut the article to 512 tokens.
     inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="tf", max_length=512)
     outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
     print(outputs)
 Translation
 ----------------------------------------------------
......
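The summarization snippet in that hunk prints raw token ids; decoding them back to text looks roughly like this (a sketch assuming the TF T5 setup from the surrounding docs; the checkpoint name is illustrative):

    from transformers import T5Tokenizer, TFT5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = TFT5ForConditionalGeneration.from_pretrained("t5-base")
    ARTICLE = "..."  # any long article text

    inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="tf", max_length=512)
    outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
    # generate() returns token ids; decode them to obtain the readable summary.
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))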
@@ -134,6 +134,7 @@ class AutoConfig:
         The configuration class to instantiate is selected
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `t5`: :class:`~transformers.T5Config` (T5 model)
         - `distilbert`: :class:`~transformers.DistilBertConfig` (DistilBERT model)
         - `albert`: :class:`~transformers.AlbertConfig` (ALBERT model)
......
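The selection logic this docstring describes can be observed directly (a small sketch; `t5-small` is one public checkpoint whose config carries `model_type: t5`):

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("t5-small")
    print(type(config).__name__)  # T5Config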
@@ -53,7 +53,7 @@ class T5Config(PretrainedConfig):
             probabilities.
         n_positions: The maximum sequence length that this model might
             ever be used with. Typically set this to something large just in case
-            (e.g., 512 or 1024 or 2048). `n_positions` can also be accessed via the property `max_position_embeddings'.
+            (e.g., 512 or 1024 or 2048). `n_positions` can also be accessed via the property `max_position_embeddings`.
         type_vocab_size: The vocabulary size of the `token_type_ids` passed into
             `T5Model`.
         initializer_factor: A factor for initializing all weight matrices (should be kept to 1.0, used for initialization testing).
......
@@ -84,11 +84,12 @@ class XLNetConfig(PretrainedConfig):
             Argument used when doing sequence summary. Used in for the multiple choice head in
             :class:transformers.XLNetForSequenceClassification` and :class:`~transformers.XLNetForMultipleChoice`.
             Is one of the following options:
-            - 'last' => take the last token hidden state (like XLNet)
-            - 'first' => take the first token hidden state (like Bert)
-            - 'mean' => take the mean of all tokens hidden states
-            - 'cls_index' => supply a Tensor of classification token position (GPT/GPT-2)
-            - 'attn' => Not implemented now, use multi-head attention
+
+            - 'last' => take the last token hidden state (like XLNet)
+            - 'first' => take the first token hidden state (like Bert)
+            - 'mean' => take the mean of all tokens hidden states
+            - 'cls_index' => supply a Tensor of classification token position (GPT/GPT-2)
+            - 'attn' => Not implemented now, use multi-head attention
         summary_use_proj (:obj:`boolean`, optional, defaults to :obj:`True`):
             Argument used when doing sequence summary. Used in for the multiple choice head in
             :class:`~transformers.XLNetForSequenceClassification` and :class:`~transformers.XLNetForMultipleChoice`.
......
@@ -83,7 +83,8 @@ class DataProcessor:
     """Base class for data converters for sequence classification data sets."""
     def get_example_from_tensor_dict(self, tensor_dict):
-        """Gets an example from a dict with tensorflow tensors
+        """Gets an example from a dict with tensorflow tensors.
+
         Args:
             tensor_dict: Keys and values should match the corresponding Glue
                 tensorflow_dataset examples.
@@ -91,15 +92,15 @@ class DataProcessor:
         raise NotImplementedError()
     def get_train_examples(self, data_dir):
-        """Gets a collection of `InputExample`s for the train set."""
+        """Gets a collection of :class:`InputExample` for the train set."""
         raise NotImplementedError()
     def get_dev_examples(self, data_dir):
-        """Gets a collection of `InputExample`s for the dev set."""
+        """Gets a collection of :class:`InputExample` for the dev set."""
         raise NotImplementedError()
     def get_test_examples(self, data_dir):
-        """Gets a collection of `InputExample`s for the test set."""
+        """Gets a collection of :class:`InputExample` for the test set."""
         raise NotImplementedError()
     def get_labels(self):
......
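A concrete processor fills in the abstract getters shown above. A minimal, hypothetical subclass (the file names and label set are assumptions for illustration; `_read_tsv` is a helper provided by `DataProcessor`):

    from transformers import DataProcessor, InputExample

    class MyBinaryProcessor(DataProcessor):
        """Reads tab-separated `sentence<TAB>label` files from a data directory."""

        def get_train_examples(self, data_dir):
            return self._create_examples(self._read_tsv(data_dir + "/train.tsv"), "train")

        def get_dev_examples(self, data_dir):
            return self._create_examples(self._read_tsv(data_dir + "/dev.tsv"), "dev")

        def get_labels(self):
            return ["0", "1"]

        def _create_examples(self, rows, set_type):
            # One InputExample per row; guids only need to be unique.
            return [
                InputExample(guid="%s-%d" % (set_type, i), text_a=row[0], label=row[1])
                for i, row in enumerate(rows)
            ]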
@@ -393,6 +393,7 @@ class AutoModel:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `t5`: :class:`~transformers.T5Model` (T5 model)
         - `distilbert`: :class:`~transformers.DistilBertModel` (DistilBERT model)
         - `albert`: :class:`~transformers.AlbertModel` (ALBERT model)
@@ -546,6 +547,7 @@ class AutoModelForPreTraining:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `t5`: :class:`~transformers.T5ModelWithLMHead` (T5 model)
         - `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
         - `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
@@ -698,6 +700,7 @@ class AutoModelWithLMHead:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `t5`: :class:`~transformers.T5ForConditionalGeneration` (T5 model)
         - `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
         - `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
@@ -845,6 +848,7 @@ class AutoModelForCausalLM:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `bert`: :class:`~transformers.BertLMHeadModel` (Bert model)
         - `openai-gpt`: :class:`~transformers.OpenAIGPTLMHeadModel` (OpenAI GPT model)
         - `gpt2`: :class:`~transformers.GPT2LMHeadModel` (OpenAI GPT-2 model)
@@ -982,6 +986,7 @@ class AutoModelForMaskedLM:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
         - `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
         - `camembert`: :class:`~transformers.CamembertForMaskedLM` (CamemBERT model)
@@ -1118,6 +1123,7 @@ class AutoModelForSeq2SeqLM:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `t5`: :class:`~transformers.T5ForConditionalGeneration` (T5 model)
         - `bart`: :class:`~transformers.BartForConditionalGeneration` (Bert model)
         - `marian`: :class:`~transformers.MarianMTModel` (Marian model)
@@ -1256,6 +1262,7 @@ class AutoModelForSequenceClassification:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `distilbert`: :class:`~transformers.DistilBertForSequenceClassification` (DistilBERT model)
         - `albert`: :class:`~transformers.AlbertForSequenceClassification` (ALBERT model)
         - `camembert`: :class:`~transformers.CamembertForSequenceClassification` (CamemBERT model)
@@ -1402,6 +1409,7 @@ class AutoModelForQuestionAnswering:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `distilbert`: :class:`~transformers.DistilBertForQuestionAnswering` (DistilBERT model)
         - `albert`: :class:`~transformers.AlbertForQuestionAnswering` (ALBERT model)
         - `bert`: :class:`~transformers.BertForQuestionAnswering` (Bert model)
@@ -1547,6 +1555,7 @@ class AutoModelForTokenClassification:
         The `from_pretrained()` method takes care of returning the correct model class instance
         based on the `model_type` property of the config object, or when it's missing,
         falling back to using pattern matching on the `pretrained_model_name_or_path` string:
+
         - `distilbert`: :class:`~transformers.DistilBertForTokenClassification` (DistilBERT model)
         - `xlm`: :class:`~transformers.XLMForTokenClassification` (XLM model)
         - `xlm-roberta`: :class:`~transformers.XLMRobertaForTokenClassification` (XLM-RoBERTa model)
......
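To illustrate the dispatch these docstrings document (a small sketch; the checkpoint name is one public example):

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # The concrete class is chosen from the checkpoint's config (model_type) or,
    # failing that, from a pattern in the name.
    name = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)
    print(type(model).__name__)  # DistilBertForSequenceClassification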
@@ -745,9 +745,10 @@ class ElectraForTokenClassification(ElectraPreTrainedModel):
 @add_start_docstrings(
-    """ELECTRA Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of
-    the hidden-states output to compute `span start logits` and `span end logits`). """,
-    ELECTRA_INPUTS_DOCSTRING,
+    """
+    ELECTRA Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
+    layers on top of the hidden-states output to compute `span start logits` and `span end logits`).""",
+    ELECTRA_START_DOCSTRING,
 )
 class ElectraForQuestionAnswering(ElectraPreTrainedModel):
     config_class = ElectraConfig
......
@@ -435,7 +435,7 @@ class LongformerSelfAttention(nn.Module):
 LONGFORMER_START_DOCSTRING = r"""
-    This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
+    This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ sub-class.
     Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
     usage and behavior.
@@ -467,7 +467,7 @@ LONGFORMER_INPUTS_DOCSTRING = r"""
     Tokens with global attention attends to all other tokens, and all other tokens attend to them. This is important for
     task-specific finetuning because it makes the model more flexible at representing the task. For example,
     for classification, the <s> token should be given global attention. For QA, all question tokens should also have
-    global attention. Please refer to the Longformer paper https://arxiv.org/abs/2004.05150 for more details.
+    global attention. Please refer to the `Longformer paper <https://arxiv.org/abs/2004.05150>`__ for more details.
     Mask values selected in ``[0, 1]``:
     ``0`` for local attention (a sliding window attention),
     ``1`` for global attention (tokens that attend to all other tokens, and all other tokens attend to them).
@@ -500,7 +500,7 @@ class LongformerModel(RobertaModel):
     """
     This class overrides :class:`~transformers.RobertaModel` to provide the ability to process
     long sequences following the selfattention approach described in `Longformer: the Long-Document Transformer
-    <https://arxiv.org/abs/2004.05150>`_ by Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer selfattention
+    <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer selfattention
     combines a local (sliding window) and global attention to extend to long documents without the O(n^2) increase in
     memory and compute.
......
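A short sketch of the 0/1 mask semantics documented in that hunk, using the `global_attention_mask` argument as exposed in recent releases of the library (the checkpoint name is the usual public Longformer base model):

    import torch
    from transformers import LongformerTokenizer, LongformerModel

    tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
    model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

    input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
    # 0 = local (sliding window) attention, 1 = global attention.
    global_attention_mask = torch.zeros_like(input_ids)
    global_attention_mask[:, 0] = 1  # give the <s>/classification token global attention
    outputs = model(input_ids, global_attention_mask=global_attention_mask)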
@@ -1451,14 +1451,10 @@ class ReformerPreTrainedModel(PreTrainedModel):
 REFORMER_START_DOCSTRING = r"""
-    Reformer was proposed in
-    `Reformer: The Efficient Transformer`_
+    Reformer was proposed in `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.0445>`__
     by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
-    .. _`Reformer: The Efficient Transformer`:
-        https://arxiv.org/abs/2001.04451
-    This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
+    This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ sub-class.
     Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
     usage and behavior.
......
@@ -775,19 +775,14 @@ class T5Stack(T5PreTrainedModel):
         return outputs  # last-layer hidden state, (presents,) (all hidden states), (all attentions)
-T5_START_DOCSTRING = r""" The T5 model was proposed in
-    `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`_
-    by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
+T5_START_DOCSTRING = r"""
+    The T5 model was proposed in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
+    <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
+    Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
     It's an encoder decoder transformer pre-trained in a text-to-text denoising generative setting.
-    This model is a PyTorch `torch.nn.Module`_ sub-class. Use it as a regular PyTorch Module and
-    refer to the PyTorch documentation for all matter related to general usage and behavior.
-    .. _`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`:
-        https://arxiv.org/abs/1910.10683
-    .. _`torch.nn.Module`:
-        https://pytorch.org/docs/stable/nn.html#module
+    This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#module>`__ sub-class. Use it as a
+    regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
     Parameters:
         config (:class:`~transformers.T5Config`): Model configuration class with all the parameters of the model.
@@ -804,7 +799,7 @@ T5_INPUTS_DOCSTRING = r"""
             See :func:`transformers.PreTrainedTokenizer.encode` and
             :func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
             To know more on how to prepare :obj:`input_ids` for pre-training take a look at
-            `T5 Training <./t5.html#training>`_ .
+            `T5 Training <./t5.html#training>`__.
         attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
             Mask to avoid performing attention on padding token indices.
             Mask values selected in ``[0, 1]``:
@@ -817,7 +812,7 @@ T5_INPUTS_DOCSTRING = r"""
             Provide for sequence to sequence training. T5 uses the pad_token_id as the starting token for decoder_input_ids generation.
             If `decoder_past_key_value_states` is used, optionally only the last `decoder_input_ids` have to be input (see `decoder_past_key_value_states`).
             To know more on how to prepare :obj:`decoder_input_ids` for pre-training take a look at
-            `T5 Training <./t5.html#training>`_ .
+            `T5 Training <./t5.html#training>`__.
         decoder_attention_mask (:obj:`torch.BoolTensor` of shape :obj:`(batch_size, tgt_seq_len)`, `optional`, defaults to :obj:`None`):
             Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
         decoder_past_key_value_states (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
@@ -902,8 +897,8 @@ class T5Model(T5PreTrainedModel):
         output_attentions=None,
     ):
         r"""
-        Return:
-        :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs.
+        Returns:
+        :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs:
         last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
             Sequence of hidden-states at the output of the last layer of the model.
             If `decoder_past_key_value_states` is used only the last hidden-state of the sequences of shape :obj:`(batch_size, 1, hidden_size)` is output.
@@ -925,13 +920,13 @@ class T5Model(T5PreTrainedModel):
         Examples::
             from transformers import T5Tokenizer, T5Model
             tokenizer = T5Tokenizer.from_pretrained('t5-small')
             model = T5Model.from_pretrained('t5-small')
             input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")  # Batch size 1
             outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)
             last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple
         """
@@ -1030,15 +1025,15 @@ class T5ForConditionalGeneration(T5PreTrainedModel):
     ):
         r"""
         labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`, defaults to :obj:`None`):
             Labels for computing the sequence classification/regression loss.
             Indices should be in :obj:`[-100, 0, ..., config.vocab_size - 1]`.
             All labels set to ``-100`` are ignored (masked), the loss is only
             computed for labels in ``[0, ..., config.vocab_size]``
         kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`):
             Used to hide legacy arguments that have been deprecated.
         Returns:
-        :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs.
+        :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs:
         loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided):
             Classification loss (cross entropy).
         prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
......
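The `labels` argument documented above drives the loss computation; a minimal sketch of how it is used (the checkpoint name is illustrative, and in this era of the library the outputs are a tuple whose first element is the loss when labels are passed):

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    input_ids = tokenizer.encode("translate English to German: The house is wonderful.", return_tensors="pt")
    labels = tokenizer.encode("Das Haus ist wunderbar.", return_tensors="pt")

    # Label positions set to -100 are masked out of the cross-entropy loss.
    outputs = model(input_ids=input_ids, labels=labels)
    loss = outputs[0]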
@@ -705,38 +705,38 @@ class TFAlbertModel(TFAlbertPreTrainedModel):
     @add_start_docstrings_to_callable(ALBERT_INPUTS_DOCSTRING.format("(batch_size, sequence_length)"))
     def call(self, inputs, **kwargs):
         r"""
         Returns:
         :obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.AlbertConfig`) and inputs:
         last_hidden_state (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
             Sequence of hidden-states at the output of the last layer of the model.
         pooler_output (:obj:`tf.Tensor` of shape :obj:`(batch_size, hidden_size)`):
             Last layer hidden-state of the first token of the sequence (classification token)
             further processed by a Linear layer and a Tanh activation function. The Linear
             layer weights are trained from the next sentence prediction (classification)
             objective during Albert pretraining. This output is usually *not* a good summary
             of the semantic content of the input, you're often better with averaging or pooling
             the sequence of hidden-states for the whole input sequence.
         hidden_states (:obj:`tuple(tf.Tensor)`, `optional`, returned when :obj:`config.output_hidden_states=True`):
             tuple of :obj:`tf.Tensor` (one for the output of the embeddings + one for the output of each layer)
             of shape :obj:`(batch_size, sequence_length, hidden_size)`.
             Hidden-states of the model at the output of each layer plus the initial embedding outputs.
         attentions (:obj:`tuple(tf.Tensor)`, `optional`, returned when ``output_attentions=True`` is passed or ``config.output_attentions=True``):
             tuple of :obj:`tf.Tensor` (one for each layer) of shape
             :obj:`(batch_size, num_heads, sequence_length, sequence_length)`:
             Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
         Examples::
             import tensorflow as tf
             from transformers import AlbertTokenizer, TFAlbertModel
             tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
             model = TFAlbertModel.from_pretrained('albert-base-v2')
             input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :]  # Batch size 1
             outputs = model(input_ids)
             last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple
         """
         outputs = self.albert(inputs, **kwargs)
......
@@ -408,12 +408,11 @@ class TFElectraModel(TFElectraPreTrainedModel):
 @add_start_docstrings(
-    """
-    Electra model with a binary classification head on top as used during pre-training for identifying generated
-    tokens.
+    """Electra model with a binary classification head on top as used during pre-training for identifying generated
+    tokens.
     Even though both the discriminator and generator may be loaded into this model, the discriminator is
     the only model of the two to have the correct classification head to be used for this model.""",
     ELECTRA_START_DOCSTRING,
 )
 class TFElectraForPreTraining(TFElectraPreTrainedModel):
@@ -501,11 +500,10 @@ class TFElectraMaskedLMHead(tf.keras.layers.Layer):
 @add_start_docstrings(
-    """
-    Electra model with a language modeling head on top.
+    """Electra model with a language modeling head on top.
     Even though both the discriminator and generator may be loaded into this model, the generator is
     the only model of the two to have been trained for the masked language modeling task.""",
     ELECTRA_START_DOCSTRING,
 )
 class TFElectraForMaskedLM(TFElectraPreTrainedModel):
@@ -588,10 +586,9 @@ class TFElectraForMaskedLM(TFElectraPreTrainedModel):
 @add_start_docstrings(
-    """
-    Electra model with a token classification head on top.
+    """Electra model with a token classification head on top.
     Both the discriminator and generator may be loaded into this model.""",
     ELECTRA_START_DOCSTRING,
 )
 class TFElectraForTokenClassification(TFElectraPreTrainedModel, TFTokenClassificationLoss):
......
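For context on the discriminator head described in these docstrings, a small usage sketch (the ELECTRA-small discriminator checkpoint is one public example; per-token scores are higher for tokens the model believes were replaced by the generator):

    import tensorflow as tf
    from transformers import ElectraTokenizer, TFElectraForPreTraining

    tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
    model = TFElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

    input_ids = tf.constant(tokenizer.encode("The quick brown fox jumps over the lazy dog"))[None, :]
    outputs = model(input_ids)
    scores = outputs[0]  # shape (batch_size, sequence_length): replaced-token logits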
@@ -772,19 +772,15 @@ class TFT5PreTrainedModel(TFPreTrainedModel):
         return dummy_inputs
-T5_START_DOCSTRING = r""" The T5 model was proposed in
-    `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`_
-    by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
+T5_START_DOCSTRING = r"""
+    The T5 model was proposed in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
+    <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
+    Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
     It's an encoder decoder transformer pre-trained in a text-to-text denoising generative setting.
-    This model is a tf.keras.Model `tf.keras.Model`_ sub-class. Use it as a regular TF 2.0 Keras Model and
-    refer to the TF 2.0 documentation for all matter related to general usage and behavior.
-    .. _`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`:
-        https://arxiv.org/abs/1910.10683
-    .. _`tf.keras.Model`:
-        https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/Model
+    This model is a `tf.keras.Model <https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/Model>`__
+    sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to
+    general usage and behavior.
     Note on the model inputs:
         TF 2.0 models accepts two formats as inputs:
@@ -796,7 +792,7 @@ T5_START_DOCSTRING = r""" The T5 model was proposed in
     If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :
-    - a single Tensor with inputs only and nothing else: `model(inputs_ids)
+    - a single Tensor with inputs only and nothing else: `model(inputs_ids)`
     - a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
       `model([inputs, attention_mask])` or `model([inputs, attention_mask, token_type_ids])`
     - a dictionary with one or several input Tensors associaed to the input names given in the docstring:
@@ -818,7 +814,7 @@ T5_INPUTS_DOCSTRING = r"""
         the right or the left.
         Indices can be obtained using :class:`transformers.T5Tokenizer`.
         To know more on how to prepare :obj:`inputs` for pre-training take a look at
-        `T5 Training <./t5.html#training>`_ .
+        `T5 Training <./t5.html#training>`__.
         See :func:`transformers.PreTrainedTokenizer.encode` and
         :func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
     decoder_input_ids (:obj:`tf.Tensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`, defaults to :obj:`None`):
@@ -850,7 +846,7 @@ T5_INPUTS_DOCSTRING = r"""
         This is useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
         than the model's internal embedding lookup matrix.
         To know more on how to prepare :obj:`decoder_input_ids` for pre-training take a look at
-        `T5 Training <./t5.html#training>`_ .
+        `T5 Training <./t5.html#training>`__.
     head_mask: (:obj:`tf.Tensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`, defaults to :obj:`None`):
         Mask to nullify selected heads of the self-attention modules.
         Mask values selected in ``[0, 1]``:
@@ -897,8 +893,8 @@ class TFT5Model(TFT5PreTrainedModel):
     @add_start_docstrings_to_callable(T5_INPUTS_DOCSTRING)
     def call(self, inputs, **kwargs):
         r"""
-        Return:
-        :obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs.
+        Returns:
+        :obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs:
         last_hidden_state (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
             Sequence of hidden-states at the output of the last layer of the model.
             If `decoder_past_key_value_states` is used only the last hidden-state of the sequences of shape :obj:`(batch_size, 1, hidden_size)` is output.
@@ -1024,8 +1020,8 @@ class TFT5ForConditionalGeneration(TFT5PreTrainedModel):
     @add_start_docstrings_to_callable(T5_INPUTS_DOCSTRING)
     def call(self, inputs, **kwargs):
         r"""
-        Return:
-        :obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs.
+        Returns:
+        :obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs:
         loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`lm_label` is provided):
             Classification loss (cross entropy).
         prediction_scores (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
......
@@ -294,7 +294,6 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
         Parameters:
             pretrained_model_name_or_path: either:
-                - a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
                 - a string with the `identifier name` of a pre-trained model that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
                 - a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
@@ -306,11 +305,11 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
             config: (`optional`) one of:
                 - an instance of a class derived from :class:`~transformers.PretrainedConfig`, or
                 - a string valid as input to :func:`~transformers.PretrainedConfig.from_pretrained()`
-                Configuration for the model to use instead of an automatically loaded configuation. Configuration can be automatically loaded when:
-                - the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
-                - the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by suppling the save directory.
-                - the model is loaded by suppling a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
+
+                Configuration for the model to use instead of an automatically loaded configuation. Configuration can be automatically loaded when:
+                - the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
+                - the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by suppling the save directory.
+                - the model is loaded by suppling a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
             from_pt: (`optional`) boolean, default False:
                 Load the model weights from a PyTorch state_dict save file (see docstring of pretrained_model_name_or_path argument).
......
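The `config` and `from_pt` parameters discussed in this docstring are used like this (a sketch with the usual public BERT checkpoint; the local directory in the last line is hypothetical):

    from transformers import BertConfig, TFBertModel

    # Pass an explicit configuration instead of letting it be auto-loaded.
    config = BertConfig.from_pretrained("bert-base-uncased", output_hidden_states=True)
    model = TFBertModel.from_pretrained("bert-base-uncased", config=config)

    # from_pt=True loads weights that were saved from the PyTorch implementation.
    # model = TFBertModel.from_pretrained("./my_pytorch_bert_dir/", from_pt=True)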