List of attention layer types in ascending order. It can be chosen between an
LSHSelfAttention layer ("lsh") and a LocalSelfAttention layer ("local").
For more information on the LSHSelfAttention layer, see `LSH Self Attention <reformer.html#lsh-self-attention>`__.
For more information on the LocalSelfAttention layer, see `Local Self Attention <reformer.html#local-self-attention>`__.
axial_pos_embds (:obj:`bool`, optional, defaults to True):
If `True`, use axial position embeddings. For more information on how axial position embeddings work, see `Axial Position Encodings <reformer.html#axial-positional-encodings>`__.
axial_norm_std (:obj:`float`, optional, defaults to 1.0):
The standard deviation of the normal_initializer for initializing the weight matrices of the axial positional encodings.
axial_pos_shape (:obj:`list(int)`, optional, defaults to `[64, 64]`):
The position dims of the axial position encodings.
During training the product of the position dims has to equal the sequence length.
For more information on how axial position embeddings work, see `Axial Position Encodings <reformer.html#axial-positional-encodings>`__.
axial_pos_embds_dim (:obj:`list(int)`, optional, defaults to `[64, 192]`):
The embedding dims of the axial position encodings.
The sum of the embedding dims has to equal the hidden size.
For more information on how axial position embeddings work, see `Axial Position Encodings <reformer.html#axial-positional-encodings>`__. A configuration sketch illustrating both constraints follows.
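A minimal sketch of a configuration that satisfies both constraints (the values are the defaults listed above, assuming a training sequence length of 4096)::

    from transformers import ReformerConfig

    # 64 * 64 == 4096 must equal the training sequence length,
    # and 64 + 192 == 256 must equal hidden_size.
    config = ReformerConfig(
        hidden_size=256,
        axial_pos_embds=True,
        axial_pos_shape=[64, 64],       # product == sequence length during training
        axial_pos_embds_dim=[64, 192],  # sum == hidden_size
    )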
chunk_size_lm_head (:obj:`int`, optional, defaults to 0):
The chunk size of the final language model feed forward head layer.
A chunk size of 0 means that the feed forward layer is not chunked.
A chunk size of n means that the feed forward layer processes n < sequence_length embeddings at a time.
For more information on feed forward chunking, see `How does Feed Forward Chunking work? <../glossary.html#feed-forward-chunking>`__.
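The idea can be sketched as follows (a hypothetical helper for illustration, not the library's internal implementation)::

    import torch

    def chunked_feed_forward(feed_forward, hidden_states, chunk_size):
        # A chunk size of 0 means no chunking: process the whole sequence at once.
        if chunk_size == 0:
            return feed_forward(hidden_states)
        # Otherwise apply the feed forward layer to chunk_size embeddings at a
        # time, trading sequential computation for a smaller peak memory footprint.
        chunks = hidden_states.split(chunk_size, dim=1)  # split along the sequence axis
        return torch.cat([feed_forward(chunk) for chunk in chunks], dim=1)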
eos_token_id (:obj:`int`, optional, defaults to 2):
The token id for the <EOS> token.
feed_forward_size (:obj:`int`, optional, defaults to 512):
Dimensionality of the feed forward layer in the residual attention block.
hash_seed (:obj:`int`, optional, defaults to `None`):
Seed that can be used to make the locality sensitive hashing in LSHSelfAttention deterministic. This should only be set for testing purposes. For evaluation and training purposes `hash_seed` should be left as `None` to ensure fully random rotations in the locality sensitive hashing scheme.
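For example (a sketch; ``hash_seed=0`` is an arbitrary illustrative value)::

    from transformers import ReformerConfig

    # Deterministic hashing, for unit tests only.
    test_config = ReformerConfig(hash_seed=0)

    # For training and evaluation, keep the default for fully random rotations.
    train_config = ReformerConfig(hash_seed=None)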
hidden_act (:obj:`str` or :obj:`function`, optional, defaults to "relu"):
The non-linear activation function (function or string) in the feed forward layer in the residual attention block.
If string, "gelu", "relu", "swish", "gelu_new" and "gelu_fast" are supported.
hidden_dropout_prob (:obj:`float`, optional, defaults to 0.05):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
hidden_size (:obj:`int`, optional, defaults to 256):
Dimensionality of the output hidden states of the residual attention blocks.
initializer_range (:obj:`float`, optional, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
is_decoder (:obj:`bool`, optional, defaults to False):
If `is_decoder` is True, a causal mask is used in addition to `attention_mask`.
When using the Reformer for causal language modeling, `is_decoder` is set to `True`.
layer_norm_eps (:obj:`float`, optional, defaults to 1e-12):
The epsilon used by the layer normalization layers.
local_chunk_length (:obj:`int`, optional, defaults to 64):
Length of chunk which attends to itself in LocalSelfAttention. Chunking reduces memory complexity from sequence length x sequence length (self attention) to chunk length x chunk length x sequence length / chunk length (chunked self attention).
local_num_chunks_before (:obj:`int`, optional, defaults to 1):
Number of previous neighbouring chunks to attend to in the LocalSelfAttention layer in addition to itself.
local_num_chunks_after (:obj:`int`, optional, defaults to 0):
Number of following neighbouring chunks to attend to in the LocalSelfAttention layer in addition to itself.
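The attended neighbourhood of a chunk can be sketched as follows (a hypothetical helper for illustration)::

    def attended_chunk_ids(chunk_id, num_chunks, num_chunks_before=1, num_chunks_after=0):
        # A chunk attends to itself, to num_chunks_before preceding chunks, and to
        # num_chunks_after following chunks, clipped at the sequence boundaries.
        first = max(0, chunk_id - num_chunks_before)
        last = min(num_chunks - 1, chunk_id + num_chunks_after)
        return list(range(first, last + 1))

    # With the defaults (1 before, 0 after), chunk 5 of 8 attends to chunks 4 and 5.
    assert attended_chunk_ids(5, 8) == [4, 5]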
local_attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0.1):
The dropout ratio for the attention probabilities in LocalSelfAttention.
lsh_attn_chunk_length (:obj:`int`, optional, defaults to 64):
Length of chunk which attends to itself in LSHSelfAttention. Chunking reduces memory complexity from sequence length x sequence length (self attention) to chunk length x chunk length x sequence length / chunk length (chunked self attention).
lsh_num_chunks_before (:obj:`int`, optional, defaults to 1):
Number of previous neighbouring chunks to attend to in the LSHSelfAttention layer in addition to itself.
lsh_num_chunks_after (:obj:`int`, optional, defaults to 0):
Number of following neighbouring chunks to attend to in the LSHSelfAttention layer in addition to itself.
lsh_attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0.1):
The dropout ratio for the attention probabilities in LSHSelfAttention.
max_position_embeddings (:obj:`int`, optional, defaults to 4096):
The maximum sequence length that this model might ever be used with.
Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
num_attention_heads (:obj:`int`, optional, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
num_buckets (:obj:`int` or :obj:`list(int)`, optional, defaults to `None`):
Number of buckets the query key vectors can be "hashed into" using the locality sensitive hashing scheme. Each query key vector is hashed into a hash in `1, ..., num_buckets`.
The number of buckets can also be factorized into a list for improved memory complexity. In this case, each query key vector is hashed into a hash in `1-1, 1-2, ..., num_buckets[0]-1, ..., num_buckets[0]-num_buckets[1]` if `num_buckets` is factorized into two factors.
The number of buckets (or the product of the factors) should approximately equal sequence length / lsh_chunk_length. If `num_buckets` is set to `None`, a good value for `num_buckets` is calculated on the fly.
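As a rough, illustrative calculation of that rule of thumb (values are assumptions matching the defaults above)::

    sequence_length = 4096
    lsh_attn_chunk_length = 64

    # num_buckets should approximately equal sequence length / chunk length.
    num_buckets = sequence_length // lsh_attn_chunk_length  # 64

    # Factorized form: the product of the factors gives roughly the same number.
    num_buckets_factorized = [8, 8]  # 8 * 8 == 64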
num_hashes (:obj:`int`, optional, defaults to 1):
Number of hashing rounds (i.e., the number of random rotations) in the locality sensitive hashing scheme.
The higher `num_hashes`, the more accurate `LSHSelfAttention` becomes, but also the more memory and time intensive the hashing becomes.
pad_token_id (:obj:`int`, optional, defaults to 0):
The token id for the <PAD> token.
vocab_size (:obj:`int`, optional, defaults to 320):
Vocabulary size of the Reformer model. Defines the number of different tokens that
can be represented by the `input_ids` passed to the forward method of :class:`~transformers.ReformerModel`.
Example::

>>> from transformers import ReformerModel, ReformerConfig
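
>>> # A minimal usage sketch following the standard Transformers config pattern:
>>> # Initializing a Reformer configuration
>>> configuration = ReformerConfig()

>>> # Initializing a model from the configuration
>>> model = ReformerModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config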
Handles a few parameters common to all models' configurations as well as methods for loading/downloading/saving
configurations.
Note:
A configuration file can be loaded and saved to disk. Loading the configuration file and using this file to
initialize a model does **not** load the model weights.
It only affects the model's configuration.
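For example (a sketch using ``ReformerConfig``; the directory path is hypothetical)::

    from transformers import ReformerConfig

    config = ReformerConfig()
    config.save_pretrained("./my-reformer")  # writes config.json to the directory

    # Reloading the configuration does NOT load any model weights.
    reloaded = ReformerConfig.from_pretrained("./my-reformer")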
Class attributes (overridden by derived classes):
- **model_type** (:obj:`str`): An identifier for the model type, serialized into the JSON file, and used to
recreate the correct object in :class:`~transformers.AutoConfig`.
Args:
output_hidden_states (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not the model should return all hidden-states.
output_attentions (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not the model should return all attentions.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not the model should return the last key/value attentions (not used by all models).
return_dict (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not the model should return a :class:`~transformers.file_utils.ModelOutput` instead of a
plain tuple.
is_encoder_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether the model is used as an encoder/decoder or not.
is_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether the model is used as a decoder or not (if not, it is used as an encoder).
add_cross_attention (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether cross-attention layers should be added to the model. Note, this option is only relevant for models that can be used as decoder models within the :class:`~transformers.EncoderDecoderModel` class, which consists of all models in ``AUTO_MODELS_FOR_CAUSAL_LM``.
tie_encoder_decoder (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether all encoder weights should be tied to their equivalent decoder weights. This requires the encoder and decoder model to have the exact same parameter names.
prune_heads (:obj:`Dict[int, List[int]]`, `optional`, defaults to :obj:`{}`):
Pruned heads of the model. The keys are the selected layer indices and the associated values are the lists
of heads to prune in those layers.
For instance ``{1: [0, 2], 2: [2, 3]}`` will prune heads 0 and 2 on layer 1 and heads 2 and 3 on layer 2.
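A sketch of how such a dictionary can be applied (``model`` is assumed to be any loaded :class:`~transformers.PreTrainedModel`)::

    # Prune heads 0 and 2 in layer 1, and heads 2 and 3 in layer 2.
    model.prune_heads({1: [0, 2], 2: [2, 3]})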
xla_device (:obj:`bool`, `optional`):
A flag indicating whether a TPU is available or not.
chunk_size_feed_forward (:obj:`int`, `optional`, defaults to :obj:`0`):
The chunk size of all feed forward layers in the residual attention blocks.
A chunk size of :obj:`0` means that the feed forward layer is not chunked.
A chunk size of n means that the feed forward layer processes :obj:`n` < sequence_length embeddings at a time.
For more information on feed forward chunking, see `How does Feed Forward Chunking work? <../glossary.html#feed-forward-chunking>`__.
Parameters for sequence generation
- **max_length** (:obj:`int`, `optional`, defaults to 20) -- Maximum length that will be used by
default in the :obj:`generate` method of the model.
- **min_length** (:obj:`int`, `optional`, defaults to 10) -- Minimum length that will be used by
default in the :obj:`generate` method of the model.
- **do_sample** (:obj:`bool`, `optional`, defaults to :obj:`False`) -- Flag that will be used by default in
the :obj:`generate` method of the model. Whether or not to use sampling; use greedy decoding otherwise.
- **early_stopping** (:obj:`bool`, `optional`, defaults to :obj:`False`) -- Flag that will be used by
default in the :obj:`generate` method of the model. Whether to stop the beam search when at least
``num_beams`` sentences are finished per batch or not.
- **num_beams** (:obj:`int`, `optional`, defaults to 1) -- Number of beams for beam search that will be
used by default in the :obj:`generate` method of the model. 1 means no beam search.
- **temperature** (:obj:`float`, `optional`, defaults to 1) -- The value used to modulate the next token
probabilities that will be used by default in the :obj:`generate` method of the model. Must be strictly
positive.
- **top_k** (:obj:`int`, `optional`, defaults to 50) -- Number of highest probability vocabulary tokens to
keep for top-k-filtering that will be used by default in the :obj:`generate` method of the model.
- **top_p** (:obj:`float`, `optional`, defaults to 1) -- Value that will be used by default in the
:obj:`generate` method of the model for ``top_p``. If set to float < 1, only the most probable tokens
with probabilities that add up to ``top_p`` or higher are kept for generation.
- **repetition_penalty** (:obj:`float`, `optional`, defaults to 1) -- Parameter for repetition penalty
that will be used by default in the :obj:`generate` method of the model. 1.0 means no penalty.
- **length_penalty** (:obj:`float`, `optional`, defaults to 1) -- Exponential penalty to the length that
will be used by default in the :obj:`generate` method of the model.
- **no_repeat_ngram_size** (:obj:`int`, `optional`, defaults to 0) -- Value that will be used by default
in the :obj:`generate` method of the model for ``no_repeat_ngram_size``. If set to int > 0, all ngrams of
that size can only occur once.
- **bad_words_ids** (:obj:`List[int]`, `optional`) -- List of token ids that are not allowed to be
generated that will be used by default in the :obj:`generate` method of the model. In order to get the
tokens of the words that should not appear in the generated text, use
:obj:`tokenizer.encode(bad_word, add_prefix_space=True)`.
- **num_return_sequences** (:obj:`int`, `optional`, defaults to 1) -- Number of independently computed
returned sequences for each element in the batch that will be used by default in the :obj:`generate`
method of the model.
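These generation defaults can be set on the configuration itself; a sketch with arbitrary illustrative values::

    from transformers import ReformerConfig

    # Values stored on the config are picked up by generate() when the
    # corresponding argument is not passed explicitly.
    config = ReformerConfig(
        max_length=50,
        num_beams=4,
        early_stopping=True,
        no_repeat_ngram_size=3,
    )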
Parameters for fine-tuning tasks
- **architectures** (:obj:`List[str]`, `optional`) -- Model architectures that can be used with the
model pretrained weights.
- **finetuning_task** (:obj:`str`, `optional`) -- Name of the task used to fine-tune the model. This can be
used when converting from an original (TensorFlow or PyTorch) checkpoint.
- **id2label** (:obj:`Dict[int, str]`, `optional`) -- A map from index (for instance prediction index, or target
index) to label.
- **label2id** (:obj:`Dict[str, int]`, `optional`) -- A map from label to index for the model.
- **num_labels** (:obj:`int`, `optional`) -- Number of labels to use in the last layer added to the model,
typically for a classification task.
- **task_specific_params** (:obj:`Dict[str, Any]`, `optional`) -- Additional keyword arguments to store for
the current task.
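A sketch with hypothetical sentiment labels::

    from transformers import ReformerConfig

    config = ReformerConfig(
        num_labels=2,
        id2label={0: "NEGATIVE", 1: "POSITIVE"},
        label2id={"NEGATIVE": 0, "POSITIVE": 1},
    )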
Parameters linked to the tokenizer
- **prefix** (:obj:`str`, `optional`) -- A specific prompt that should be added at the beginning of each
text before calling the model.
- **bos_token_id** (:obj:`int`, `optional`) -- The id of the `beginning-of-stream` token.
- **pad_token_id** (:obj:`int`, `optional`) -- The id of the `padding` token.
- **eos_token_id** (:obj:`int`, `optional`) -- The id of the `end-of-stream` token.
- **decoder_start_token_id** (:obj:`int`, `optional`) -- If an encoder-decoder model starts decoding with
a different token than `bos`, the id of that token.
PyTorch specific parameters
- **torchscript** (:obj:`bool`, `optional`, defaults to :obj:`False`) -- Whether or not the model should be
used with Torchscript.
- **tie_word_embeddings** (:obj:`bool`, `optional`, defaults to :obj:`True`) -- Whether the model's input and output word embeddings should be tied. Note that this is only relevant if the model has an output word embedding layer.
TensorFlow specific parameters
- **use_bfloat16** (:obj:`bool`, `optional`, defaults to :obj:`False`) -- Whether or not the model should
use BFloat16 scalars (only used by some TensorFlow models).