Unverified Commit 27b3031d authored by Sylvain Gugger, committed by GitHub

Mass conversion of documentation from rst to Markdown (#14866)

* Convert docstrings of all configurations and tokenizers

* Processors and fixes

* Last modeling files and fixes to models

* Pipeline modules

* Utils files

* Data submodule

* All the other files

* Style

* Missing examples

* Style again

* Fix copies

* Say bye bye to rst docstrings forever
parent 18587639
@@ -28,123 +28,121 @@ SEW_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class SEWConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`SEWModel`]. It is used to
instantiate a SEW model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the SEW [asapp/sew-tiny-100k](https://huggingface.co/asapp/sew-tiny-100k) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 32):
Vocabulary size of the SEW model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`SEWModel`].
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
squeeze_factor (`int`, *optional*, defaults to 2):
Sequence length downsampling factor after the encoder and upsampling factor after the transformer.
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"selu"` and :obj:`"gelu_new"` are supported.
hidden_dropout (:obj:`float`, `optional`, defaults to 0.1):
`"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`SEWForCTC`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
feat_extract_norm (`str`, *optional*, defaults to `"group"`):
The norm to be applied to 1D convolutional layers in feature extractor. One of `"group"` for group
normalization of only the first 1D convolutional layer or `"layer"` for layer normalization of all 1D
convolutional layers.
feat_proj_dropout (`float`, *optional*, defaults to 0.0):
The dropout probability for output of the feature extractor.
feat_extract_activation (`str`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the 1D convolutional layers of the feature
extractor. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
conv_dim (`Tuple[int]`, *optional*, defaults to `(64, 128, 128, 128, 128, 256, 256, 256, 256, 512, 512, 512, 512)`):
A tuple of integers defining the number of input and output channels of each 1D convolutional layer in the
feature extractor. The length of *conv_dim* defines the number of 1D convolutional layers.
conv_stride (`Tuple[int]`, *optional*, defaults to `(5, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1)`):
A tuple of integers defining the stride of each 1D convolutional layer in the feature extractor. The length
of *conv_stride* defines the number of convolutional layers and has to match the length of *conv_dim*.
conv_kernel (`Tuple[int]`, *optional*, defaults to `(10, 3, 1, 3, 1, 3, 1, 3, 1, 2, 1, 2, 1)`):
A tuple of integers defining the kernel size of each 1D convolutional layer in the feature extractor. The
length of *conv_kernel* defines the number of convolutional layers and has to match the length of
*conv_dim*.
conv_bias (`bool`, *optional*, defaults to `False`):
Whether the 1D convolutional layers have a bias.
num_conv_pos_embeddings (`int`, *optional*, defaults to 128):
Number of convolutional positional embeddings. Defines the kernel size of 1D convolutional positional
embeddings layer.
num_conv_pos_embedding_groups (`int`, *optional*, defaults to 16):
Number of groups of 1D convolutional positional embeddings layer.
apply_spec_augment (`bool`, *optional*, defaults to `True`):
Whether to apply *SpecAugment* data augmentation to the outputs of the feature extractor. For reference see
[SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779).
mask_time_prob (`float`, *optional*, defaults to 0.05):
Percentage (between 0 and 1) of all feature vectors along the time axis which will be masked. The masking
procedure generates `mask_time_prob*len(time_axis)/mask_time_length` independent masks over the axis. If
reasoning from the probability of each feature vector to be chosen as the start of the vector span to be
masked, *mask_time_prob* should be `prob_vector_start*mask_time_length`. Note that overlap may decrease
the actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
mask_time_length (`int`, *optional*, defaults to 10):
Length of vector span along the time axis.
mask_time_min_masks (`int`, *optional*, defaults to 2):
The minimum number of masks of length `mask_feature_length` generated along the time axis, each time
step, irrespectively of `mask_feature_prob`. Only relevant if
`mask_time_prob*len(time_axis)/mask_time_length < mask_time_min_masks`.
mask_feature_prob (`float`, *optional*, defaults to 0.0):
Percentage (between 0 and 1) of all feature vectors along the feature axis which will be masked. The
masking procedure generates `mask_feature_prob*len(feature_axis)/mask_feature_length` independent masks over
the axis. If reasoning from the probability of each feature vector to be chosen as the start of the vector
span to be masked, *mask_feature_prob* should be `prob_vector_start*mask_feature_length`. Note that
overlap may decrease the actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
mask_feature_length (`int`, *optional*, defaults to 10):
Length of vector span along the feature axis.
mask_feature_min_masks (`int`, *optional*, defaults to 0):
The minimum number of masks of length `mask_feature_length` generated along the feature axis, each time
step, irrespectively of `mask_feature_prob`. Only relevant if
`mask_feature_prob*len(feature_axis)/mask_feature_length < mask_feature_min_masks`.
ctc_loss_reduction (`str`, *optional*, defaults to `"sum"`):
Specifies the reduction to apply to the output of `torch.nn.CTCLoss`. Only relevant when training an
instance of [`SEWForCTC`].
ctc_zero_infinity (`bool`, *optional*, defaults to `False`):
Whether to zero infinite losses and the associated gradients of `torch.nn.CTCLoss`. Infinite losses
mainly occur when the inputs are too short to be aligned to the targets. Only relevant when training an
instance of [`SEWForCTC`].
use_weighted_layer_sum (`bool`, *optional*, defaults to `False`):
Whether to use a weighted average of layer outputs with learned weights. Only relevant when using an
instance of [`Wav2Vec2ForSequenceClassification`].
classifier_proj_size (`int`, *optional*, defaults to 256):
Dimensionality of the projection before token mean-pooling for classification.
Example:
```python
>>> from transformers import SEWModel, SEWConfig
>>> # Initializing a SEW asapp/sew-tiny-100k style configuration
>>> configuration = SEWConfig()
>>> # Initializing a model from the asapp/sew-tiny-100k style configuration
>>> model = SEWModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "sew"
def __init__(
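The `mask_time_prob` arithmetic described in the docstring above is easy to sanity-check numerically. A minimal sketch (the numbers are illustrative assumptions; the actual implementation also enforces `mask_time_min_masks` and lets masks overlap):

```python
# Illustrative numbers only -- not defaults from the library.
mask_time_prob = 0.05
mask_time_length = 10
time_axis_len = 2000  # feature vectors along the time axis

# Expected number of independent masks generated over the axis:
num_masks = mask_time_prob * time_axis_len / mask_time_length
print(num_masks)  # 10.0
```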
@@ -28,143 +28,141 @@ SEW_D_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class SEWDConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`SEWDModel`]. It is used to
instantiate a SEW-D model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the SEW-D [asapp/sew-d-tiny-100k](https://huggingface.co/asapp/sew-d-tiny-100k) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 32):
Vocabulary size of the SEW-D model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`SEWDModel`].
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
squeeze_factor (`int`, *optional*, defaults to 2):
Sequence length downsampling factor after the encoder and upsampling factor after the transformer.
max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
position_buckets (`int`, *optional*, defaults to 256):
The maximum size of relative position embeddings.
share_att_key (`bool`, *optional*, defaults to `True`):
Whether to share attention key with c2p and p2c.
relative_attention (`bool`, *optional*, defaults to `True`):
Whether to use relative position encoding.
position_biased_input (`bool`, *optional*, defaults to `False`):
Whether to add absolute position embedding to content embedding.
pos_att_type (`Tuple[str]`, *optional*, defaults to `("p2c", "c2p")`):
The type of relative position attention, it can be a combination of `("p2c", "c2p", "p2p")`, e.g.
`("p2c")`, `("p2c", "c2p")`, `("p2c", "c2p", 'p2p")`.
norm_rel_ebd (`str`, *optional*, defaults to `"layer_norm"`):
Whether to use layer norm in relative embedding (`"layer_norm"` if yes)
hidden_act (`str` or `function`, *optional*, defaults to `"gelu_python"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"selu"`, :obj:`"gelu_python"` and :obj:`"gelu_new"` are supported.
hidden_dropout (:obj:`float`, `optional`, defaults to 0.1):
`"gelu"`, `"relu"`, `"selu"`, `"gelu_python"` and `"gelu_new"` are supported.
hidden_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`SEWDForCTC`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-7):
The epsilon used by the layer normalization layers in the transformer encoder.
feature_layer_norm_eps (`float`, *optional*, defaults to 1e-5):
The epsilon used by the layer normalization after the feature extractor.
feat_extract_norm (`str`, *optional*, defaults to `"group"`):
The norm to be applied to 1D convolutional layers in feature extractor. One of `"group"` for group
normalization of only the first 1D convolutional layer or `"layer"` for layer normalization of all 1D
convolutional layers.
feat_proj_dropout (`float`, *optional*, defaults to 0.0):
The dropout probability for output of the feature extractor.
feat_extract_activation (`str`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the 1D convolutional layers of the feature
extractor. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
conv_dim (`Tuple[int]`, *optional*, defaults to `(64, 128, 128, 128, 128, 256, 256, 256, 256, 512, 512, 512, 512)`):
A tuple of integers defining the number of input and output channels of each 1D convolutional layer in the
feature extractor. The length of *conv_dim* defines the number of 1D convolutional layers.
conv_stride (`Tuple[int]`, *optional*, defaults to `(5, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1)`):
A tuple of integers defining the stride of each 1D convolutional layer in the feature extractor. The length
of *conv_stride* defines the number of convolutional layers and has to match the length of *conv_dim*.
conv_kernel (`Tuple[int]`, *optional*, defaults to `(10, 3, 1, 3, 1, 3, 1, 3, 1, 2, 1, 2, 1)`):
A tuple of integers defining the kernel size of each 1D convolutional layer in the feature extractor. The
length of *conv_kernel* defines the number of convolutional layers and has to match the length of
*conv_dim*.
conv_bias (`bool`, *optional*, defaults to `False`):
Whether the 1D convolutional layers have a bias.
num_conv_pos_embeddings (`int`, *optional*, defaults to 128):
Number of convolutional positional embeddings. Defines the kernel size of 1D convolutional positional
embeddings layer.
num_conv_pos_embedding_groups (`int`, *optional*, defaults to 16):
Number of groups of 1D convolutional positional embeddings layer.
apply_spec_augment (`bool`, *optional*, defaults to `True`):
Whether to apply *SpecAugment* data augmentation to the outputs of the feature extractor. For reference see
[SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779).
mask_time_prob (`float`, *optional*, defaults to 0.05):
Percentage (between 0 and 1) of all feature vectors along the time axis which will be masked. The masking
procedure generates `mask_time_prob*len(time_axis)/mask_time_length` independent masks over the axis. If
reasoning from the probability of each feature vector to be chosen as the start of the vector span to be
masked, *mask_time_prob* should be `prob_vector_start*mask_time_length`. Note that overlap may decrease
the actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
mask_time_length (`int`, *optional*, defaults to 10):
Length of vector span along the time axis.
mask_time_min_masks (`int`, *optional*, defaults to 2):
The minimum number of masks of length `mask_feature_length` generated along the time axis, each time
step, irrespectively of `mask_feature_prob`. Only relevant if
`mask_time_prob*len(time_axis)/mask_time_length < mask_time_min_masks`.
mask_feature_prob (`float`, *optional*, defaults to 0.0):
Percentage (between 0 and 1) of all feature vectors along the feature axis which will be masked. The
masking procedure generates `mask_feature_prob*len(feature_axis)/mask_feature_length` independent masks over
the axis. If reasoning from the probability of each feature vector to be chosen as the start of the vector
span to be masked, *mask_feature_prob* should be `prob_vector_start*mask_feature_length`. Note that
overlap may decrease the actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
mask_feature_length (`int`, *optional*, defaults to 10):
Length of vector span along the feature axis.
mask_feature_min_masks (`int`, *optional*, defaults to 0):
The minimum number of masks of length `mask_feature_length` generated along the feature axis, each time
step, irrespectively of `mask_feature_prob`. Only relevant if
`mask_feature_prob*len(feature_axis)/mask_feature_length < mask_feature_min_masks`.
diversity_loss_weight (`float`, *optional*, defaults to 0.1):
The weight of the codebook diversity loss component.
ctc_loss_reduction (`str`, *optional*, defaults to `"sum"`):
Specifies the reduction to apply to the output of `torch.nn.CTCLoss`. Only relevant when training an
instance of [`SEWDForCTC`].
ctc_zero_infinity (`bool`, *optional*, defaults to `False`):
Whether to zero infinite losses and the associated gradients of `torch.nn.CTCLoss`. Infinite losses
mainly occur when the inputs are too short to be aligned to the targets. Only relevant when training an
instance of [`SEWDForCTC`].
use_weighted_layer_sum (`bool`, *optional*, defaults to `False`):
Whether to use a weighted average of layer outputs with learned weights. Only relevant when using an
instance of [`Wav2Vec2ForSequenceClassification`].
classifier_proj_size (`int`, *optional*, defaults to 256):
Dimensionality of the projection before token mean-pooling for classification.
Example:
```python
>>> from transformers import SEWDModel, SEWDConfig
>>> # Initializing a SEW-D asapp/sew-d-tiny-100k style configuration
>>> configuration = SEWDConfig()
>>> # Initializing a model from the asapp/sew-d-tiny-100k style configuration
>>> model = SEWDModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "sew-d"
def __init__(
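As a quick illustration of the relative-attention arguments documented above, a hedged sketch of building a SEW-D configuration (the values mirror the documented defaults and are not a tuning recommendation):

```python
from transformers import SEWDConfig

# Values mirror the documented defaults, written out explicitly for illustration.
config = SEWDConfig(
    position_buckets=256,
    share_att_key=True,
    relative_attention=True,
    pos_att_type=("p2c", "c2p"),
)
print(config.squeeze_factor)  # 2, the default downsampling factor
```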
@@ -26,49 +26,50 @@ logger = logging.get_logger(__name__)
class SpeechEncoderDecoderConfig(PretrainedConfig):
r"""
[`SpeechEncoderDecoderConfig`] is the configuration class to store the configuration of a
[`SpeechEncoderDecoderModel`]. It is used to instantiate an Encoder Decoder model according to
the specified arguments, defining the encoder and decoder configs.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
kwargs (*optional*):
Dictionary of keyword arguments. Notably:
- **encoder** ([`PretrainedConfig`], *optional*) -- An instance of a configuration
object that defines the encoder config.
- **decoder** ([`PretrainedConfig`], *optional*) -- An instance of a configuration
object that defines the decoder config.
Examples:
```python
>>> from transformers import BertConfig, Wav2Vec2Config, SpeechEncoderDecoderConfig, SpeechEncoderDecoderModel
>>> # Initializing a Wav2Vec2 & BERT style configuration
>>> config_encoder = Wav2Vec2Config()
>>> config_decoder = BertConfig()
>>> config = SpeechEncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> # Initializing a Wav2Vec2Bert model from a Wav2Vec2 & bert-base-uncased style configurations
>>> model = SpeechEncoderDecoderModel(config=config)
>>> # Accessing the model configuration
>>> config_encoder = model.config.encoder
>>> config_decoder = model.config.decoder
>>> # set decoder config to causal lm
>>> config_decoder.is_decoder = True
>>> config_decoder.add_cross_attention = True
>>> # Saving the model, including its configuration
>>> model.save_pretrained('my-model')
>>> # loading model and config from pretrained folder
>>> encoder_decoder_config = SpeechEncoderDecoderConfig.from_pretrained('my-model')
>>> model = SpeechEncoderDecoderModel.from_pretrained('my-model', config=encoder_decoder_config)
```"""
model_type = "speech-encoder-decoder"
is_composition = True
@@ -93,11 +94,11 @@ class SpeechEncoderDecoderConfig(PretrainedConfig):
cls, encoder_config: PretrainedConfig, decoder_config: PretrainedConfig, **kwargs
) -> PretrainedConfig:
r"""
Instantiate a [`SpeechEncoderDecoderConfig`] (or a derived class) from a pre-trained encoder
model configuration and decoder model configuration.
Returns:
[`SpeechEncoderDecoderConfig`]: An instance of a configuration object
"""
logger.info("Setting `config.is_decoder=True` and `config.add_cross_attention=True` for decoder_config")
decoder_config.is_decoder = True
@@ -107,10 +108,10 @@ class SpeechEncoderDecoderConfig(PretrainedConfig):
def to_dict(self):
"""
Serializes this instance to a Python dictionary. Override the default *to_dict()* from *PretrainedConfig*.
Returns:
`Dict[str, any]`: Dictionary of all the attributes that make up this configuration instance.
"""
output = copy.deepcopy(self.__dict__)
output["encoder"] = self.encoder.to_dict()
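A short sketch of what the overridden `to_dict()` produces, reusing the encoder/decoder pair from the class example above (the assertions are illustrative):

```python
from transformers import BertConfig, SpeechEncoderDecoderConfig, Wav2Vec2Config

config = SpeechEncoderDecoderConfig.from_encoder_decoder_configs(Wav2Vec2Config(), BertConfig())
d = config.to_dict()

# The nested encoder/decoder configs are serialized as plain dictionaries.
assert isinstance(d["encoder"], dict) and isinstance(d["decoder"], dict)
print(d["model_type"])  # speech-encoder-decoder
```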
@@ -435,26 +435,26 @@ class SpeechEncoderDecoderModel(PreTrainedModel):
r"""
Returns:
Examples:
```python
>>> from transformers import SpeechEncoderDecoderModel, Speech2Text2Processor
>>> from datasets import load_dataset
>>> import torch
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> processor = Speech2Text2Processor.from_pretrained('facebook/s2t-wav2vec2-large-en-de')
>>> model = SpeechEncoderDecoderModel.from_pretrained('facebook/s2t-wav2vec2-large-en-de')
>>> input_values = processor(ds[0]["audio"]["array"], return_tensors="pt").input_values
>>> decoder_input_ids = torch.tensor([[model.config.decoder.decoder_start_token_id]])
>>> outputs = model(input_values=input_values, decoder_input_ids=decoder_input_ids)
>>> # inference (generation)
>>> generated = model.generate(input_values)
>>> translation = processor.batch_decode(generated)
```"""
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
kwargs_encoder = {argument: value for argument, value in kwargs.items() if not argument.startswith("decoder_")}
@@ -28,86 +28,87 @@ SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class Speech2TextConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`Speech2TextModel`]. It is used
to instantiate a Speech2Text model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the Speech2Text
[facebook/s2t-small-librispeech-asr](https://huggingface.co/facebook/s2t-small-librispeech-asr) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 50265):
Vocabulary size of the Speech2Text model. Defines the number of different tokens that can be represented by
the `inputs_ids` passed when calling [`Speech2TextModel`]
d_model (`int`, *optional*, defaults to 1024):
Dimensionality of the layers and the pooler layer.
encoder_layers (`int`, *optional*, defaults to 12):
Number of encoder layers.
decoder_layers (`int`, *optional*, defaults to 12):
Number of decoder layers.
encoder_attention_heads (`int`, *optional*, defaults to 16):
Number of attention heads for each attention layer in the Transformer encoder.
decoder_attention_heads (`int`, *optional*, defaults to 16):
Number of attention heads for each attention layer in the Transformer decoder.
decoder_ffn_dim (`int`, *optional*, defaults to 4096):
Dimensionality of the "intermediate" (often named feed-forward) layer in decoder.
encoder_ffn_dim (`int`, *optional*, defaults to 4096):
Dimensionality of the "intermediate" (often named feed-forward) layer in decoder.
activation_function (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported.
dropout (:obj:`float`, `optional`, defaults to 0.1):
`"gelu"`, `"relu"`, `"silu"` and `"gelu_new"` are supported.
dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
activation_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for activations inside the fully connected layer.
classifier_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the classifier.
init_std (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
encoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the encoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more details.
decoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the decoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more details.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models).
max_source_positions (`int`, *optional*, defaults to 6000):
The maximum sequence length of log-mel filter-bank features that this model might ever be used with.
max_target_positions (`int`, *optional*, defaults to 1024):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
num_conv_layers (`int`, *optional*, defaults to 2):
Number of 1D convolutional layers in the conv module.
conv_kernel_sizes (`Tuple[int]`, *optional*, defaults to `(5, 5)`):
A tuple of integers defining the kernel size of each 1D convolutional layer in the conv module. The length
of `conv_kernel_sizes` has to match `num_conv_layers`.
conv_channels (`int`, *optional*, defaults to 1024):
An integer defining the number of output channels of each convolution layers except the final one in the
conv module.
input_feat_per_channel (`int`, *optional*, defaults to 80):
An integer specifying the size of the feature vector. This is also the dimension of log-mel filter-bank
features.
input_channels (`int`, *optional*, defaults to 1):
An integer specifying the number of input channels of the input feature vector.
Example:
```python
>>> from transformers import Speech2TextModel, Speech2TextConfig
>>> # Initializing a Speech2Text s2t_transformer_s style configuration
>>> configuration = Speech2TextConfig()
>>> # Initializing a model from the s2t_transformer_s style configuration
>>> model = Speech2TextModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "speech_to_text"
keys_to_ignore_at_inference = ["past_key_values"]
attribute_map = {"num_attention_heads": "encoder_attention_heads", "hidden_size": "d_model"}
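The constraint between `num_conv_layers` and `conv_kernel_sizes` noted above can be made concrete with a hedged sketch (illustrative values; mismatched lengths would be a configuration error):

```python
from transformers import Speech2TextConfig

# len(conv_kernel_sizes) has to match num_conv_layers.
config = Speech2TextConfig(num_conv_layers=2, conv_kernel_sizes=(5, 5), conv_channels=1024)
print(config.input_feat_per_channel)  # 80 log-mel features per frame by default
```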
@@ -35,26 +35,26 @@ class Speech2TextFeatureExtractor(SequenceFeatureExtractor):
r"""
Constructs a Speech2Text feature extractor.
This feature extractor inherits from [`SequenceFeatureExtractor`] which contains most of the
main methods. Users should refer to this superclass for more information regarding those methods.
This class extracts mel-filter bank features from raw speech using TorchAudio and applies utterance-level cepstral
mean and variance normalization to the extracted features.
Args:
feature_size (`int`, defaults to 80):
The feature dimension of the extracted features.
sampling_rate (`int`, defaults to 16000):
The sampling rate at which the audio files should be digitized, expressed in hertz (Hz).
num_mel_bins (`int`, defaults to 80):
Number of Mel-frequency bins.
padding_value (`float`, defaults to 0.0):
The value that is used to fill the padding vectors.
do_ceptral_normalize (`bool`, *optional*, defaults to `True`):
Whether or not to apply utterance-level cepstral mean and variance normalization to extracted features.
normalize_means (`bool`, *optional*, defaults to `True`):
Whether or not to zero-mean normalize the extracted features.
normalize_vars (`bool`, *optional*, defaults to `True`):
Whether or not to unit-variance normalize the extracted features.
"""
@@ -140,49 +140,51 @@ class Speech2TextFeatureExtractor(SequenceFeatureExtractor):
Main method to featurize and prepare one or several sequence(s) for the model.
Args:
raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`):
The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float
values, a list of numpy arrays or a list of list of float values.
padding (`bool`, `str` or [`~file_utils.PaddingStrategy`], *optional*, defaults to `True`):
Select a strategy to pad the returned sequences (according to the model's padding side and padding
index) among:
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a
single sequence is provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
maximum acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
different lengths).
max_length (`int`, *optional*):
Maximum length of the returned list and optionally padding length (see above).
truncation (`bool`):
Activates truncation to cut input sequences longer than *max_length* to *max_length*.
pad_to_multiple_of (`int`, *optional*):
If set will pad the sequence to a multiple of the provided value.
This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
>= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128.
return_attention_mask (`bool`, *optional*):
Whether to return the attention mask. If left to the default, will return the attention mask according
to the specific feature_extractor's default.
[What are attention masks?](../glossary#attention-mask)
<Tip>
For Speech2TextTransformer models, `attention_mask` should always be passed for batched
inference, to avoid subtle bugs.
</Tip>
return_tensors (`str` or [`~file_utils.TensorType`], *optional*):
If set, will return tensors instead of list of python integers. Acceptable values are:
- `'tf'`: Return TensorFlow `tf.constant` objects.
- `'pt'`: Return PyTorch `torch.Tensor` objects.
- `'np'`: Return Numpy `np.ndarray` objects.
sampling_rate (`int`, *optional*):
The sampling rate at which the `raw_speech` input was sampled. It is strongly recommended to pass
`sampling_rate` at the forward call to prevent silent errors.
padding_value (`float`, defaults to 0.0):
The value that is used to fill the padding values / vectors.
"""
@@ -26,17 +26,17 @@ class Speech2TextProcessor:
Constructs a Speech2Text processor which wraps a Speech2Text feature extractor and a Speech2Text tokenizer into a
single processor.
[`Speech2TextProcessor`] offers all the functionalities of
[`Speech2TextFeatureExtractor`] and [`Speech2TextTokenizer`]. See the
[`~Speech2TextProcessor.__call__`] and [`~Speech2TextProcessor.decode`] for more
information.
Args:
feature_extractor (`Speech2TextFeatureExtractor`):
An instance of [`Speech2TextFeatureExtractor`]. The feature extractor is a required
input.
tokenizer (`Speech2TextTokenizer`):
An instance of [`Speech2TextTokenizer`]. The tokenizer is a required input.
"""
def __init__(self, feature_extractor, tokenizer):
@@ -56,17 +56,19 @@ class Speech2TextProcessor:
def save_pretrained(self, save_directory):
"""
Save a Speech2Text feature extractor object and Speech2Text tokenizer object to the directory
`save_directory`, so that it can be re-loaded using the
[`~Speech2TextProcessor.from_pretrained`] class method.
<Tip>
This class method is simply calling [`~PreTrainedFeatureExtractor.save_pretrained`] and
[`~tokenization_utils_base.PreTrainedTokenizer.save_pretrained`]. Please refer to the
docstrings of the methods above for more information.
</Tip>
Args:
save_directory (`str` or `os.PathLike`):
Directory where the feature extractor JSON file and the tokenizer files will be saved (directory will
be created if it does not exist).
"""
@@ -77,30 +79,32 @@ class Speech2TextProcessor:
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
r"""
Instantiate a [`Speech2TextProcessor`] from a pretrained Speech2Text processor.
<Tip>
This class method is simply calling Speech2TextFeatureExtractor's
[`~PreTrainedFeatureExtractor.from_pretrained`] and Speech2TextTokenizer's
[`~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`]. Please refer to the
docstrings of the methods above for more information.
</Tip>
Args:
pretrained_model_name_or_path (`str` or `os.PathLike`):
This can be either:
- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
- a path to a *directory* containing a feature extractor file saved using the
[`~PreTrainedFeatureExtractor.save_pretrained`] method, e.g.,
`./my_model_directory/`.
- a path or url to a saved feature extractor JSON *file*, e.g.,
`./my_model_directory/preprocessor_config.json`.
**kwargs
Additional keyword arguments passed along to both [`PreTrainedFeatureExtractor`] and
[`PreTrainedTokenizer`]
"""
feature_extractor = Speech2TextFeatureExtractor.from_pretrained(pretrained_model_name_or_path, **kwargs)
tokenizer = Speech2TextTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
@@ -110,9 +114,9 @@ class Speech2TextProcessor:
def __call__(self, *args, **kwargs):
"""
When used in normal mode, this method forwards all its arguments to Speech2TextFeatureExtractor's
[`~Speech2TextFeatureExtractor.__call__`] and returns its output. If used in the context of
[`~Speech2TextProcessor.as_target_processor`], this method forwards all its arguments to
Speech2TextTokenizer's [`~Speech2TextTokenizer.__call__`]. Please refer to the docstring of
the above two methods for more information.
"""
return self.current_processor(*args, **kwargs)
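A short sketch of the two modes described above (the checkpoint and transcript are illustrative):

```python
import numpy as np
from transformers import Speech2TextProcessor

processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")

# Normal mode: arguments are forwarded to the feature extractor.
waveform = np.random.randn(16000).astype(np.float32)  # stand-in audio
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")

# Target mode: arguments are forwarded to the tokenizer, e.g. to build labels.
with processor.as_target_processor():
    labels = processor("a transcript to encode", return_tensors="pt").input_ids
```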
@@ -120,7 +124,7 @@ class Speech2TextProcessor:
def batch_decode(self, *args, **kwargs):
"""
This method forwards all its arguments to Speech2TextTokenizer's
[`~PreTrainedTokenizer.batch_decode`]. Please refer to the docstring of this method for more
information.
"""
return self.tokenizer.batch_decode(*args, **kwargs)
@@ -128,7 +132,7 @@ class Speech2TextProcessor:
def decode(self, *args, **kwargs):
"""
This method forwards all its arguments to Speech2TextTokenizer's
[`~PreTrainedTokenizer.decode`]. Please refer to the docstring of this method for more
information.
"""
return self.tokenizer.decode(*args, **kwargs)
@@ -56,46 +56,45 @@ class Speech2TextTokenizer(PreTrainedTokenizer):
"""
Construct a Speech2Text tokenizer.
This tokenizer inherits from [`PreTrainedTokenizer`] which contains some of the main methods.
Users should refer to the superclass for more information regarding such methods.
Args:
vocab_file (`str`):
File containing the vocabulary.
spm_file (`str`):
Path to the [SentencePiece](https://github.com/google/sentencepiece) model file.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sentence token.
eos_token (:obj:`str`, `optional`, defaults to :obj:`"</s>"`):
eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sentence token.
unk_token (:obj:`str`, `optional`, defaults to :obj:`"<unk>"`):
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"<pad>"`):
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
do_upper_case (:obj:`bool`, `optional`, defaults to :obj:`False`):
do_upper_case (`bool`, *optional*, defaults to `False`):
Whether or not to uppercase the output when decoding.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`False`):
do_lower_case (`bool`, *optional*, defaults to `False`):
Whether or not to lowercase the input when tokenizing.
tgt_lang (:obj:`str`, `optional`):
tgt_lang (`str`, *optional*):
A string representing the target language.
sp_model_kwargs (:obj:`dict`, `optional`):
Will be passed to the ``SentencePieceProcessor.__init__()`` method. The `Python wrapper for SentencePiece
<https://github.com/google/sentencepiece/tree/master/python>`__ can be used, among other things, to set:
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set:
- ``enable_sampling``: Enable subword regularization.
- ``nbest_size``: Sampling parameters for unigram. Invalid for BPE-Dropout.
- `enable_sampling`: Enable subword regularization.
- `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.
- ``nbest_size = {0,1}``: No sampling is performed.
- ``nbest_size > 1``: samples from the nbest_size results.
- ``nbest_size < 0``: assuming that nbest_size is infinite and samples from the all hypothesis (lattice)
- `nbest_size = {0,1}`: No sampling is performed.
- `nbest_size > 1`: samples from the nbest_size results.
- `nbest_size < 0`: assuming that nbest_size is infinite and samples from all hypotheses (lattice)
using forward-filtering-and-backward-sampling algorithm.
- ``alpha``: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
- `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
BPE-dropout.
**kwargs
Additional keyword arguments passed along to :class:`~transformers.PreTrainedTokenizer`
Additional keyword arguments passed along to [`PreTrainedTokenizer`]
"""
vocab_files_names = VOCAB_FILES_NAMES
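A construction sketch showing `sp_model_kwargs` used to enable subword regularization; both file paths are placeholders for your own vocabulary and SentencePiece model files:

```python
>>> from transformers import Speech2TextTokenizer

>>> tokenizer = Speech2TextTokenizer(
...     vocab_file="vocab.json",  # placeholder path
...     spm_file="sentencepiece.bpe.model",  # placeholder path
...     sp_model_kwargs={"enable_sampling": True, "nbest_size": -1, "alpha": 0.1},
... )
```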
......@@ -203,18 +202,18 @@ class Speech2TextTokenizer(PreTrainedTokenizer):
) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer ``prepare_for_model`` method.
special tokens using the tokenizer `prepare_for_model` method.
Args:
token_ids_0 (:obj:`List[int]`):
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`):
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model.
Returns:
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""
if already_has_special_tokens:
......
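A sketch of the mask described above, assuming the `facebook/s2t-small-librispeech-asr` checkpoint, which appends an `eos` token to each sequence:

```python
>>> from transformers import Speech2TextTokenizer

>>> tokenizer = Speech2TextTokenizer.from_pretrained("facebook/s2t-small-librispeech-asr")
>>> # One 0 per sequence token, then 1 for the appended special token
>>> tokenizer.get_special_tokens_mask([10, 11, 12])  # expected [0, 0, 0, 1]
```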
......@@ -28,65 +28,65 @@ SPEECH_TO_TEXT_2_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class Speech2Text2Config(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.Speech2Text2ForCausalLM`. It
This is the configuration class to store the configuration of a [`Speech2Text2ForCausalLM`]. It
is used to instantiate a Speech2Text2 model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the Speech2Text2
`facebook/s2t-small-librispeech-asr <https://huggingface.co/facebook/s2t-small-librispeech-asr>`__ architecture.
[facebook/s2t-small-librispeech-asr](https://huggingface.co/facebook/s2t-small-librispeech-asr) architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (:obj:`int`, `optional`, defaults to 50265):
vocab_size (`int`, *optional*, defaults to 50265):
Vocabulary size of the Speech2Text2 model. Defines the number of different tokens that can be represented by
the :obj:`inputs_ids` passed when calling :class:`~transformers.Speech2TextModel`
d_model (:obj:`int`, `optional`, defaults to 1024):
the `inputs_ids` passed when calling [`Speech2Text2ForCausalLM`].
d_model (`int`, *optional*, defaults to 1024):
Dimensionality of the layers and the pooler layer.
decoder_layers (:obj:`int`, `optional`, defaults to 12):
decoder_layers (`int`, *optional*, defaults to 12):
Number of decoder layers.
decoder_attention_heads (:obj:`int`, `optional`, defaults to 16):
decoder_attention_heads (`int`, *optional*, defaults to 16):
Number of attention heads for each attention layer in the Transformer decoder.
decoder_ffn_dim (:obj:`int`, `optional`, defaults to 4096):
decoder_ffn_dim (`int`, *optional*, defaults to 4096):
Dimensionality of the "intermediate" (often named feed-forward) layer in decoder.
activation_function (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"gelu"`):
The non-linear activation function (function or string) in the pooler. If string, :obj:`"gelu"`,
:obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported.
dropout (:obj:`float`, `optional`, defaults to 0.1):
activation_function (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the pooler. If string, `"gelu"`,
`"relu"`, `"silu"` and `"gelu_new"` are supported.
dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, and pooler.
attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
activation_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for activations inside the fully connected layer.
classifier_dropout (:obj:`float`, `optional`, defaults to 0.0):
classifier_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for classifier.
init_std (:obj:`float`, `optional`, defaults to 0.02):
init_std (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
https://arxiv.org/abs/1909.11556>`__ for more details.
decoder_layerdrop: (:obj:`float`, `optional`, defaults to 0.0):
The LayerDrop probability for the decoder. See the `LayerDrop paper <see
https://arxiv.org/abs/1909.11556>`__ for more details.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
decoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the decoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556) for more details.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models).
max_source_positions (:obj:`int`, `optional`, defaults to 6000):
max_source_positions (`int`, *optional*, defaults to 6000):
The maximum sequence length of log-mel filter-bank features that this model might ever be used with.
max_target_positions: (:obj:`int`, `optional`, defaults to 1024):
max_target_positions (`int`, *optional*, defaults to 1024):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
Example::
Example:
>>> from transformers import Speech2Text2ForCausalLM, Speech2Text2Config
```python
>>> from transformers import Speech2Text2ForCausalLM, Speech2Text2Config
>>> # Initializing a Speech2Text2 s2t_transformer_s style configuration
>>> configuration = Speech2Text2Config()
>>> # Initializing a Speech2Text2 s2t_transformer_s style configuration
>>> configuration = Speech2Text2Config()
>>> # Initializing a model from the s2t_transformer_s style configuration
>>> model = Speech2Text2ForCausalLM(configuration)
>>> # Initializing a model from the s2t_transformer_s style configuration
>>> model = Speech2Text2ForCausalLM(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
"""
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "speech_to_text_2"
keys_to_ignore_at_inference = ["past_key_values"]
attribute_map = {"num_attention_heads": "decoder_attention_heads", "hidden_size": "d_model"}
......
......@@ -27,16 +27,16 @@ class Speech2Text2Processor:
Constructs a Speech2Text2 processor which wraps a Speech2Text2 feature extractor and a Speech2Text2 tokenizer into
a single processor.
:class:`~transformers.Speech2Text2Processor` offers all the functionalities of
:class:`~transformers.AutoFeatureExtractor` and :class:`~transformers.Speech2Text2Tokenizer`. See the
:meth:`~transformers.Speech2Text2Processor.__call__` and :meth:`~transformers.Speech2Text2Processor.decode` for
[`Speech2Text2Processor`] offers all the functionalities of
[`AutoFeatureExtractor`] and [`Speech2Text2Tokenizer`]. See the
[`~Speech2Text2Processor.__call__`] and [`~Speech2Text2Processor.decode`] for
more information.
Args:
feature_extractor (:obj:`AutoFeatureExtractor`):
An instance of :class:`~transformers.AutoFeatureExtractor`. The feature extractor is a required input.
tokenizer (:obj:`Speech2Text2Tokenizer`):
An instance of :class:`~transformers.Speech2Text2Tokenizer`. The tokenizer is a required input.
feature_extractor (`AutoFeatureExtractor`):
An instance of [`AutoFeatureExtractor`]. The feature extractor is a required input.
tokenizer (`Speech2Text2Tokenizer`):
An instance of [`Speech2Text2Tokenizer`]. The tokenizer is a required input.
"""
def __init__(self, feature_extractor, tokenizer):
......@@ -56,17 +56,19 @@ class Speech2Text2Processor:
def save_pretrained(self, save_directory):
"""
Save a Speech2Text2 feature extractor object and Speech2Text2 tokenizer object to the directory
``save_directory``, so that it can be re-loaded using the
:func:`~transformers.Speech2Text2Processor.from_pretrained` class method.
`save_directory`, so that it can be re-loaded using the
[`~Speech2Text2Processor.from_pretrained`] class method.
.. note::
<Tip>
This class method is simply calling :meth:`~transformers.PreTrainedFeatureExtractor.save_pretrained` and
:meth:`~transformers.tokenization_utils_base.PreTrainedTokenizer.save_pretrained`. Please refer to the
docstrings of the methods above for more information.
This class method is simply calling [`~PreTrainedFeatureExtractor.save_pretrained`] and
[`~tokenization_utils_base.PreTrainedTokenizer.save_pretrained`]. Please refer to the
docstrings of the methods above for more information.
</Tip>
Args:
save_directory (:obj:`str` or :obj:`os.PathLike`):
save_directory (`str` or `os.PathLike`):
Directory where the feature extractor JSON file and the tokenizer files will be saved (directory will
be created if it does not exist).
"""
......@@ -77,30 +79,32 @@ class Speech2Text2Processor:
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
r"""
Instantiate a :class:`~transformers.Speech2Text2Processor` from a pretrained Speech2Text2 processor.
Instantiate a [`Speech2Text2Processor`] from a pretrained Speech2Text2 processor.
<Tip>
.. note::
This class method is simply calling AutoFeatureExtractor's
[`~PreTrainedFeatureExtractor.from_pretrained`] and Speech2Text2Tokenizer's
[`~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`]. Please refer to the
docstrings of the methods above for more information.
This class method is simply calling AutoFeatureExtractor's
:meth:`~transformers.PreTrainedFeatureExtractor.from_pretrained` and Speech2Text2Tokenizer's
:meth:`~transformers.tokenization_utils_base.PreTrainedTokenizer.from_pretrained`. Please refer to the
docstrings of the methods above for more information.
</Tip>
Args:
pretrained_model_name_or_path (:obj:`str` or :obj:`os.PathLike`):
pretrained_model_name_or_path (`str` or `os.PathLike`):
This can be either:
- a string, the `model id` of a pretrained feature_extractor hosted inside a model repo on
huggingface.co. Valid model ids can be located at the root-level, like ``bert-base-uncased``, or
namespaced under a user or organization name, like ``dbmdz/bert-base-german-cased``.
- a path to a `directory` containing a feature extractor file saved using the
:meth:`~transformers.PreTrainedFeatureExtractor.save_pretrained` method, e.g.,
``./my_model_directory/``.
- a path or url to a saved feature extractor JSON `file`, e.g.,
``./my_model_directory/preprocessor_config.json``.
- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
- a path to a *directory* containing a feature extractor file saved using the
[`~PreTrainedFeatureExtractor.save_pretrained`] method, e.g.,
`./my_model_directory/`.
- a path or url to a saved feature extractor JSON *file*, e.g.,
`./my_model_directory/preprocessor_config.json`.
**kwargs
Additional keyword arguments passed along to both :class:`~transformers.PreTrainedFeatureExtractor` and
:class:`~transformers.PreTrainedTokenizer`
Additional keyword arguments passed along to both [`PreTrainedFeatureExtractor`] and
[`PreTrainedTokenizer`]
"""
feature_extractor = AutoFeatureExtractor.from_pretrained(pretrained_model_name_or_path, **kwargs)
tokenizer = Speech2Text2Tokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
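A loading sketch; `facebook/s2t-wav2vec2-large-en-de` is assumed here to be a Hub checkpoint that carries a compatible feature extractor and Speech2Text2 tokenizer:

```python
>>> from transformers import Speech2Text2Processor

>>> processor = Speech2Text2Processor.from_pretrained("facebook/s2t-wav2vec2-large-en-de")
```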
......@@ -110,9 +114,9 @@ class Speech2Text2Processor:
def __call__(self, *args, **kwargs):
"""
When used in normal mode, this method forwards all its arguments to AutoFeatureExtractor's
:meth:`~transformers.AutoFeatureExtractor.__call__` and returns its output. If used in the context
:meth:`~transformers.Speech2Text2Processor.as_target_processor` this method forwards all its arguments to
Speech2Text2Tokenizer's :meth:`~transformers.Speech2Text2Tokenizer.__call__`. Please refer to the doctsring of
[`~AutoFeatureExtractor.__call__`] and returns its output. If used in the context of
[`~Speech2Text2Processor.as_target_processor`], this method forwards all its arguments to
Speech2Text2Tokenizer's [`~Speech2Text2Tokenizer.__call__`]. Please refer to the docstring of
the above two methods for more information.
"""
return self.current_processor(*args, **kwargs)
......@@ -120,7 +124,7 @@ class Speech2Text2Processor:
def batch_decode(self, *args, **kwargs):
"""
This method forwards all its arguments to Speech2Text2Tokenizer's
:meth:`~transformers.PreTrainedTokenizer.batch_decode`. Please refer to the docstring of this method for more
[`~PreTrainedTokenizer.batch_decode`]. Please refer to the docstring of this method for more
information.
"""
return self.tokenizer.batch_decode(*args, **kwargs)
......@@ -128,7 +132,7 @@ class Speech2Text2Processor:
def decode(self, *args, **kwargs):
"""
This method forwards all its arguments to Speech2Text2Tokenizer's
:meth:`~transformers.PreTrainedTokenizer.decode`. Please refer to the docstring of this method for more
[`~PreTrainedTokenizer.decode`]. Please refer to the docstring of this method for more
information.
"""
return self.tokenizer.decode(*args, **kwargs)
......
......@@ -68,24 +68,24 @@ class Speech2Text2Tokenizer(PreTrainedTokenizer):
"""
Constructs a Speech2Text2Tokenizer.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains some of the main methods.
This tokenizer inherits from [`PreTrainedTokenizer`] which contains some of the main methods.
Users should refer to the superclass for more information regarding such methods.
Args:
vocab_file (:obj:`str`):
vocab_file (`str`):
File containing the vocabulary.
bos_token (:obj:`str`, `optional`, defaults to :obj:`"<s>"`):
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sentence token.
eos_token (:obj:`str`, `optional`, defaults to :obj:`"</s>"`):
eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sentence token.
unk_token (:obj:`str`, `optional`, defaults to :obj:`"<unk>"`):
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"<pad>"`):
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
**kwargs
Additional keyword arguments passed along to :class:`~transformers.PreTrainedTokenizer`
Additional keyword arguments passed along to [`PreTrainedTokenizer`]
"""
vocab_files_names = VOCAB_FILES_NAMES
......
......@@ -31,62 +31,62 @@ SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class SplinterConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.SplinterModel`. It is used to
This is the configuration class to store the configuration of a [`SplinterModel`]. It is used to
instantiate a Splinter model according to the specified arguments, defining the model architecture. Instantiating
a configuration with the defaults will yield a similar configuration to that of the Splinter `tau/splinter-base
<https://huggingface.co/tau/splinter-base>`__ architecture.
a configuration with the defaults will yield a similar configuration to that of the Splinter [tau/splinter-base](https://huggingface.co/tau/splinter-base) architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522):
vocab_size (`int`, *optional*, defaults to 30522):
Vocabulary size of the Splinter model. Defines the number of different tokens that can be represented by
the :obj:`inputs_ids` passed when calling :class:`~transformers.SplinterModel`.
hidden_size (:obj:`int`, `optional`, defaults to 768):
the `inputs_ids` passed when calling [`SplinterModel`].
hidden_size (`int`, *optional*, defaults to 768):
Dimension of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, `optional`, defaults to 12):
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (:obj:`int`, `optional`, defaults to 3072):
intermediate_size (`int`, *optional*, defaults to 3072):
Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"gelu"`):
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"selu"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
`"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (:obj:`int`, `optional`, defaults to 2):
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.SplinterModel`.
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
type_vocab_size (`int`, *optional*, defaults to 2):
The vocabulary size of the `token_type_ids` passed when calling [`SplinterModel`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if ``config.is_decoder=True``.
question_token_id (:obj:`int`, `optional`, defaults to 104):
The id of the ``[QUESTION]`` token.
relevant if `config.is_decoder=True`.
question_token_id (`int`, *optional*, defaults to 104):
The id of the `[QUESTION]` token.
Example::
Example:
>>> from transformers import SplinterModel, SplinterConfig
```python
>>> from transformers import SplinterModel, SplinterConfig
>>> # Initializing a Splinter tau/splinter-base style configuration
>>> configuration = SplinterConfig()
>>> # Initializing a Splinter tau/splinter-base style configuration
>>> configuration = SplinterConfig()
>>> # Initializing a model from the tau/splinter-base style configuration
>>> model = SplinterModel(configuration)
>>> # Initializing a model from the tau/splinter-base style configuration
>>> model = SplinterModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
"""
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "splinter"
def __init__(
......
......@@ -76,44 +76,43 @@ class SplinterTokenizer(PreTrainedTokenizer):
r"""
Construct a Splinter tokenizer. Based on WordPiece.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods.
This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods.
Users should refer to this superclass for more information regarding those methods.
Args:
vocab_file (:obj:`str`):
vocab_file (`str`):
File containing the vocabulary.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
do_lower_case (`bool`, *optional*, defaults to `True`):
Whether or not to lowercase the input when tokenizing.
do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`):
do_basic_tokenize (`bool`, *optional*, defaults to `True`):
Whether or not to do basic tokenization before WordPiece.
never_split (:obj:`Iterable`, `optional`):
never_split (`Iterable`, *optional*):
Collection of tokens which will never be split during tokenization. Only has an effect when
:obj:`do_basic_tokenize=True`
unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`):
`do_basic_tokenize=True`
unk_token (`str`, *optional*, defaults to `"[UNK]"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
sep_token (`str`, *optional*, defaults to `"[SEP]"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`):
pad_token (`str`, *optional*, defaults to `"[PAD]"`):
The token used for padding, for example when batching sequences of different lengths.
cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
cls_token (`str`, *optional*, defaults to `"[CLS]"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
mask_token (`str`, *optional*, defaults to `"[MASK]"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
question_token (:obj:`str`, `optional`, defaults to :obj:`"[QUESTION]"`):
question_token (`str`, *optional*, defaults to `"[QUESTION]"`):
The token used for constructing question representations.
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
Whether or not to tokenize Chinese characters.
This should likely be deactivated for Japanese (see this `issue
<https://github.com/huggingface/transformers/issues/328>`__).
strip_accents: (:obj:`bool`, `optional`):
This should likely be deactivated for Japanese (see this [issue](https://github.com/huggingface/transformers/issues/328)).
strip_accents (`bool`, *optional*):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT).
value for `lowercase` (as in the original BERT).
"""
vocab_files_names = VOCAB_FILES_NAMES
......@@ -172,7 +171,7 @@ class SplinterTokenizer(PreTrainedTokenizer):
@property
def question_token_id(self):
"""
:obj:`Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question
`Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question
representation.
"""
return self.convert_tokens_to_ids(self.question_token)
......@@ -222,17 +221,17 @@ class SplinterTokenizer(PreTrainedTokenizer):
Build model inputs from a pair of sequences for question answering tasks by concatenating and adding special
tokens. A Splinter sequence has the following format:
- single sequence: ``[CLS] X [SEP]``
- pair of sequences for question answering: ``[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]``
- single sequence: `[CLS] X [SEP]`
- pair of sequences for question answering: `[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]`
Args:
token_ids_0 (:obj:`List[int]`):
token_ids_0 (`List[int]`):
The question token IDs if pad_on_right, else context token IDs
token_ids_1 (:obj:`List[int]`, `optional`):
token_ids_1 (`List[int]`, *optional*):
The context token IDs if pad_on_right, else question token IDs
Returns:
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""
if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
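A sketch of both formats, assuming the `tau/splinter-base` checkpoint named above:

```python
>>> from transformers import SplinterTokenizer

>>> tokenizer = SplinterTokenizer.from_pretrained("tau/splinter-base")
>>> question = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Who wrote it?"))
>>> context = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("It was written by them."))
>>> single = tokenizer.build_inputs_with_special_tokens(question)  # [CLS] X [SEP]
>>> pair = tokenizer.build_inputs_with_special_tokens(question, context)  # [CLS] q [QUESTION] . [SEP] c [SEP]
```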
......@@ -252,18 +251,18 @@ class SplinterTokenizer(PreTrainedTokenizer):
) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer ``prepare_for_model`` method.
special tokens using the tokenizer `prepare_for_model` method.
Args:
token_ids_0 (:obj:`List[int]`):
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`):
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`):
already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model.
Returns:
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""
if already_has_special_tokens:
......@@ -279,17 +278,16 @@ class SplinterTokenizer(PreTrainedTokenizer):
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
"""
Create the token type IDs corresponding to the sequences passed. `What are token type IDs?
<../glossary.html#token-type-ids>`__
Create the token type IDs corresponding to the sequences passed. [What are token type IDs?](../glossary#token-type-ids)
Should be overridden in a subclass if the model has a special way of building those.
Args:
token_ids_0 (:obj:`List[int]`): The first tokenized sequence.
token_ids_1 (:obj:`List[int]`, `optional`): The second tokenized sequence.
token_ids_0 (`List[int]`): The first tokenized sequence.
token_ids_1 (`List[int]`, *optional*): The second tokenized sequence.
Returns:
:obj:`List[int]`: The token type ids.
`List[int]`: The token type ids.
"""
sep = [self.sep_token_id]
cls = [self.cls_token_id]
......@@ -330,19 +328,18 @@ class BasicTokenizer(object):
Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.).
Args:
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
do_lower_case (`bool`, *optional*, defaults to `True`):
Whether or not to lowercase the input when tokenizing.
never_split (:obj:`Iterable`, `optional`):
never_split (`Iterable`, *optional*):
Collection of tokens which will never be split during tokenization. Only has an effect when
:obj:`do_basic_tokenize=True`
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
`do_basic_tokenize=True`
tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
Whether or not to tokenize Chinese characters.
This should likely be deactivated for Japanese (see this `issue
<https://github.com/huggingface/transformers/issues/328>`__).
strip_accents: (:obj:`bool`, `optional`):
This should likely be deactivated for Japanese (see this [issue](https://github.com/huggingface/transformers/issues/328)).
strip_accents (`bool`, *optional*):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT).
value for `lowercase` (as in the original BERT).
"""
def __init__(self, do_lower_case=True, never_split=None, tokenize_chinese_chars=True, strip_accents=None):
......@@ -359,9 +356,9 @@ class BasicTokenizer(object):
WordPieceTokenizer.
Args:
**never_split**: (`optional`) list of str
**never_split**: (*optional*) list of str
Kept for backward compatibility purposes. Now implemented directly at the base class level (see
:func:`PreTrainedTokenizer.tokenize`) List of token not to split.
[`PreTrainedTokenizer.tokenize`]). List of tokens not to split.
"""
# union() returns a new set by concatenating the two sets.
never_split = self.never_split.union(set(never_split)) if never_split else self.never_split
......@@ -487,11 +484,11 @@ class WordpieceTokenizer(object):
Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform
tokenization using the given vocabulary.
For example, :obj:`input = "unaffable"` wil return as output :obj:`["un", "##aff", "##able"]`.
For example, `input = "unaffable"` will return as output `["un", "##aff", "##able"]`.
Args:
text: A single token or whitespace separated tokens. This should have
already been passed through `BasicTokenizer`.
already been passed through *BasicTokenizer*.
Returns:
A list of wordpiece tokens.
......
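A minimal sketch of the greedy longest-match-first loop described above, over a toy vocabulary; the real `WordpieceTokenizer` additionally handles `max_input_chars_per_word` and the configured `unk_token`:

```python
def wordpiece(token, vocab, prefix="##", unk="[UNK]"):
    # Repeatedly take the longest vocabulary entry matching at the current
    # position, prefixing continuation pieces with "##".
    pieces, start = [], 0
    while start < len(token):
        end, match = len(token), None
        while start < end:
            sub = token[start:end]
            if start > 0:
                sub = prefix + sub
            if sub in vocab:
                match = sub
                break
            end -= 1
        if match is None:  # nothing matches: the whole token maps to unk
            return [unk]
        pieces.append(match)
        start = end
    return pieces


print(wordpiece("unaffable", {"un", "##aff", "##able"}))  # ['un', '##aff', '##able']
```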
......@@ -54,43 +54,43 @@ PRETRAINED_INIT_CONFIGURATION = {
class SplinterTokenizerFast(PreTrainedTokenizerFast):
r"""
Construct a "fast" Splinter tokenizer (backed by HuggingFace's `tokenizers` library). Based on WordPiece.
Construct a "fast" Splinter tokenizer (backed by HuggingFace's *tokenizers* library). Based on WordPiece.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizerFast` which contains most of the main
This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods.
Args:
vocab_file (:obj:`str`):
vocab_file (`str`):
File containing the vocabulary.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`):
do_lower_case (`bool`, *optional*, defaults to `True`):
Whether or not to lowercase the input when tokenizing.
unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`):
unk_token (`str`, *optional*, defaults to `"[UNK]"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`):
sep_token (`str`, *optional*, defaults to `"[SEP]"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`):
pad_token (`str`, *optional*, defaults to `"[PAD]"`):
The token used for padding, for example when batching sequences of different lengths.
cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`):
cls_token (`str`, *optional*, defaults to `"[CLS]"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`):
mask_token (`str`, *optional*, defaults to `"[MASK]"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
question_token (:obj:`str`, `optional`, defaults to :obj:`"[QUESTION]"`):
question_token (`str`, *optional*, defaults to `"[QUESTION]"`):
The token used for constructing question representations.
clean_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
clean_text (`bool`, *optional*, defaults to `True`):
Whether or not to clean the text before tokenization by removing any control characters and replacing all
whitespaces by the classic one.
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see `this
issue <https://github.com/huggingface/transformers/issues/328>`__).
strip_accents: (:obj:`bool`, `optional`):
tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see [this
issue](https://github.com/huggingface/transformers/issues/328)).
strip_accents (`bool`, *optional*):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT).
wordpieces_prefix: (:obj:`str`, `optional`, defaults to :obj:`"##"`):
value for `lowercase` (as in the original BERT).
wordpieces_prefix (`str`, *optional*, defaults to `"##"`):
The prefix for subwords.
"""
......@@ -145,7 +145,7 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast):
@property
def question_token_id(self):
"""
:obj:`Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question
`Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question
representation.
"""
return self.convert_tokens_to_ids(self.question_token)
......@@ -157,17 +157,17 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast):
Build model inputs from a pair of sequences for question answering tasks by concatenating and adding special
tokens. A Splinter sequence has the following format:
- single sequence: ``[CLS] X [SEP]``
- pair of sequences for question answering: ``[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]``
- single sequence: `[CLS] X [SEP]`
- pair of sequences for question answering: `[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]`
Args:
token_ids_0 (:obj:`List[int]`):
token_ids_0 (`List[int]`):
The question token IDs if pad_on_right, else context token IDs
token_ids_1 (:obj:`List[int]`, `optional`):
token_ids_1 (`List[int]`, *optional*):
The context token IDs if pad_on_right, else question token IDs
Returns:
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens.
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""
if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
......@@ -186,17 +186,16 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast):
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]:
"""
Create the token type IDs corresponding to the sequences passed. `What are token type IDs?
<../glossary.html#token-type-ids>`__
Create the token type IDs corresponding to the sequences passed. [What are token type IDs?](../glossary#token-type-ids)
Should be overridden in a subclass if the model has a special way of building those.
Args:
token_ids_0 (:obj:`List[int]`): The first tokenized sequence.
token_ids_1 (:obj:`List[int]`, `optional`): The second tokenized sequence.
token_ids_0 (`List[int]`): The first tokenized sequence.
token_ids_1 (`List[int]`, *optional*): The second tokenized sequence.
Returns:
:obj:`List[int]`: The token type ids.
`List[int]`: The token type ids.
"""
sep = [self.sep_token_id]
cls = [self.cls_token_id]
......
......@@ -29,72 +29,74 @@ SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class SqueezeBertConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.SqueezeBertModel`. It is used
This is the configuration class to store the configuration of a [`SqueezeBertModel`]. It is used
to instantiate a SqueezeBERT model according to the specified arguments, defining the model architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522):
vocab_size (`int`, *optional*, defaults to 30522):
Vocabulary size of the SqueezeBERT model. Defines the number of different tokens that can be represented by
the :obj:`inputs_ids` passed when calling :class:`~transformers.SqueezeBertModel`.
hidden_size (:obj:`int`, `optional`, defaults to 768):
the `inputs_ids` passed when calling [`SqueezeBertModel`].
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, `optional`, defaults to 12):
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, `optional`, defaults to 12):
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (:obj:`int`, `optional`, defaults to 3072):
intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`):
hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
`"gelu"`, `"relu"`, `"silu"` and `"gelu_new"` are supported.
hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1):
attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (:obj:`int`, `optional`, defaults to 2):
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.BertModel` or
:class:`~transformers.TFBertModel`.
initializer_range (:obj:`float`, `optional`, defaults to 0.02):
type_vocab_size (`int`, *optional*, defaults to 2):
The vocabulary size of the `token_type_ids` passed when calling [`BertModel`] or
[`TFBertModel`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12):
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
pad_token_id (:obj:`int`, `optional`, defaults to 0):
pad_token_id (`int`, *optional*, defaults to 0):
The ID of the token in the word embedding to use as padding.
embedding_size (:obj:`int`, `optional`, defaults to 768):
embedding_size (`int`, *optional*, defaults to 768):
The dimension of the word embedding vectors.
q_groups (:obj:`int`, `optional`, defaults to 4):
q_groups (`int`, *optional*, defaults to 4):
The number of groups in Q layer.
k_groups (:obj:`int`, `optional`, defaults to 4):
k_groups (`int`, *optional*, defaults to 4):
The number of groups in K layer.
v_groups (:obj:`int`, `optional`, defaults to 4):
v_groups (`int`, *optional*, defaults to 4):
The number of groups in V layer.
post_attention_groups (:obj:`int`, `optional`, defaults to 1):
post_attention_groups (`int`, *optional*, defaults to 1):
The number of groups in the first feed forward network layer.
intermediate_groups (:obj:`int`, `optional`, defaults to 4):
intermediate_groups (`int`, *optional*, defaults to 4):
The number of groups in the second feed forward network layer.
output_groups (:obj:`int`, `optional`, defaults to 4):
output_groups (`int`, *optional*, defaults to 4):
The number of groups in the third feed forward network layer.
Examples::
Examples:
>>> from transformers import SqueezeBertModel, SqueezeBertConfig
```python
>>> from transformers import SqueezeBertModel, SqueezeBertConfig
>>> # Initializing a SqueezeBERT configuration
>>> configuration = SqueezeBertConfig()
>>> # Initializing a SqueezeBERT configuration
>>> configuration = SqueezeBertConfig()
>>> # Initializing a model from the configuration above
>>> model = SqueezeBertModel(configuration)
>>> # Initializing a model from the configuration above
>>> model = SqueezeBertModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
>>> # Accessing the model configuration
>>> configuration = model.config
```
Attributes:
pretrained_config_archive_map (`Dict[str, str]`): A dictionary containing all the available pre-trained
checkpoints.
......
......@@ -48,10 +48,10 @@ class SqueezeBertTokenizer(BertTokenizer):
r"""
Constructs a SqueezeBert tokenizer.
:class:`~transformers.SqueezeBertTokenizer is identical to :class:`~transformers.BertTokenizer` and runs end-to-end
[`SqueezeBertTokenizer`] is identical to [`BertTokenizer`] and runs end-to-end
tokenization: punctuation splitting + wordpiece.
Refer to superclass :class:`~transformers.BertTokenizer` for usage examples and documentation concerning
Refer to superclass [`BertTokenizer`] for usage examples and documentation concerning
parameters.
"""
......
......@@ -52,12 +52,12 @@ PRETRAINED_INIT_CONFIGURATION = {
class SqueezeBertTokenizerFast(BertTokenizerFast):
r"""
Constructs a "Fast" SqueezeBert tokenizer (backed by HuggingFace's `tokenizers` library).
Constructs a "Fast" SqueezeBert tokenizer (backed by HuggingFace's *tokenizers* library).
:class:`~transformers.SqueezeBertTokenizerFast` is identical to :class:`~transformers.BertTokenizerFast` and runs
[`SqueezeBertTokenizerFast`] is identical to [`BertTokenizerFast`] and runs
end-to-end tokenization: punctuation splitting + wordpiece.
Refer to superclass :class:`~transformers.BertTokenizerFast` for usage examples and documentation concerning
Refer to superclass [`BertTokenizerFast`] for usage examples and documentation concerning
parameters.
"""
......
......@@ -37,45 +37,44 @@ T5_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class T5Config(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a :class:`~transformers.T5Model` or a
:class:`~transformers.TFT5Model`. It is used to instantiate a T5 model according to the specified arguments,
This is the configuration class to store the configuration of a [`T5Model`] or a
[`TFT5Model`]. It is used to instantiate a T5 model according to the specified arguments,
defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration
to that of the T5 `t5-small <https://huggingface.co/t5-small>`__ architecture.
to that of the T5 [t5-small](https://huggingface.co/t5-small) architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Arguments:
vocab_size (:obj:`int`, `optional`, defaults to 32128):
vocab_size (`int`, *optional*, defaults to 32128):
Vocabulary size of the T5 model. Defines the number of different tokens that can be represented by the
:obj:`inputs_ids` passed when calling :class:`~transformers.T5Model` or :class:`~transformers.TFT5Model`.
d_model (:obj:`int`, `optional`, defaults to 512):
`inputs_ids` passed when calling [`T5Model`] or [`TFT5Model`].
d_model (`int`, *optional*, defaults to 512):
Size of the encoder layers and the pooler layer.
d_kv (:obj:`int`, `optional`, defaults to 64):
Size of the key, query, value projections per attention head. :obj:`d_kv` has to be equal to :obj:`d_model
// num_heads`.
d_ff (:obj:`int`, `optional`, defaults to 2048):
Size of the intermediate feed forward layer in each :obj:`T5Block`.
num_layers (:obj:`int`, `optional`, defaults to 6):
d_kv (`int`, *optional*, defaults to 64):
Size of the key, query, value projections per attention head. `d_kv` has to be equal to `d_model // num_heads`.
d_ff (`int`, *optional*, defaults to 2048):
Size of the intermediate feed forward layer in each `T5Block`.
num_layers (`int`, *optional*, defaults to 6):
Number of hidden layers in the Transformer encoder.
num_decoder_layers (:obj:`int`, `optional`):
Number of hidden layers in the Transformer decoder. Will use the same value as :obj:`num_layers` if not
num_decoder_layers (`int`, *optional*):
Number of hidden layers in the Transformer decoder. Will use the same value as `num_layers` if not
set.
num_heads (:obj:`int`, `optional`, defaults to 8):
num_heads (`int`, *optional*, defaults to 8):
Number of attention heads for each attention layer in the Transformer encoder.
relative_attention_num_buckets (:obj:`int`, `optional`, defaults to 32):
relative_attention_num_buckets (`int`, *optional*, defaults to 32):
The number of buckets to use for each attention layer.
dropout_rate (:obj:`float`, `optional`, defaults to 0.1):
dropout_rate (`float`, *optional*, defaults to 0.1):
The ratio for all dropout layers.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-6):
layer_norm_eps (`float`, *optional*, defaults to 1e-6):
The epsilon used by the layer normalization layers.
initializer_factor (:obj:`float`, `optional`, defaults to 1):
initializer_factor (`float`, *optional*, defaults to 1):
A factor for initializing all weight matrices (should be kept to 1, used internally for initialization
testing).
feed_forward_proj (:obj:`string`, `optional`, defaults to :obj:`"relu"`):
Type of feed forward layer to be used. Should be one of :obj:`"relu"` or :obj:`"gated-gelu"`. T5v1.1 uses
the :obj:`"gated-gelu"` feed forward projection. Original T5 uses :obj:`"relu"`.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
feed_forward_proj (`string`, *optional*, defaults to `"relu"`):
Type of feed forward layer to be used. Should be one of `"relu"` or `"gated-gelu"`. T5v1.1 uses
the `"gated-gelu"` feed forward projection. Original T5 uses `"relu"`.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models).
"""
model_type = "t5"
......
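For completeness, a usage sketch in the same pattern as the other configuration examples, using only the defaults listed above (note the constraint `d_kv == d_model // num_heads`, i.e. 512 // 8 == 64):

```python
>>> from transformers import T5Model, T5Config

>>> # Initializing a T5 t5-small style configuration
>>> configuration = T5Config()
>>> assert configuration.d_kv == configuration.d_model // configuration.num_heads  # 64 == 512 // 8

>>> # Initializing a model from the t5-small style configuration
>>> model = T5Model(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```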
......@@ -1054,17 +1054,18 @@ class FlaxT5PreTrainedModel(FlaxPreTrainedModel):
r"""
Returns:
Example::
Example:
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
```python
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs)
"""
>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs)
```"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
......@@ -1114,24 +1115,25 @@ class FlaxT5PreTrainedModel(FlaxPreTrainedModel):
r"""
Returns:
Example::
Example:
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> import jax.numpy as jnp
```python
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> import jax.numpy as jnp
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs)
>>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs)
>>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id
>>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id
>>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> logits = outputs.logits
"""
>>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> logits = outputs.logits
```"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
......@@ -1329,19 +1331,21 @@ append_call_sample_docstring(
FLAX_T5_MODEL_DOCSTRING = """
Returns:
Example::
Example:
>>> from transformers import T5Tokenizer, FlaxT5Model
```python
>>> from transformers import T5Tokenizer, FlaxT5Model
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5Model.from_pretrained('t5-small')
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5Model.from_pretrained('t5-small')
>>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="np").input_ids
>>> decoder_input_ids = tokenizer("Studies show that", return_tensors="np").input_ids
>>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="np").input_ids
>>> decoder_input_ids = tokenizer("Studies show that", return_tensors="np").input_ids
>>> # forward pass
>>> outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
>>> last_hidden_states = outputs.last_hidden_state
>>> # forward pass
>>> outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
>>> last_hidden_states = outputs.last_hidden_state
```
"""
......@@ -1476,24 +1480,25 @@ class FlaxT5ForConditionalGeneration(FlaxT5PreTrainedModel):
r"""
Returns:
Example::
Example:
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> import jax.numpy as jnp
```python
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> import jax.numpy as jnp
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> text = "summarize: My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs)
>>> text = "summarize: My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs)
>>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id
>>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id
>>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> logits = outputs.logits
"""
>>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> logits = outputs.logits
```"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
......@@ -1624,19 +1629,21 @@ class FlaxT5ForConditionalGeneration(FlaxT5PreTrainedModel):
FLAX_T5_CONDITIONAL_GENERATION_DOCSTRING = """
Returns:
Example:

```python
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration

>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')

>>> ARTICLE_TO_SUMMARIZE = "summarize: My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], return_tensors='np')

>>> # Generate Summary
>>> summary_ids = model.generate(inputs['input_ids']).sequences
>>> print(tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False))
```
"""
......@@ -216,17 +216,19 @@ PARALLELIZE_DOCSTRING = r"""
DEPARALLELIZE_DOCSTRING = r"""
Moves the model to CPU from a model-parallel state.
Example:

```python
# On a 4 GPU machine with t5-3b:
model = T5ForConditionalGeneration.from_pretrained('t5-3b')
device_map = {0: [0, 1, 2],
              1: [3, 4, 5, 6, 7, 8, 9],
              2: [10, 11, 12, 13, 14, 15, 16],
              3: [17, 18, 19, 20, 21, 22, 23]}
model.parallelize(device_map)  # Splits the model across several devices
model.deparallelize()  # Puts the model back on the CPU and frees memory by calling torch.cuda.empty_cache()
```
"""
......@@ -1339,20 +1341,21 @@ class T5Model(T5PreTrainedModel):
r"""
Returns:
Example:

```python
>>> from transformers import T5Tokenizer, T5Model

>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = T5Model.from_pretrained('t5-small')

>>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt").input_ids  # Batch size 1
>>> decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids  # Batch size 1

>>> # forward pass
>>> outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
>>> last_hidden_states = outputs.last_hidden_state
```"""
use_cache = use_cache if use_cache is not None else self.config.use_cache
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
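For inference-only use of the PyTorch model, the forward pass can be wrapped in `torch.no_grad()` to skip autograd bookkeeping; a usage sketch building on the example above:

```python
>>> import torch

>>> with torch.no_grad():
...     outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
>>> outputs.last_hidden_state.shape  # (batch, decoder_seq_len, 512) for t5-small
```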
......@@ -1790,15 +1793,16 @@ class T5EncoderModel(T5PreTrainedModel):
r"""
Returns:
Example:

```python
>>> from transformers import T5Tokenizer, T5EncoderModel

>>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = T5EncoderModel.from_pretrained('t5-small')

>>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt").input_ids  # Batch size 1
>>> outputs = model(input_ids=input_ids)
>>> last_hidden_states = outputs.last_hidden_state
```"""
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
encoder_outputs = self.encoder(
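Since `T5EncoderModel` returns only encoder states, a common follow-up is pooling them into a fixed-size vector; the mean pooling below is one illustrative choice, not something the model API prescribes:

```python
>>> # average over the sequence dimension to get one vector per input
>>> sentence_embedding = last_hidden_states.mean(dim=1)  # (1, 512) for t5-small
```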