Unverified commit 27b3031d, authored by Sylvain Gugger and committed by GitHub

Mass conversion of documentation from rst to Markdown (#14866)

* Convert docstrings of all configurations and tokenizers

* Processors and fixes

* Last modeling files and fixes to models

* Pipeline modules

* Utils files

* Data submodule

* All the other files

* Style

* Missing examples

* Style again

* Fix copies

* Say bye bye to rst docstrings forever
parent 18587639
@@ -28,123 +28,121 @@ SEW_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class SEWConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`SEWModel`]. It is used to
instantiate a SEW model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the SEW
[asapp/sew-tiny-100k](https://huggingface.co/asapp/sew-tiny-100k) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 32):
Vocabulary size of the SEW model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`SEW`].
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
squeeze_factor (`int`, *optional*, defaults to 2):
Sequence length downsampling factor after the encoder and upsampling factor after the transformer.
hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
`"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`SEWForCTC`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
feat_extract_norm (`str`, *optional*, defaults to `"group"`):
The norm to be applied to 1D convolutional layers in the feature extractor. One of `"group"` for group
normalization of only the first 1D convolutional layer or `"layer"` for layer normalization of all 1D
convolutional layers.
feat_proj_dropout (`float`, *optional*, defaults to 0.0):
The dropout probability for the output of the feature extractor.
feat_extract_activation (`str`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the 1D convolutional layers of the feature
extractor. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
conv_dim (`Tuple[int]`, *optional*, defaults to `(64, 128, 128, 128, 128, 256, 256, 256, 256, 512, 512, 512, 512)`):
A tuple of integers defining the number of input and output channels of each 1D convolutional layer in the
feature extractor. The length of *conv_dim* defines the number of 1D convolutional layers.
conv_stride (`Tuple[int]`, *optional*, defaults to `(5, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1)`):
A tuple of integers defining the stride of each 1D convolutional layer in the feature extractor. The length
of *conv_stride* defines the number of convolutional layers and has to match the length of *conv_dim*.
conv_kernel (`Tuple[int]`, *optional*, defaults to `(10, 3, 1, 3, 1, 3, 1, 3, 1, 2, 1, 2, 1)`):
A tuple of integers defining the kernel size of each 1D convolutional layer in the feature extractor. The
length of *conv_kernel* defines the number of convolutional layers and has to match the length of
*conv_dim*.
conv_bias (`bool`, *optional*, defaults to `False`):
Whether the 1D convolutional layers have a bias.
num_conv_pos_embeddings (`int`, *optional*, defaults to 128):
Number of convolutional positional embeddings. Defines the kernel size of the 1D convolutional positional
embeddings layer.
num_conv_pos_embedding_groups (`int`, *optional*, defaults to 16):
Number of groups of the 1D convolutional positional embeddings layer.
apply_spec_augment (`bool`, *optional*, defaults to `True`):
Whether to apply *SpecAugment* data augmentation to the outputs of the feature extractor. For reference, see
[SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779).
mask_time_prob (`float`, *optional*, defaults to 0.05):
Percentage (between 0 and 1) of all feature vectors along the time axis which will be masked. The masking
procedure generates `mask_time_prob*len(time_axis)/mask_time_length` independent masks over the axis. If
reasoning from the probability of each feature vector to be chosen as the start of the vector span to be
masked, *mask_time_prob* should be `prob_vector_start*mask_time_length`. Note that overlap may decrease
the actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
mask_time_length (`int`, *optional*, defaults to 10):
Length of vector span along the time axis.
mask_time_min_masks (`int`, *optional*, defaults to 2):
The minimum number of masks of length `mask_time_length` generated along the time axis, each time
step, irrespective of `mask_time_prob`. Only relevant if
`mask_time_prob*len(time_axis)/mask_time_length < mask_time_min_masks`.
mask_feature_prob (`float`, *optional*, defaults to 0.0):
Percentage (between 0 and 1) of all feature vectors along the feature axis which will be masked. The
masking procedure generates `mask_feature_prob*len(feature_axis)/mask_feature_length` independent masks over
the axis. If reasoning from the probability of each feature vector to be chosen as the start of the vector
span to be masked, *mask_feature_prob* should be `prob_vector_start*mask_feature_length`. Note that
overlap may decrease the actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
mask_feature_length (`int`, *optional*, defaults to 10):
Length of vector span along the feature axis.
mask_feature_min_masks (`int`, *optional*, defaults to 0):
The minimum number of masks of length `mask_feature_length` generated along the feature axis, each time
step, irrespective of `mask_feature_prob`. Only relevant if
`mask_feature_prob*len(feature_axis)/mask_feature_length < mask_feature_min_masks`.
ctc_loss_reduction (`str`, *optional*, defaults to `"sum"`):
Specifies the reduction to apply to the output of `torch.nn.CTCLoss`. Only relevant when training an
instance of [`SEWForCTC`].
ctc_zero_infinity (`bool`, *optional*, defaults to `False`):
Whether to zero infinite losses and the associated gradients of `torch.nn.CTCLoss`. Infinite losses
mainly occur when the inputs are too short to be aligned to the targets. Only relevant when training an
instance of [`SEWForCTC`].
use_weighted_layer_sum (`bool`, *optional*, defaults to `False`):
Whether to use a weighted average of layer outputs with learned weights. Only relevant when using an
instance of [`Wav2Vec2ForSequenceClassification`].
classifier_proj_size (`int`, *optional*, defaults to 256):
Dimensionality of the projection before token mean-pooling for classification.
Example:
```python
>>> from transformers import SEWModel, SEWConfig
>>> # Initializing a SEW asapp/sew-tiny-100k style configuration
>>> configuration = SEWConfig()
>>> # Initializing a model from the asapp/sew-tiny-100k style configuration
>>> model = SEWModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "sew"
def __init__(
...
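The interaction between `mask_time_prob`, `mask_time_length`, and `mask_time_min_masks` documented above reduces to simple arithmetic. Below is a minimal sketch of that calculation in plain Python; it is illustrative, not the library's actual masking code, and the function name is hypothetical:

```python
# Back-of-the-envelope count of SpecAugment time masks, following the
# formula in the docstring above; illustrative only.
def expected_num_time_masks(sequence_length, mask_time_prob=0.05,
                            mask_time_length=10, mask_time_min_masks=2):
    # mask_time_prob * len(time_axis) / mask_time_length independent masks
    num_masks = int(mask_time_prob * sequence_length / mask_time_length)
    # the minimum number of masks applies irrespective of mask_time_prob
    return max(num_masks, mask_time_min_masks)

# With the defaults, a 1000-frame sequence gets about 5 masks of 10 frames
# each, i.e. up to ~5% of the time axis (less where masks overlap).
print(expected_num_time_masks(1000))  # 5
```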
@@ -28,143 +28,141 @@ SEW_D_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class SEWDConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`SEWDModel`]. It is used to
instantiate a SEW-D model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the SEW-D
[asapp/sew-d-tiny-100k](https://huggingface.co/asapp/sew-d-tiny-100k) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 32):
Vocabulary size of the SEW-D model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`SEWD`].
hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder.
num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
squeeze_factor (`int`, *optional*, defaults to 2):
Sequence length downsampling factor after the encoder and upsampling factor after the transformer.
max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
position_buckets (`int`, *optional*, defaults to 256):
The maximum size of relative position embeddings.
share_att_key (`bool`, *optional*, defaults to `True`):
Whether to share the attention key with c2p and p2c.
relative_attention (`bool`, *optional*, defaults to `True`):
Whether to use relative position encoding.
position_biased_input (`bool`, *optional*, defaults to `False`):
Whether to add absolute position embeddings to the content embeddings.
pos_att_type (`Tuple[str]`, *optional*, defaults to `("p2c", "c2p")`):
The type of relative position attention. It can be a combination of `("p2c", "c2p", "p2p")`, e.g.
`("p2c")`, `("p2c", "c2p")`, `("p2c", "c2p", "p2p")`.
norm_rel_ebd (`str`, *optional*, defaults to `"layer_norm"`):
Whether to use layer norm in the relative embedding (`"layer_norm"` if yes).
hidden_act (`str` or `function`, *optional*, defaults to `"gelu_python"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
`"gelu"`, `"relu"`, `"selu"`, `"gelu_python"` and `"gelu_new"` are supported.
hidden_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities.
final_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the final projection layer of [`SEWDForCTC`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-7):
The epsilon used by the layer normalization layers in the transformer encoder.
feature_layer_norm_eps (`float`, *optional*, defaults to 1e-5):
The epsilon used by the layer normalization after the feature extractor.
feat_extract_norm (`str`, *optional*, defaults to `"group"`):
The norm to be applied to 1D convolutional layers in the feature extractor. One of `"group"` for group
normalization of only the first 1D convolutional layer or `"layer"` for layer normalization of all 1D
convolutional layers.
feat_proj_dropout (`float`, *optional*, defaults to 0.0):
The dropout probability for the output of the feature extractor.
feat_extract_activation (`str`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the 1D convolutional layers of the feature
extractor. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
conv_dim (`Tuple[int]`, *optional*, defaults to `(64, 128, 128, 128, 128, 256, 256, 256, 256, 512, 512, 512, 512)`):
A tuple of integers defining the number of input and output channels of each 1D convolutional layer in the
feature extractor. The length of *conv_dim* defines the number of 1D convolutional layers.
conv_stride (`Tuple[int]`, *optional*, defaults to `(5, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1)`):
A tuple of integers defining the stride of each 1D convolutional layer in the feature extractor. The length
of *conv_stride* defines the number of convolutional layers and has to match the length of *conv_dim*.
conv_kernel (`Tuple[int]`, *optional*, defaults to `(10, 3, 1, 3, 1, 3, 1, 3, 1, 2, 1, 2, 1)`):
A tuple of integers defining the kernel size of each 1D convolutional layer in the feature extractor. The
length of *conv_kernel* defines the number of convolutional layers and has to match the length of
*conv_dim*.
conv_bias (`bool`, *optional*, defaults to `False`):
Whether the 1D convolutional layers have a bias.
num_conv_pos_embeddings (`int`, *optional*, defaults to 128):
Number of convolutional positional embeddings. Defines the kernel size of the 1D convolutional positional
embeddings layer.
num_conv_pos_embedding_groups (`int`, *optional*, defaults to 16):
Number of groups of the 1D convolutional positional embeddings layer.
apply_spec_augment (`bool`, *optional*, defaults to `True`):
Whether to apply *SpecAugment* data augmentation to the outputs of the feature extractor. For reference, see
[SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition](https://arxiv.org/abs/1904.08779).
mask_time_prob (`float`, *optional*, defaults to 0.05):
Percentage (between 0 and 1) of all feature vectors along the time axis which will be masked. The masking
procedure generates `mask_time_prob*len(time_axis)/mask_time_length` independent masks over the axis. If
reasoning from the probability of each feature vector to be chosen as the start of the vector span to be
masked, *mask_time_prob* should be `prob_vector_start*mask_time_length`. Note that overlap may decrease
the actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
mask_time_length (`int`, *optional*, defaults to 10):
Length of vector span along the time axis.
mask_time_min_masks (`int`, *optional*, defaults to 2):
The minimum number of masks of length `mask_time_length` generated along the time axis, each time
step, irrespective of `mask_time_prob`. Only relevant if
`mask_time_prob*len(time_axis)/mask_time_length < mask_time_min_masks`.
mask_feature_prob (`float`, *optional*, defaults to 0.0):
Percentage (between 0 and 1) of all feature vectors along the feature axis which will be masked. The
masking procedure generates `mask_feature_prob*len(feature_axis)/mask_feature_length` independent masks over
the axis. If reasoning from the probability of each feature vector to be chosen as the start of the vector
span to be masked, *mask_feature_prob* should be `prob_vector_start*mask_feature_length`. Note that
overlap may decrease the actual percentage of masked vectors. This is only relevant if `apply_spec_augment is True`.
mask_feature_length (`int`, *optional*, defaults to 10):
Length of vector span along the feature axis.
mask_feature_min_masks (`int`, *optional*, defaults to 0):
The minimum number of masks of length `mask_feature_length` generated along the feature axis, each time
step, irrespective of `mask_feature_prob`. Only relevant if
`mask_feature_prob*len(feature_axis)/mask_feature_length < mask_feature_min_masks`.
diversity_loss_weight (`float`, *optional*, defaults to 0.1):
The weight of the codebook diversity loss component.
ctc_loss_reduction (`str`, *optional*, defaults to `"sum"`):
Specifies the reduction to apply to the output of `torch.nn.CTCLoss`. Only relevant when training an
instance of [`SEWDForCTC`].
ctc_zero_infinity (`bool`, *optional*, defaults to `False`):
Whether to zero infinite losses and the associated gradients of `torch.nn.CTCLoss`. Infinite losses
mainly occur when the inputs are too short to be aligned to the targets. Only relevant when training an
instance of [`SEWDForCTC`].
use_weighted_layer_sum (`bool`, *optional*, defaults to `False`):
Whether to use a weighted average of layer outputs with learned weights. Only relevant when using an
instance of [`Wav2Vec2ForSequenceClassification`].
classifier_proj_size (`int`, *optional*, defaults to 256):
Dimensionality of the projection before token mean-pooling for classification.
Example:
```python
>>> from transformers import SEWDModel, SEWDConfig
>>> # Initializing a SEW-D asapp/sew-d-tiny-100k style configuration
>>> configuration = SEWDConfig()
>>> # Initializing a model from the asapp/sew-d-tiny-100k style configuration
>>> model = SEWDModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "sew-d"
def __init__(
...
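Like any `PretrainedConfig`, the DeBERTa-style attention arguments documented above can be overridden at construction time. A short sketch; the values are illustrative, not recommended settings:

```python
from transformers import SEWDConfig

# Override a few of the relative-attention arguments documented above.
configuration = SEWDConfig(
    position_buckets=320,
    pos_att_type=("p2c", "c2p"),
    norm_rel_ebd="layer_norm",
)
print(configuration.position_buckets)  # 320
```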
@@ -26,49 +26,50 @@ logger = logging.get_logger(__name__)
class SpeechEncoderDecoderConfig(PretrainedConfig):
r"""
[`SpeechEncoderDecoderConfig`] is the configuration class to store the configuration of a
[`SpeechEncoderDecoderModel`]. It is used to instantiate an Encoder Decoder model according to
the specified arguments, defining the encoder and decoder configs.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
kwargs (*optional*):
Dictionary of keyword arguments. Notably:
- **encoder** ([`PretrainedConfig`], *optional*) -- An instance of a configuration
object that defines the encoder config.
- **decoder** ([`PretrainedConfig`], *optional*) -- An instance of a configuration
object that defines the decoder config.
Examples:
```python
>>> from transformers import BertConfig, Wav2Vec2Config, SpeechEncoderDecoderConfig, SpeechEncoderDecoderModel
>>> # Initializing a Wav2Vec2 & BERT style configuration
>>> config_encoder = Wav2Vec2Config()
>>> config_decoder = BertConfig()
>>> config = SpeechEncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> # Initializing a Wav2Vec2Bert model from Wav2Vec2 & bert-base-uncased style configurations
>>> model = SpeechEncoderDecoderModel(config=config)
>>> # Accessing the model configuration
>>> config_encoder = model.config.encoder
>>> config_decoder = model.config.decoder
>>> # set decoder config to causal lm
>>> config_decoder.is_decoder = True
>>> config_decoder.add_cross_attention = True
>>> # Saving the model, including its configuration
>>> model.save_pretrained('my-model')
>>> # loading model and config from pretrained folder
>>> encoder_decoder_config = SpeechEncoderDecoderConfig.from_pretrained('my-model')
>>> model = SpeechEncoderDecoderModel.from_pretrained('my-model', config=encoder_decoder_config)
```"""
model_type = "speech-encoder-decoder"
is_composition = True
@@ -93,11 +94,11 @@ class SpeechEncoderDecoderConfig(PretrainedConfig):
cls, encoder_config: PretrainedConfig, decoder_config: PretrainedConfig, **kwargs
) -> PretrainedConfig:
r"""
Instantiate a [`SpeechEncoderDecoderConfig`] (or a derived class) from a pre-trained encoder
model configuration and decoder model configuration.
Returns:
[`SpeechEncoderDecoderConfig`]: An instance of a configuration object
"""
logger.info("Setting `config.is_decoder=True` and `config.add_cross_attention=True` for decoder_config")
decoder_config.is_decoder = True
@@ -107,10 +108,10 @@ class SpeechEncoderDecoderConfig(PretrainedConfig):
def to_dict(self):
"""
Serializes this instance to a Python dictionary. Overrides the default *to_dict()* from *PretrainedConfig*.
Returns:
`Dict[str, any]`: Dictionary of all the attributes that make up this configuration instance.
"""
output = copy.deepcopy(self.__dict__)
output["encoder"] = self.encoder.to_dict()
...
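Combining `from_encoder_decoder_configs` with the overridden `to_dict` shown above, the composite config serializes its sub-configs as nested dictionaries. A minimal sketch of that round trip:

```python
from transformers import BertConfig, SpeechEncoderDecoderConfig, Wav2Vec2Config

config = SpeechEncoderDecoderConfig.from_encoder_decoder_configs(
    Wav2Vec2Config(), BertConfig()
)

# to_dict() nests each sub-config as a plain dictionary
d = config.to_dict()
print(d["model_type"])             # 'speech-encoder-decoder'
print(d["encoder"]["model_type"])  # 'wav2vec2'
print(d["decoder"]["is_decoder"])  # True, set by from_encoder_decoder_configs
```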
@@ -435,26 +435,26 @@ class SpeechEncoderDecoderModel(PreTrainedModel):
r"""
Returns:
Examples:
```python
>>> from transformers import SpeechEncoderDecoderModel, Speech2Text2Processor
>>> from datasets import load_dataset
>>> import torch
>>> processor = Speech2Text2Processor.from_pretrained('facebook/s2t-wav2vec2-large-en-de')
>>> model = SpeechEncoderDecoderModel.from_pretrained('facebook/s2t-wav2vec2-large-en-de')
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> input_values = processor(ds[0]["audio"]["array"], return_tensors="pt").input_values
>>> decoder_input_ids = torch.tensor([[model.config.decoder.decoder_start_token_id]])
>>> outputs = model(input_values=input_values, decoder_input_ids=decoder_input_ids)
>>> # inference (generation)
>>> generated = model.generate(input_values)
>>> translation = processor.batch_decode(generated)
```"""
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
kwargs_encoder = {argument: value for argument, value in kwargs.items() if not argument.startswith("decoder_")}
...
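The last line of the hunk above shows how the forward pass routes keyword arguments: anything prefixed with `decoder_` is stripped of the prefix and sent to the decoder, everything else goes to the encoder. A standalone sketch of that filtering, with hypothetical kwargs:

```python
# Illustrative kwargs; the same key can target encoder and decoder separately.
kwargs = {"output_attentions": True, "decoder_output_attentions": False}

kwargs_encoder = {k: v for k, v in kwargs.items() if not k.startswith("decoder_")}
kwargs_decoder = {k[len("decoder_"):]: v for k, v in kwargs.items() if k.startswith("decoder_")}

print(kwargs_encoder)  # {'output_attentions': True}
print(kwargs_decoder)  # {'output_attentions': False}
```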
@@ -28,86 +28,87 @@ SPEECH_TO_TEXT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class Speech2TextConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of a [`Speech2TextModel`]. It is used
to instantiate a Speech2Text model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the Speech2Text
[facebook/s2t-small-librispeech-asr](https://huggingface.co/facebook/s2t-small-librispeech-asr) architecture.
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 50265):
Vocabulary size of the Speech2Text model. Defines the number of different tokens that can be represented by
the `inputs_ids` passed when calling [`Speech2TextModel`].
d_model (`int`, *optional*, defaults to 1024):
Dimensionality of the layers and the pooler layer.
encoder_layers (`int`, *optional*, defaults to 12):
Number of encoder layers.
decoder_layers (`int`, *optional*, defaults to 12):
Number of decoder layers.
encoder_attention_heads (`int`, *optional*, defaults to 16):
Number of attention heads for each attention layer in the Transformer encoder.
decoder_attention_heads (`int`, *optional*, defaults to 16):
Number of attention heads for each attention layer in the Transformer decoder.
decoder_ffn_dim (`int`, *optional*, defaults to 4096):
Dimensionality of the "intermediate" (often named feed-forward) layer in the decoder.
encoder_ffn_dim (`int`, *optional*, defaults to 4096):
Dimensionality of the "intermediate" (often named feed-forward) layer in the encoder.
activation_function (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string,
`"gelu"`, `"relu"`, `"silu"` and `"gelu_new"` are supported.
dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities.
activation_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for activations inside the fully connected layer.
classifier_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the classifier.
init_std (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
encoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the encoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556)
for more details.
decoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the decoder. See the [LayerDrop paper](https://arxiv.org/abs/1909.11556)
for more details.
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models).
max_source_positions (`int`, *optional*, defaults to 6000):
The maximum sequence length of log-mel filter-bank features that this model might ever be used with.
max_target_positions (`int`, *optional*, defaults to 1024):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
num_conv_layers (`int`, *optional*, defaults to 2):
Number of 1D convolutional layers in the conv module.
conv_kernel_sizes (`Tuple[int]`, *optional*, defaults to `(5, 5)`):
A tuple of integers defining the kernel size of each 1D convolutional layer in the conv module. The length
of `conv_kernel_sizes` has to match `num_conv_layers`.
conv_channels (`int`, *optional*, defaults to 1024):
An integer defining the number of output channels of each convolution layer except the final one in the
conv module.
input_feat_per_channel (`int`, *optional*, defaults to 80):
An integer specifying the size of the feature vector. This is also the dimension of the log-mel filter-bank
features.
input_channels (`int`, *optional*, defaults to 1):
An integer specifying the number of input channels of the input feature vector.
Example:
```python
>>> from transformers import Speech2TextModel, Speech2TextConfig
>>> # Initializing a Speech2Text s2t_transformer_s style configuration
>>> configuration = Speech2TextConfig()
>>> # Initializing a model from the s2t_transformer_s style configuration
>>> model = Speech2TextModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "speech_to_text"
keys_to_ignore_at_inference = ["past_key_values"]
attribute_map = {"num_attention_heads": "encoder_attention_heads", "hidden_size": "d_model"}
...
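The conv module described by `num_conv_layers` and `conv_kernel_sizes` shortens the log-mel sequence before it reaches the transformer. Below is a rough sketch of the resulting length, assuming each conv layer uses stride 2 with `kernel_size // 2` padding; that is an assumption about the subsampler, not something stated in this diff:

```python
# Approximate sequence length after the conv module, assuming stride-2
# layers with kernel_size // 2 padding; illustrative, not library code.
def subsampled_length(source_length, num_conv_layers=2):
    length = source_length
    for _ in range(num_conv_layers):
        length = (length - 1) // 2 + 1  # one stride-2 conv layer
    return length

# max_source_positions=6000 input frames -> roughly 1500 encoder positions
print(subsampled_length(6000))  # 1500
```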
@@ -35,26 +35,26 @@ class Speech2TextFeatureExtractor(SequenceFeatureExtractor):
r"""
Constructs a Speech2Text feature extractor.
This feature extractor inherits from [`SequenceFeatureExtractor`], which contains most of the
main methods. Users should refer to this superclass for more information regarding those methods.
This class extracts mel-filter bank features from raw speech using TorchAudio and applies utterance-level cepstral
mean and variance normalization to the extracted features.
Args:
feature_size (`int`, defaults to 80):
The feature dimension of the extracted features.
sampling_rate (`int`, defaults to 16000):
The sampling rate at which the audio files should be digitized, expressed in hertz (Hz).
num_mel_bins (`int`, defaults to 80):
Number of Mel-frequency bins.
padding_value (`float`, defaults to 0.0):
The value that is used to fill the padding vectors.
do_ceptral_normalize (`bool`, *optional*, defaults to `True`):
Whether or not to apply utterance-level cepstral mean and variance normalization to the extracted features.
normalize_means (`bool`, *optional*, defaults to `True`):
Whether or not to zero-mean normalize the extracted features.
normalize_vars (`bool`, *optional*, defaults to `True`):
Whether or not to unit-variance normalize the extracted features.
"""
...@@ -140,49 +140,51 @@ class Speech2TextFeatureExtractor(SequenceFeatureExtractor): ...@@ -140,49 +140,51 @@ class Speech2TextFeatureExtractor(SequenceFeatureExtractor):
Main method to featurize and prepare for the model one or several sequence(s). sequences. Main method to featurize and prepare for the model one or several sequence(s). sequences.
Args: Args:
raw_speech (:obj:`np.ndarray`, :obj:`List[float]`, :obj:`List[np.ndarray]`, :obj:`List[List[float]]`): raw_speech (`np.ndarray`, `List[float]`, `List[np.ndarray]`, `List[List[float]]`):
The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float The sequence or batch of sequences to be padded. Each sequence can be a numpy array, a list of float
values, a list of numpy arrays or a list of list of float values. values, a list of numpy arrays or a list of list of float values.
padding (:obj:`bool`, :obj:`str` or :class:`~transformers.file_utils.PaddingStrategy`, `optional`, defaults to :obj:`True`): padding (`bool`, `str` or [`~file_utils.PaddingStrategy`], *optional*, defaults to `True`):
Select a strategy to pad the returned sequences (according to the model's padding side and padding Select a strategy to pad the returned sequences (according to the model's padding side and padding
index) among: index) among:
* :obj:`True` or :obj:`'longest'`: Pad to the longest sequence in the batch (or no padding if only a - `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a
single sequence is provided). single sequence is provided).
* :obj:`'max_length'`: Pad to a maximum length specified with the argument :obj:`max_length` or to the - `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
maximum acceptable input length for the model if that argument is not provided. maximum acceptable input length for the model if that argument is not provided.
* :obj:`False` or :obj:`'do_not_pad'`: No padding (i.e., can output a batch with sequences of - `False` or `'do_not_pad'`: No padding (i.e., can output a batch with sequences of
different lengths). different lengths).
max_length (:obj:`int`, `optional`): max_length (`int`, *optional*):
Maximum length of the returned list and optionally padding length (see above). Maximum length of the returned list and optionally padding length (see above).
truncation (:obj:`bool`): truncation (`bool`):
Activates truncation to cut input sequences longer than `max_length` to `max_length`. Activates truncation to cut input sequences longer than *max_length* to *max_length*.
pad_to_multiple_of (:obj:`int`, `optional`): pad_to_multiple_of (`int`, *optional*):
If set will pad the sequence to a multiple of the provided value. If set will pad the sequence to a multiple of the provided value.
This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
>= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. >= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128.
return_attention_mask (:obj:`bool`, `optional`): return_attention_mask (`bool`, *optional*):
Whether to return the attention mask. If left to the default, will return the attention mask according Whether to return the attention mask. If left to the default, will return the attention mask according
to the specific feature_extractor's default. to the specific feature_extractor's default.
`What are attention masks? <../glossary.html#attention-mask>`__ [What are attention masks?](../glossary#attention-mask)
.. note:: <Tip>
For Speech2TextTransformer models, :obj:`attention_mask` should always be passed for batched For Speech2TextTransformer models, `attention_mask` should always be passed for batched
inference, to avoid subtle bugs. inference, to avoid subtle bugs.
return_tensors (:obj:`str` or :class:`~transformers.file_utils.TensorType`, `optional`): </Tip>
return_tensors (`str` or [`~file_utils.TensorType`], *optional*):
If set, will return tensors instead of lists of Python integers. Acceptable values are: If set, will return tensors instead of lists of Python integers. Acceptable values are:
* :obj:`'tf'`: Return TensorFlow :obj:`tf.constant` objects. - `'tf'`: Return TensorFlow `tf.constant` objects.
* :obj:`'pt'`: Return PyTorch :obj:`torch.Tensor` objects. - `'pt'`: Return PyTorch `torch.Tensor` objects.
* :obj:`'np'`: Return Numpy :obj:`np.ndarray` objects. - `'np'`: Return Numpy `np.ndarray` objects.
sampling_rate (:obj:`int`, `optional`): sampling_rate (`int`, *optional*):
The sampling rate at which the :obj:`raw_speech` input was sampled. It is strongly recommended to pass The sampling rate at which the `raw_speech` input was sampled. It is strongly recommended to pass
:obj:`sampling_rate` at the forward call to prevent silent errors. `sampling_rate` at the forward call to prevent silent errors.
padding_value (:obj:`float`, defaults to 0.0): padding_value (`float`, defaults to 0.0):
The value that is used to fill the padding values / vectors. The value that is used to fill the padding values / vectors.
""" """
......
...@@ -26,17 +26,17 @@ class Speech2TextProcessor: ...@@ -26,17 +26,17 @@ class Speech2TextProcessor:
Constructs a Speech2Text processor which wraps a Speech2Text feature extractor and a Speech2Text tokenizer into a Constructs a Speech2Text processor which wraps a Speech2Text feature extractor and a Speech2Text tokenizer into a
single processor. single processor.
:class:`~transformers.Speech2TextProcessor` offers all the functionalities of [`Speech2TextProcessor`] offers all the functionalities of
:class:`~transformers.Speech2TextFeatureExtractor` and :class:`~transformers.Speech2TextTokenizer`. See the [`Speech2TextFeatureExtractor`] and [`Speech2TextTokenizer`]. See the
:meth:`~transformers.Speech2TextProcessor.__call__` and :meth:`~transformers.Speech2TextProcessor.decode` for more [`~Speech2TextProcessor.__call__`] and [`~Speech2TextProcessor.decode`] for more
information. information.
Args: Args:
feature_extractor (:obj:`Speech2TextFeatureExtractor`): feature_extractor (`Speech2TextFeatureExtractor`):
An instance of :class:`~transformers.Speech2TextFeatureExtractor`. The feature extractor is a required An instance of [`Speech2TextFeatureExtractor`]. The feature extractor is a required
input. input.
tokenizer (:obj:`Speech2TextTokenizer`): tokenizer (`Speech2TextTokenizer`):
An instance of :class:`~transformers.Speech2TextTokenizer`. The tokenizer is a required input. An instance of [`Speech2TextTokenizer`]. The tokenizer is a required input.
""" """
def __init__(self, feature_extractor, tokenizer): def __init__(self, feature_extractor, tokenizer):
...@@ -56,17 +56,19 @@ class Speech2TextProcessor: ...@@ -56,17 +56,19 @@ class Speech2TextProcessor:
def save_pretrained(self, save_directory): def save_pretrained(self, save_directory):
""" """
Save a Speech2Text feature extractor object and Speech2Text tokenizer object to the directory Save a Speech2Text feature extractor object and Speech2Text tokenizer object to the directory
``save_directory``, so that it can be re-loaded using the `save_directory`, so that it can be re-loaded using the
:func:`~transformers.Speech2TextProcessor.from_pretrained` class method. [`~Speech2TextProcessor.from_pretrained`] class method.
.. note:: <Tip>
This class method is simply calling :meth:`~transformers.PreTrainedFeatureExtractor.save_pretrained` and This class method is simply calling [`~PreTrainedFeatureExtractor.save_pretrained`] and
:meth:`~transformers.tokenization_utils_base.PreTrainedTokenizer.save_pretrained`. Please refer to the [`~tokenization_utils_base.PreTrainedTokenizer.save_pretrained`]. Please refer to the
docstrings of the methods above for more information. docstrings of the methods above for more information.
</Tip>
Args: Args:
save_directory (:obj:`str` or :obj:`os.PathLike`): save_directory (`str` or `os.PathLike`):
Directory where the feature extractor JSON file and the tokenizer files will be saved (directory will Directory where the feature extractor JSON file and the tokenizer files will be saved (directory will
be created if it does not exist). be created if it does not exist).
""" """
...@@ -77,30 +79,32 @@ class Speech2TextProcessor: ...@@ -77,30 +79,32 @@ class Speech2TextProcessor:
@classmethod @classmethod
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
r""" r"""
Instantiate a :class:`~transformers.Speech2TextProcessor` from a pretrained Speech2Text processor. Instantiate a [`Speech2TextProcessor`] from a pretrained Speech2Text processor.
<Tip>
.. note:: This class method is simply calling Speech2TextFeatureExtractor's
[`~PreTrainedFeatureExtractor.from_pretrained`] and Speech2TextTokenizer's
[`~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`]. Please refer to the
docstrings of the methods above for more information.
This class method is simply calling Speech2TextFeatureExtractor's </Tip>
:meth:`~transformers.PreTrainedFeatureExtractor.from_pretrained` and Speech2TextTokenizer's
:meth:`~transformers.tokenization_utils_base.PreTrainedTokenizer.from_pretrained`. Please refer to the
docstrings of the methods above for more information.
Args: Args:
pretrained_model_name_or_path (:obj:`str` or :obj:`os.PathLike`): pretrained_model_name_or_path (`str` or `os.PathLike`):
This can be either: This can be either:
- a string, the `model id` of a pretrained feature_extractor hosted inside a model repo on - a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
huggingface.co. Valid model ids can be located at the root-level, like ``bert-base-uncased``, or huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
namespaced under a user or organization name, like ``dbmdz/bert-base-german-cased``. namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
- a path to a `directory` containing a feature extractor file saved using the - a path to a *directory* containing a feature extractor file saved using the
:meth:`~transformers.PreTrainedFeatureExtractor.save_pretrained` method, e.g., [`~PreTrainedFeatureExtractor.save_pretrained`] method, e.g.,
``./my_model_directory/``. `./my_model_directory/`.
- a path or url to a saved feature extractor JSON `file`, e.g., - a path or url to a saved feature extractor JSON *file*, e.g.,
``./my_model_directory/preprocessor_config.json``. `./my_model_directory/preprocessor_config.json`.
**kwargs **kwargs
Additional keyword arguments passed along to both :class:`~transformers.PreTrainedFeatureExtractor` and Additional keyword arguments passed along to both [`PreTrainedFeatureExtractor`] and
:class:`~transformers.PreTrainedTokenizer` [`PreTrainedTokenizer`]
""" """
feature_extractor = Speech2TextFeatureExtractor.from_pretrained(pretrained_model_name_or_path, **kwargs) feature_extractor = Speech2TextFeatureExtractor.from_pretrained(pretrained_model_name_or_path, **kwargs)
tokenizer = Speech2TextTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs) tokenizer = Speech2TextTokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
...@@ -110,9 +114,9 @@ class Speech2TextProcessor: ...@@ -110,9 +114,9 @@ class Speech2TextProcessor:
def __call__(self, *args, **kwargs): def __call__(self, *args, **kwargs):
""" """
When used in normal mode, this method forwards all its arguments to Speech2TextFeatureExtractor's When used in normal mode, this method forwards all its arguments to Speech2TextFeatureExtractor's
:meth:`~transformers.Speech2TextFeatureExtractor.__call__` and returns its output. If used in the context of [`~Speech2TextFeatureExtractor.__call__`] and returns its output. If used in the context of
:meth:`~transformers.Speech2TextProcessor.as_target_processor`, this method forwards all its arguments to [`~Speech2TextProcessor.as_target_processor`], this method forwards all its arguments to
Speech2TextTokenizer's :meth:`~transformers.Speech2TextTokenizer.__call__`. Please refer to the docstring of Speech2TextTokenizer's [`~Speech2TextTokenizer.__call__`]. Please refer to the docstring of
the above two methods for more information. the above two methods for more information.
""" """
return self.current_processor(*args, **kwargs) return self.current_processor(*args, **kwargs)
...@@ -120,7 +124,7 @@ class Speech2TextProcessor: ...@@ -120,7 +124,7 @@ class Speech2TextProcessor:
def batch_decode(self, *args, **kwargs): def batch_decode(self, *args, **kwargs):
""" """
This method forwards all its arguments to Speech2TextTokenizer's This method forwards all its arguments to Speech2TextTokenizer's
:meth:`~transformers.PreTrainedTokenizer.batch_decode`. Please refer to the docstring of this method for more [`~PreTrainedTokenizer.batch_decode`]. Please refer to the docstring of this method for more
information. information.
""" """
return self.tokenizer.batch_decode(*args, **kwargs) return self.tokenizer.batch_decode(*args, **kwargs)
...@@ -128,7 +132,7 @@ class Speech2TextProcessor: ...@@ -128,7 +132,7 @@ class Speech2TextProcessor:
def decode(self, *args, **kwargs): def decode(self, *args, **kwargs):
""" """
This method forwards all its arguments to Speech2TextTokenizer's This method forwards all its arguments to Speech2TextTokenizer's
:meth:`~transformers.PreTrainedTokenizer.decode`. Please refer to the docstring of this method for more [`~PreTrainedTokenizer.decode`]. Please refer to the docstring of this method for more
information. information.
""" """
return self.tokenizer.decode(*args, **kwargs) return self.tokenizer.decode(*args, **kwargs)
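A round-trip sketch for `decode` and `batch_decode` (the IDs here are produced by encoding plain text through the wrapped tokenizer purely for illustration; in practice they would come from a model's `generate` output):

```python
>>> from transformers import Speech2TextProcessor

>>> processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")
>>> ids = processor.tokenizer("a test transcription").input_ids
>>> text = processor.decode(ids, skip_special_tokens=True)
>>> texts = processor.batch_decode([ids, ids], skip_special_tokens=True)
```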
......
...@@ -56,46 +56,45 @@ class Speech2TextTokenizer(PreTrainedTokenizer): ...@@ -56,46 +56,45 @@ class Speech2TextTokenizer(PreTrainedTokenizer):
""" """
Construct a Speech2Text tokenizer. Construct a Speech2Text tokenizer.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains some of the main methods. This tokenizer inherits from [`PreTrainedTokenizer`] which contains some of the main methods.
Users should refer to the superclass for more information regarding such methods. Users should refer to the superclass for more information regarding such methods.
Args: Args:
vocab_file (:obj:`str`): vocab_file (`str`):
File containing the vocabulary. File containing the vocabulary.
spm_file (:obj:`str`): spm_file (`str`):
Path to the `SentencePiece <https://github.com/google/sentencepiece>`__ model file Path to the [SentencePiece](https://github.com/google/sentencepiece) model file
bos_token (:obj:`str`, `optional`, defaults to :obj:`"<s>"`): bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sentence token. The beginning of sentence token.
eos_token (:obj:`str`, `optional`, defaults to :obj:`"</s>"`): eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sentence token. The end of sentence token.
unk_token (:obj:`str`, `optional`, defaults to :obj:`"<unk>"`): unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead. token instead.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"<pad>"`): pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths. The token used for padding, for example when batching sequences of different lengths.
do_upper_case (:obj:`bool`, `optional`, defaults to :obj:`False`): do_upper_case (`bool`, *optional*, defaults to `False`):
Whether or not to uppercase the output when decoding. Whether or not to uppercase the output when decoding.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`False`): do_lower_case (`bool`, *optional*, defaults to `False`):
Whether or not to lowercase the input when tokenizing. Whether or not to lowercase the input when tokenizing.
tgt_lang (:obj:`str`, `optional`): tgt_lang (`str`, *optional*):
A string representing the target language. A string representing the target language.
sp_model_kwargs (:obj:`dict`, `optional`): sp_model_kwargs (`dict`, *optional*):
Will be passed to the ``SentencePieceProcessor.__init__()`` method. The `Python wrapper for SentencePiece Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set:
<https://github.com/google/sentencepiece/tree/master/python>`__ can be used, among other things, to set:
- ``enable_sampling``: Enable subword regularization. - `enable_sampling`: Enable subword regularization.
- ``nbest_size``: Sampling parameters for unigram. Invalid for BPE-Dropout. - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.
- ``nbest_size = {0,1}``: No sampling is performed. - `nbest_size = {0,1}`: No sampling is performed.
- ``nbest_size > 1``: samples from the nbest_size results. - `nbest_size > 1`: samples from the nbest_size results.
- ``nbest_size < 0``: assuming that nbest_size is infinite and samples from the all hypothesis (lattice) - `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice)
using forward-filtering-and-backward-sampling algorithm. using forward-filtering-and-backward-sampling algorithm.
- ``alpha``: Smoothing parameter for unigram sampling, and dropout probability of merge operations for - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
BPE-dropout. BPE-dropout.
**kwargs **kwargs
Additional keyword arguments passed along to :class:`~transformers.PreTrainedTokenizer` Additional keyword arguments passed along to [`PreTrainedTokenizer`]
""" """
vocab_files_names = VOCAB_FILES_NAMES vocab_files_names = VOCAB_FILES_NAMES
...@@ -203,18 +202,18 @@ class Speech2TextTokenizer(PreTrainedTokenizer): ...@@ -203,18 +202,18 @@ class Speech2TextTokenizer(PreTrainedTokenizer):
) -> List[int]: ) -> List[int]:
""" """
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer ``prepare_for_model`` method. special tokens using the tokenizer `prepare_for_model` method.
Args: Args:
token_ids_0 (:obj:`List[int]`): token_ids_0 (`List[int]`):
List of IDs. List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`): token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs. Optional second list of IDs for sequence pairs.
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`): already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model. Whether or not the token list is already formatted with special tokens for the model.
Returns: Returns:
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token. `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
""" """
if already_has_special_tokens: if already_has_special_tokens:
......
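To make the mask returned above concrete, a short sketch (hedged: the exact positions of the 1s depend on the tokenizer's special-token layout):

```python
>>> from transformers import Speech2TextTokenizer

>>> tokenizer = Speech2TextTokenizer.from_pretrained("facebook/s2t-small-librispeech-asr")
>>> ids = tokenizer("a test transcription").input_ids
>>> mask = tokenizer.get_special_tokens_mask(ids, already_has_special_tokens=True)
>>> # each 1 marks a special token (e.g. </s>), each 0 a regular sequence token
```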
...@@ -28,65 +28,65 @@ SPEECH_TO_TEXT_2_PRETRAINED_CONFIG_ARCHIVE_MAP = { ...@@ -28,65 +28,65 @@ SPEECH_TO_TEXT_2_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class Speech2Text2Config(PretrainedConfig): class Speech2Text2Config(PretrainedConfig):
r""" r"""
This is the configuration class to store the configuration of a :class:`~transformers.Speech2Text2ForCausalLM`. It This is the configuration class to store the configuration of a [`Speech2Text2ForCausalLM`]. It
is used to instantiate a Speech2Text2 model according to the specified arguments, defining the model architecture. is used to instantiate a Speech2Text2 model according to the specified arguments, defining the model architecture.
Instantiating a configuration with the defaults will yield a similar configuration to that of the Speech2Text2 Instantiating a configuration with the defaults will yield a similar configuration to that of the Speech2Text2
`facebook/s2t-small-librispeech-asr <https://huggingface.co/facebook/s2t-small-librispeech-asr>`__ architecture. [facebook/s2t-small-librispeech-asr](https://huggingface.co/facebook/s2t-small-librispeech-asr) architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information. outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args: Args:
vocab_size (:obj:`int`, `optional`, defaults to 50265): vocab_size (`int`, *optional*, defaults to 50265):
Vocabulary size of the Speech2Text model. Defines the number of different tokens that can be represented by Vocabulary size of the Speech2Text model. Defines the number of different tokens that can be represented by
the :obj:`inputs_ids` passed when calling :class:`~transformers.Speech2TextModel` the `inputs_ids` passed when calling [`Speech2TextModel`]
d_model (:obj:`int`, `optional`, defaults to 1024): d_model (`int`, *optional*, defaults to 1024):
Dimensionality of the layers and the pooler layer. Dimensionality of the layers and the pooler layer.
decoder_layers (:obj:`int`, `optional`, defaults to 12): decoder_layers (`int`, *optional*, defaults to 12):
Number of decoder layers. Number of decoder layers.
decoder_attention_heads (:obj:`int`, `optional`, defaults to 16): decoder_attention_heads (`int`, *optional*, defaults to 16):
Number of attention heads for each attention layer in the Transformer decoder. Number of attention heads for each attention layer in the Transformer decoder.
decoder_ffn_dim (:obj:`int`, `optional`, defaults to 4096): decoder_ffn_dim (`int`, *optional*, defaults to 4096):
Dimensionality of the "intermediate" (often named feed-forward) layer in decoder. Dimensionality of the "intermediate" (often named feed-forward) layer in decoder.
activation_function (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"gelu"`): activation_function (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the pooler. If string, :obj:`"gelu"`, The non-linear activation function (function or string) in the pooler. If string, `"gelu"`,
:obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported. `"relu"`, `"silu"` and `"gelu_new"` are supported.
dropout (:obj:`float`, `optional`, defaults to 0.1): dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, and pooler. The dropout probability for all fully connected layers in the embeddings, and pooler.
attention_dropout (:obj:`float`, `optional`, defaults to 0.0): attention_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
activation_dropout (:obj:`float`, `optional`, defaults to 0.0): activation_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for activations inside the fully connected layer. The dropout ratio for activations inside the fully connected layer.
classifier_dropout (:obj:`float`, `optional`, defaults to 0.0): classifier_dropout (`float`, *optional*, defaults to 0.0):
The dropout ratio for classifier. The dropout ratio for classifier.
init_std (:obj:`float`, `optional`, defaults to 0.02): init_std (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
decoder_layerdrop (:obj:`float`, `optional`, defaults to 0.0): decoder_layerdrop (`float`, *optional*, defaults to 0.0):
The LayerDrop probability for the decoder. See the `LayerDrop paper The LayerDrop probability for the decoder. See the [LayerDrop
<https://arxiv.org/abs/1909.11556>`__ for more details. paper](https://arxiv.org/abs/1909.11556) for more details.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`): use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Whether or not the model should return the last key/values attentions (not used by all models).
max_source_positions (:obj:`int`, `optional`, defaults to 6000): max_source_positions (`int`, *optional*, defaults to 6000):
The maximum sequence length of log-mel filter-bank features that this model might ever be used with. The maximum sequence length of log-mel filter-bank features that this model might ever be used with.
max_target_positions (:obj:`int`, `optional`, defaults to 1024): max_target_positions (`int`, *optional*, defaults to 1024):
The maximum sequence length that this model might ever be used with. Typically set this to something large The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048). just in case (e.g., 512 or 1024 or 2048).
Example:: Example:
>>> from transformers import Speech2Text2ForCausalLM, Speech2Text2Config ```python
>>> from transformers import Speech2Text2ForCausalLM, Speech2Text2Config
>>> # Initializing a Speech2Text2 s2t_transformer_s style configuration >>> # Initializing a Speech2Text2 s2t_transformer_s style configuration
>>> configuration = Speech2Text2Config() >>> configuration = Speech2Text2Config()
>>> # Initializing a model from the s2t_transformer_s style configuration >>> # Initializing a model from the s2t_transformer_s style configuration
>>> model = Speech2Text2ForCausalLM(configuration) >>> model = Speech2Text2ForCausalLM(configuration)
>>> # Accessing the model configuration >>> # Accessing the model configuration
>>> configuration = model.config >>> configuration = model.config
""" ```"""
model_type = "speech_to_text_2" model_type = "speech_to_text_2"
keys_to_ignore_at_inference = ["past_key_values"] keys_to_ignore_at_inference = ["past_key_values"]
attribute_map = {"num_attention_heads": "decoder_attention_heads", "hidden_size": "d_model"} attribute_map = {"num_attention_heads": "decoder_attention_heads", "hidden_size": "d_model"}
......
...@@ -27,16 +27,16 @@ class Speech2Text2Processor: ...@@ -27,16 +27,16 @@ class Speech2Text2Processor:
Constructs a Speech2Text2 processor which wraps a Speech2Text2 feature extractor and a Speech2Text2 tokenizer into Constructs a Speech2Text2 processor which wraps a Speech2Text2 feature extractor and a Speech2Text2 tokenizer into
a single processor. a single processor.
:class:`~transformers.Speech2Text2Processor` offers all the functionalities of [`Speech2Text2Processor`] offers all the functionalities of
:class:`~transformers.AutoFeatureExtractor` and :class:`~transformers.Speech2Text2Tokenizer`. See the [`AutoFeatureExtractor`] and [`Speech2Text2Tokenizer`]. See the
:meth:`~transformers.Speech2Text2Processor.__call__` and :meth:`~transformers.Speech2Text2Processor.decode` for [`~Speech2Text2Processor.__call__`] and [`~Speech2Text2Processor.decode`] for
more information. more information.
Args: Args:
feature_extractor (:obj:`AutoFeatureExtractor`): feature_extractor (`AutoFeatureExtractor`):
An instance of :class:`~transformers.AutoFeatureExtractor`. The feature extractor is a required input. An instance of [`AutoFeatureExtractor`]. The feature extractor is a required input.
tokenizer (:obj:`Speech2Text2Tokenizer`): tokenizer (`Speech2Text2Tokenizer`):
An instance of :class:`~transformers.Speech2Text2Tokenizer`. The tokenizer is a required input. An instance of [`Speech2Text2Tokenizer`]. The tokenizer is a required input.
""" """
def __init__(self, feature_extractor, tokenizer): def __init__(self, feature_extractor, tokenizer):
...@@ -56,17 +56,19 @@ class Speech2Text2Processor: ...@@ -56,17 +56,19 @@ class Speech2Text2Processor:
def save_pretrained(self, save_directory): def save_pretrained(self, save_directory):
""" """
Save a Speech2Text2 feature extractor object and Speech2Text2 tokenizer object to the directory Save a Speech2Text2 feature extractor object and Speech2Text2 tokenizer object to the directory
``save_directory``, so that it can be re-loaded using the `save_directory`, so that it can be re-loaded using the
:func:`~transformers.Speech2Text2Processor.from_pretrained` class method. [`~Speech2Text2Processor.from_pretrained`] class method.
.. note:: <Tip>
This class method is simply calling :meth:`~transformers.PreTrainedFeatureExtractor.save_pretrained` and This class method is simply calling [`~PreTrainedFeatureExtractor.save_pretrained`] and
:meth:`~transformers.tokenization_utils_base.PreTrainedTokenizer.save_pretrained`. Please refer to the [`~tokenization_utils_base.PreTrainedTokenizer.save_pretrained`]. Please refer to the
docstrings of the methods above for more information. docstrings of the methods above for more information.
</Tip>
Args: Args:
save_directory (:obj:`str` or :obj:`os.PathLike`): save_directory (`str` or `os.PathLike`):
Directory where the feature extractor JSON file and the tokenizer files will be saved (directory will Directory where the feature extractor JSON file and the tokenizer files will be saved (directory will
be created if it does not exist). be created if it does not exist).
""" """
...@@ -77,30 +79,32 @@ class Speech2Text2Processor: ...@@ -77,30 +79,32 @@ class Speech2Text2Processor:
@classmethod @classmethod
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs): def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
r""" r"""
Instantiate a :class:`~transformers.Speech2Text2Processor` from a pretrained Speech2Text2 processor. Instantiate a [`Speech2Text2Processor`] from a pretrained Speech2Text2 processor.
<Tip>
.. note:: This class method is simply calling AutoFeatureExtractor's
[`~PreTrainedFeatureExtractor.from_pretrained`] and Speech2Text2Tokenizer's
[`~tokenization_utils_base.PreTrainedTokenizer.from_pretrained`]. Please refer to the
docstrings of the methods above for more information.
This class method is simply calling AutoFeatureExtractor's </Tip>
:meth:`~transformers.PreTrainedFeatureExtractor.from_pretrained` and Speech2Text2Tokenizer's
:meth:`~transformers.tokenization_utils_base.PreTrainedTokenizer.from_pretrained`. Please refer to the
docstrings of the methods above for more information.
Args: Args:
pretrained_model_name_or_path (:obj:`str` or :obj:`os.PathLike`): pretrained_model_name_or_path (`str` or `os.PathLike`):
This can be either: This can be either:
- a string, the `model id` of a pretrained feature_extractor hosted inside a model repo on - a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on
huggingface.co. Valid model ids can be located at the root-level, like ``bert-base-uncased``, or huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
namespaced under a user or organization name, like ``dbmdz/bert-base-german-cased``. namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
- a path to a `directory` containing a feature extractor file saved using the - a path to a *directory* containing a feature extractor file saved using the
:meth:`~transformers.PreTrainedFeatureExtractor.save_pretrained` method, e.g., [`~PreTrainedFeatureExtractor.save_pretrained`] method, e.g.,
``./my_model_directory/``. `./my_model_directory/`.
- a path or url to a saved feature extractor JSON `file`, e.g., - a path or url to a saved feature extractor JSON *file*, e.g.,
``./my_model_directory/preprocessor_config.json``. `./my_model_directory/preprocessor_config.json`.
**kwargs **kwargs
Additional keyword arguments passed along to both :class:`~transformers.PreTrainedFeatureExtractor` and Additional keyword arguments passed along to both [`PreTrainedFeatureExtractor`] and
:class:`~transformers.PreTrainedTokenizer` [`PreTrainedTokenizer`]
""" """
feature_extractor = AutoFeatureExtractor.from_pretrained(pretrained_model_name_or_path, **kwargs) feature_extractor = AutoFeatureExtractor.from_pretrained(pretrained_model_name_or_path, **kwargs)
tokenizer = Speech2Text2Tokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs) tokenizer = Speech2Text2Tokenizer.from_pretrained(pretrained_model_name_or_path, **kwargs)
...@@ -110,9 +114,9 @@ class Speech2Text2Processor: ...@@ -110,9 +114,9 @@ class Speech2Text2Processor:
def __call__(self, *args, **kwargs): def __call__(self, *args, **kwargs):
""" """
When used in normal mode, this method forwards all its arguments to AutoFeatureExtractor's When used in normal mode, this method forwards all its arguments to AutoFeatureExtractor's
:meth:`~transformers.AutoFeatureExtractor.__call__` and returns its output. If used in the context of [`~AutoFeatureExtractor.__call__`] and returns its output. If used in the context of
:meth:`~transformers.Speech2Text2Processor.as_target_processor`, this method forwards all its arguments to [`~Speech2Text2Processor.as_target_processor`], this method forwards all its arguments to
Speech2Text2Tokenizer's :meth:`~transformers.Speech2Text2Tokenizer.__call__`. Please refer to the docstring of Speech2Text2Tokenizer's [`~Speech2Text2Tokenizer.__call__`]. Please refer to the docstring of
the above two methods for more information. the above two methods for more information.
""" """
return self.current_processor(*args, **kwargs) return self.current_processor(*args, **kwargs)
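A sketch of the two calling modes (assumptions: `facebook/s2t-wav2vec2-large-en-de` is one public Speech2Text2 checkpoint, and the waveform is a placeholder):

```python
>>> import numpy as np
>>> from transformers import Speech2Text2Processor

>>> processor = Speech2Text2Processor.from_pretrained("facebook/s2t-wav2vec2-large-en-de")

>>> # normal mode: arguments are forwarded to the feature extractor
>>> raw_speech = np.zeros(16000, dtype=np.float32)  # placeholder one-second waveform
>>> inputs = processor(raw_speech, sampling_rate=16000, return_tensors="pt")

>>> # target mode: arguments are forwarded to the tokenizer, e.g. to prepare labels
>>> with processor.as_target_processor():
...     labels = processor("a target transcription").input_ids
```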
...@@ -120,7 +124,7 @@ class Speech2Text2Processor: ...@@ -120,7 +124,7 @@ class Speech2Text2Processor:
def batch_decode(self, *args, **kwargs): def batch_decode(self, *args, **kwargs):
""" """
This method forwards all its arguments to Speech2Text2Tokenizer's This method forwards all its arguments to Speech2Text2Tokenizer's
:meth:`~transformers.PreTrainedTokenizer.batch_decode`. Please refer to the docstring of this method for more [`~PreTrainedTokenizer.batch_decode`]. Please refer to the docstring of this method for more
information. information.
""" """
return self.tokenizer.batch_decode(*args, **kwargs) return self.tokenizer.batch_decode(*args, **kwargs)
...@@ -128,7 +132,7 @@ class Speech2Text2Processor: ...@@ -128,7 +132,7 @@ class Speech2Text2Processor:
def decode(self, *args, **kwargs): def decode(self, *args, **kwargs):
""" """
This method forwards all its arguments to Speech2Text2Tokenizer's This method forwards all its arguments to Speech2Text2Tokenizer's
:meth:`~transformers.PreTrainedTokenizer.decode`. Please refer to the docstring of this method for more [`~PreTrainedTokenizer.decode`]. Please refer to the docstring of this method for more
information. information.
""" """
return self.tokenizer.decode(*args, **kwargs) return self.tokenizer.decode(*args, **kwargs)
......
...@@ -68,24 +68,24 @@ class Speech2Text2Tokenizer(PreTrainedTokenizer): ...@@ -68,24 +68,24 @@ class Speech2Text2Tokenizer(PreTrainedTokenizer):
""" """
Constructs a Speech2Text2Tokenizer. Constructs a Speech2Text2Tokenizer.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains some of the main methods. This tokenizer inherits from [`PreTrainedTokenizer`] which contains some of the main methods.
Users should refer to the superclass for more information regarding such methods. Users should refer to the superclass for more information regarding such methods.
Args: Args:
vocab_file (:obj:`str`): vocab_file (`str`):
File containing the vocabulary. File containing the vocabulary.
bos_token (:obj:`str`, `optional`, defaults to :obj:`"<s>"`): bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sentence token. The beginning of sentence token.
eos_token (:obj:`str`, `optional`, defaults to :obj:`"</s>"`): eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sentence token. The end of sentence token.
unk_token (:obj:`str`, `optional`, defaults to :obj:`"<unk>"`): unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead. token instead.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"<pad>"`): pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths. The token used for padding, for example when batching sequences of different lengths.
**kwargs **kwargs
Additional keyword arguments passed along to :class:`~transformers.PreTrainedTokenizer` Additional keyword arguments passed along to [`PreTrainedTokenizer`]
""" """
vocab_files_names = VOCAB_FILES_NAMES vocab_files_names = VOCAB_FILES_NAMES
......
...@@ -31,62 +31,62 @@ SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP = { ...@@ -31,62 +31,62 @@ SPLINTER_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class SplinterConfig(PretrainedConfig): class SplinterConfig(PretrainedConfig):
r""" r"""
This is the configuration class to store the configuration of a :class:`~transformers.SplinterModel`. It is used to This is the configuration class to store the configuration of a [`SplinterModel`]. It is used to
instantiate a Splinter model according to the specified arguments, defining the model architecture. Instantiating instantiate a Splinter model according to the specified arguments, defining the model architecture. Instantiating
a configuration with the defaults will yield a similar configuration to that of the Splinter `tau/splinter-base a configuration with the defaults will yield a similar configuration to that of the Splinter [tau/splinter-base](https://huggingface.co/tau/splinter-base) architecture.
<https://huggingface.co/tau/splinter-base>`__ architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information. outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args: Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522): vocab_size (`int`, *optional*, defaults to 30522):
Vocabulary size of the Splinter model. Defines the number of different tokens that can be represented by Vocabulary size of the Splinter model. Defines the number of different tokens that can be represented by
the :obj:`inputs_ids` passed when calling :class:`~transformers.SplinterModel`. the `inputs_ids` passed when calling [`SplinterModel`].
hidden_size (:obj:`int`, `optional`, defaults to 768): hidden_size (`int`, *optional*, defaults to 768):
Dimension of the encoder layers and the pooler layer. Dimension of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, `optional`, defaults to 12): num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder. Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, `optional`, defaults to 12): num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder. Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (:obj:`int`, `optional`, defaults to 3072): intermediate_size (`int`, *optional*, defaults to 3072):
Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder. Dimension of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"gelu"`): hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"selu"` and :obj:`"gelu_new"` are supported. `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1): hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1): attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512): max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048). just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (:obj:`int`, `optional`, defaults to 2): type_vocab_size (`int`, *optional*, defaults to 2):
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.SplinterModel`. The vocabulary size of the `token_type_ids` passed when calling [`SplinterModel`].
initializer_range (:obj:`float`, `optional`, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12): layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`): use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Only Whether or not the model should return the last key/values attentions (not used by all models). Only
relevant if ``config.is_decoder=True``. relevant if `config.is_decoder=True`.
question_token_id (:obj:`int`, `optional`, defaults to 104): question_token_id (`int`, *optional*, defaults to 104):
The id of the ``[QUESTION]`` token. The id of the `[QUESTION]` token.
Example:: Example:
>>> from transformers import SplinterModel, SplinterConfig ```python
>>> from transformers import SplinterModel, SplinterConfig
>>> # Initializing a Splinter tau/splinter-base style configuration >>> # Initializing a Splinter tau/splinter-base style configuration
>>> configuration = SplinterConfig() >>> configuration = SplinterConfig()
>>> # Initializing a model from the tau/splinter-base style configuration >>> # Initializing a model from the tau/splinter-base style configuration
>>> model = SplinterModel(configuration) >>> model = SplinterModel(configuration)
>>> # Accessing the model configuration >>> # Accessing the model configuration
>>> configuration = model.config >>> configuration = model.config
""" ```"""
model_type = "splinter" model_type = "splinter"
def __init__( def __init__(
......
...@@ -76,44 +76,43 @@ class SplinterTokenizer(PreTrainedTokenizer): ...@@ -76,44 +76,43 @@ class SplinterTokenizer(PreTrainedTokenizer):
r""" r"""
Construct a Splinter tokenizer. Based on WordPiece. Construct a Splinter tokenizer. Based on WordPiece.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizer` which contains most of the main methods. This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods.
Users should refer to this superclass for more information regarding those methods. Users should refer to this superclass for more information regarding those methods.
Args: Args:
vocab_file (:obj:`str`): vocab_file (`str`):
File containing the vocabulary. File containing the vocabulary.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`): do_lower_case (`bool`, *optional*, defaults to `True`):
Whether or not to lowercase the input when tokenizing. Whether or not to lowercase the input when tokenizing.
do_basic_tokenize (:obj:`bool`, `optional`, defaults to :obj:`True`): do_basic_tokenize (`bool`, *optional*, defaults to `True`):
Whether or not to do basic tokenization before WordPiece. Whether or not to do basic tokenization before WordPiece.
never_split (:obj:`Iterable`, `optional`): never_split (`Iterable`, *optional*):
Collection of tokens which will never be split during tokenization. Only has an effect when Collection of tokens which will never be split during tokenization. Only has an effect when
:obj:`do_basic_tokenize=True` `do_basic_tokenize=True`
unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`): unk_token (`str`, *optional*, defaults to `"[UNK]"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead. token instead.
sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`): sep_token (`str`, *optional*, defaults to `"[SEP]"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens. token of a sequence built with special tokens.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`): pad_token (`str`, *optional*, defaults to `"[PAD]"`):
The token used for padding, for example when batching sequences of different lengths. The token used for padding, for example when batching sequences of different lengths.
cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`): cls_token (`str`, *optional*, defaults to `"[CLS]"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens. instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`): mask_token (`str`, *optional*, defaults to `"[MASK]"`):
The token used for masking values. This is the token used when training this model with masked language The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict. modeling. This is the token which the model will try to predict.
question_token (:obj:`str`, `optional`, defaults to :obj:`"[QUESTION]"`): question_token (`str`, *optional*, defaults to `"[QUESTION]"`):
The token used for constructing question representations. The token used for constructing question representations.
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`): tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
Whether or not to tokenize Chinese characters. Whether or not to tokenize Chinese characters.
This should likely be deactivated for Japanese (see this `issue This should likely be deactivated for Japanese (see this [issue](https://github.com/huggingface/transformers/issues/328)).
<https://github.com/huggingface/transformers/issues/328>`__). strip_accents: (`bool`, *optional*):
strip_accents: (:obj:`bool`, `optional`):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT). value for `lowercase` (as in the original BERT).
""" """
vocab_files_names = VOCAB_FILES_NAMES vocab_files_names = VOCAB_FILES_NAMES
...@@ -172,7 +171,7 @@ class SplinterTokenizer(PreTrainedTokenizer): ...@@ -172,7 +171,7 @@ class SplinterTokenizer(PreTrainedTokenizer):
@property @property
def question_token_id(self): def question_token_id(self):
""" """
:obj:`Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question `Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question
representation. representation.
""" """
return self.convert_tokens_to_ids(self.question_token) return self.convert_tokens_to_ids(self.question_token)
...@@ -222,17 +221,17 @@ class SplinterTokenizer(PreTrainedTokenizer): ...@@ -222,17 +221,17 @@ class SplinterTokenizer(PreTrainedTokenizer):
Build model inputs from a pair of sequences for question answering tasks by concatenating and adding special Build model inputs from a pair of sequences for question answering tasks by concatenating and adding special
tokens. A Splinter sequence has the following format: tokens. A Splinter sequence has the following format:
- single sequence: ``[CLS] X [SEP]`` - single sequence: `[CLS] X [SEP]`
- pair of sequences for question answering: ``[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]`` - pair of sequences for question answering: `[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]`
Args: Args:
token_ids_0 (:obj:`List[int]`): token_ids_0 (`List[int]`):
The question token IDs if pad_on_right, else context tokens IDs The question token IDs if pad_on_right, else context tokens IDs
token_ids_1 (:obj:`List[int]`, `optional`): token_ids_1 (`List[int]`, *optional*):
The context token IDs if pad_on_right, else question token IDs The context token IDs if pad_on_right, else question token IDs
Returns: Returns:
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens. `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
""" """
if token_ids_1 is None: if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id] return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
...@@ -252,18 +251,18 @@ class SplinterTokenizer(PreTrainedTokenizer): ...@@ -252,18 +251,18 @@ class SplinterTokenizer(PreTrainedTokenizer):
) -> List[int]: ) -> List[int]:
""" """
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer ``prepare_for_model`` method. special tokens using the tokenizer `prepare_for_model` method.
Args: Args:
token_ids_0 (:obj:`List[int]`): token_ids_0 (`List[int]`):
List of IDs. List of IDs.
token_ids_1 (:obj:`List[int]`, `optional`): token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs. Optional second list of IDs for sequence pairs.
already_has_special_tokens (:obj:`bool`, `optional`, defaults to :obj:`False`): already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model. Whether or not the token list is already formatted with special tokens for the model.
Returns: Returns:
:obj:`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token. `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
""" """
if already_has_special_tokens: if already_has_special_tokens:
...@@ -279,17 +278,16 @@ class SplinterTokenizer(PreTrainedTokenizer): ...@@ -279,17 +278,16 @@ class SplinterTokenizer(PreTrainedTokenizer):
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]: ) -> List[int]:
""" """
Create the token type IDs corresponding to the sequences passed. `What are token type IDs? Create the token type IDs corresponding to the sequences passed. [What are token type IDs?](../glossary#token-type-ids)
<../glossary.html#token-type-ids>`__
Should be overridden in a subclass if the model has a special way of building those. Should be overridden in a subclass if the model has a special way of building those.
Args: Args:
token_ids_0 (:obj:`List[int]`): The first tokenized sequence. token_ids_0 (`List[int]`): The first tokenized sequence.
token_ids_1 (:obj:`List[int]`, `optional`): The second tokenized sequence. token_ids_1 (`List[int]`, *optional*): The second tokenized sequence.
Returns: Returns:
:obj:`List[int]`: The token type ids. `List[int]`: The token type ids.
""" """
sep = [self.sep_token_id] sep = [self.sep_token_id]
cls = [self.cls_token_id] cls = [self.cls_token_id]
...@@ -330,19 +328,18 @@ class BasicTokenizer(object): ...@@ -330,19 +328,18 @@ class BasicTokenizer(object):
Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.). Constructs a BasicTokenizer that will run basic tokenization (punctuation splitting, lower casing, etc.).
Args: Args:
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`): do_lower_case (`bool`, *optional*, defaults to `True`):
Whether or not to lowercase the input when tokenizing. Whether or not to lowercase the input when tokenizing.
never_split (:obj:`Iterable`, `optional`): never_split (`Iterable`, *optional*):
Collection of tokens which will never be split during tokenization. Only has an effect when Collection of tokens which will never be split during tokenization. Only has an effect when
:obj:`do_basic_tokenize=True` `do_basic_tokenize=True`
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`): tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
Whether or not to tokenize Chinese characters. Whether or not to tokenize Chinese characters.
This should likely be deactivated for Japanese (see this `issue This should likely be deactivated for Japanese (see this [issue](https://github.com/huggingface/transformers/issues/328)).
<https://github.com/huggingface/transformers/issues/328>`__). strip_accents: (`bool`, *optional*):
strip_accents: (:obj:`bool`, `optional`):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT). value for `lowercase` (as in the original BERT).
""" """
def __init__(self, do_lower_case=True, never_split=None, tokenize_chinese_chars=True, strip_accents=None): def __init__(self, do_lower_case=True, never_split=None, tokenize_chinese_chars=True, strip_accents=None):
...@@ -359,9 +356,9 @@ class BasicTokenizer(object): ...@@ -359,9 +356,9 @@ class BasicTokenizer(object):
WordPieceTokenizer. WordPieceTokenizer.
Args: Args:
**never_split**: (`optional`) list of str **never_split**: (*optional*) list of str
Kept for backward compatibility purposes. Now implemented directly at the base class level (see Kept for backward compatibility purposes. Now implemented directly at the base class level (see
:func:`PreTrainedTokenizer.tokenize`) List of tokens not to split. [`PreTrainedTokenizer.tokenize`]) List of tokens not to split.
""" """
# union() returns a new set by concatenating the two sets. # union() returns a new set by concatenating the two sets.
never_split = self.never_split.union(set(never_split)) if never_split else self.never_split never_split = self.never_split.union(set(never_split)) if never_split else self.never_split
...@@ -487,11 +484,11 @@ class WordpieceTokenizer(object): ...@@ -487,11 +484,11 @@ class WordpieceTokenizer(object):
Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform
tokenization using the given vocabulary. tokenization using the given vocabulary.
For example, :obj:`input = "unaffable"` will return as output :obj:`["un", "##aff", "##able"]`. For example, `input = "unaffable"` will return as output `["un", "##aff", "##able"]`.
Args: Args:
text: A single token or whitespace separated tokens. This should have text: A single token or whitespace separated tokens. This should have
already been passed through `BasicTokenizer`. already been passed through *BasicTokenizer*.
Returns: Returns:
A list of wordpiece tokens. A list of wordpiece tokens.
......
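The greedy longest-match-first behaviour is easy to verify with a toy vocabulary; a sketch (the constructor arguments match the BERT-style `WordpieceTokenizer` these files share):

```python
>>> from transformers.models.bert.tokenization_bert import WordpieceTokenizer

>>> vocab = {"un": 0, "##aff": 1, "##able": 2, "[UNK]": 3}
>>> tokenizer = WordpieceTokenizer(vocab=vocab, unk_token="[UNK]")
>>> tokenizer.tokenize("unaffable")
['un', '##aff', '##able']
>>> # anything that cannot be pieced together from the vocabulary becomes the unknown token
>>> tokenizer.tokenize("xyz")
['[UNK]']
```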
...@@ -54,43 +54,43 @@ PRETRAINED_INIT_CONFIGURATION = { ...@@ -54,43 +54,43 @@ PRETRAINED_INIT_CONFIGURATION = {
class SplinterTokenizerFast(PreTrainedTokenizerFast): class SplinterTokenizerFast(PreTrainedTokenizerFast):
r""" r"""
Construct a "fast" Splinter tokenizer (backed by HuggingFace's `tokenizers` library). Based on WordPiece. Construct a "fast" Splinter tokenizer (backed by HuggingFace's *tokenizers* library). Based on WordPiece.
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizerFast` which contains most of the main This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods. methods. Users should refer to this superclass for more information regarding those methods.
Args: Args:
vocab_file (:obj:`str`): vocab_file (`str`):
File containing the vocabulary. File containing the vocabulary.
do_lower_case (:obj:`bool`, `optional`, defaults to :obj:`True`): do_lower_case (`bool`, *optional*, defaults to `True`):
Whether or not to lowercase the input when tokenizing. Whether or not to lowercase the input when tokenizing.
unk_token (:obj:`str`, `optional`, defaults to :obj:`"[UNK]"`): unk_token (`str`, *optional*, defaults to `"[UNK]"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead. token instead.
sep_token (:obj:`str`, `optional`, defaults to :obj:`"[SEP]"`): sep_token (`str`, *optional*, defaults to `"[SEP]"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens. token of a sequence built with special tokens.
pad_token (:obj:`str`, `optional`, defaults to :obj:`"[PAD]"`): pad_token (`str`, *optional*, defaults to `"[PAD]"`):
The token used for padding, for example when batching sequences of different lengths. The token used for padding, for example when batching sequences of different lengths.
cls_token (:obj:`str`, `optional`, defaults to :obj:`"[CLS]"`): cls_token (`str`, *optional*, defaults to `"[CLS]"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens. instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (:obj:`str`, `optional`, defaults to :obj:`"[MASK]"`): mask_token (`str`, *optional*, defaults to `"[MASK]"`):
The token used for masking values. This is the token used when training this model with masked language The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict. modeling. This is the token which the model will try to predict.
question_token (:obj:`str`, `optional`, defaults to :obj:`"[QUESTION]"`): question_token (`str`, *optional*, defaults to `"[QUESTION]"`):
The token used for constructing question representations. The token used for constructing question representations.
clean_text (:obj:`bool`, `optional`, defaults to :obj:`True`): clean_text (`bool`, *optional*, defaults to `True`):
Whether or not to clean the text before tokenization by removing any control characters and replacing all Whether or not to clean the text before tokenization by removing any control characters and replacing all
whitespaces by the classic one. whitespaces by the classic one.
tokenize_chinese_chars (:obj:`bool`, `optional`, defaults to :obj:`True`): tokenize_chinese_chars (`bool`, *optional*, defaults to `True`):
Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see `this Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see [this
issue <https://github.com/huggingface/transformers/issues/328>`__). issue](https://github.com/huggingface/transformers/issues/328)).
strip_accents: (:obj:`bool`, `optional`): strip_accents: (`bool`, *optional*):
Whether or not to strip all accents. If this option is not specified, then it will be determined by the Whether or not to strip all accents. If this option is not specified, then it will be determined by the
value for :obj:`lowercase` (as in the original BERT). value for `lowercase` (as in the original BERT).
wordpieces_prefix: (:obj:`str`, `optional`, defaults to :obj:`"##"`): wordpieces_prefix: (`str`, *optional*, defaults to `"##"`):
The prefix for subwords. The prefix for subwords.
""" """
...@@ -145,7 +145,7 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast): ...@@ -145,7 +145,7 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast):
@property @property
def question_token_id(self): def question_token_id(self):
""" """
:obj:`Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question `Optional[int]`: Id of the question token in the vocabulary, used to condition the answer on a question
representation. representation.
""" """
return self.convert_tokens_to_ids(self.question_token) return self.convert_tokens_to_ids(self.question_token)
...@@ -157,17 +157,17 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast): ...@@ -157,17 +157,17 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast):
Build model inputs from a pair of sequence for question answering tasks by concatenating and adding special Build model inputs from a pair of sequence for question answering tasks by concatenating and adding special
tokens. A Splinter sequence has the following format: tokens. A Splinter sequence has the following format:
- single sequence: ``[CLS] X [SEP]`` - single sequence: `[CLS] X [SEP]`
- pair of sequences for question answering: ``[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]`` - pair of sequences for question answering: `[CLS] question_tokens [QUESTION] . [SEP] context_tokens [SEP]`
Args: Args:
token_ids_0 (:obj:`List[int]`): token_ids_0 (`List[int]`):
The question token IDs if pad_on_right, else context token IDs The question token IDs if pad_on_right, else context token IDs
token_ids_1 (:obj:`List[int]`, `optional`): token_ids_1 (`List[int]`, *optional*):
The context token IDs if pad_on_right, else question token IDs The context token IDs if pad_on_right, else question token IDs
Returns: Returns:
:obj:`List[int]`: List of `input IDs <../glossary.html#input-ids>`__ with the appropriate special tokens. `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
""" """
if token_ids_1 is None: if token_ids_1 is None:
return [self.cls_token_id] + token_ids_0 + [self.sep_token_id] return [self.cls_token_id] + token_ids_0 + [self.sep_token_id]
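Concretely, for the single-sequence branch shown above (a sketch, assuming `tokenizer` is an already-loaded `SplinterTokenizerFast`):

```python
>>> ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Hello world"))
>>> tokenizer.build_inputs_with_special_tokens(ids)  # [cls_token_id, *ids, sep_token_id]
```

which is exactly the `[CLS] X [SEP]` layout from the docstring.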
...@@ -186,17 +186,16 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast): ...@@ -186,17 +186,16 @@ class SplinterTokenizerFast(PreTrainedTokenizerFast):
self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
) -> List[int]: ) -> List[int]:
""" """
Create the token type IDs corresponding to the sequences passed. `What are token type IDs? Create the token type IDs corresponding to the sequences passed. [What are token type IDs?](../glossary#token-type-ids)
<../glossary.html#token-type-ids>`__
Should be overridden in a subclass if the model has a special way of building those. Should be overridden in a subclass if the model has a special way of building those.
Args: Args:
token_ids_0 (:obj:`List[int]`): The first tokenized sequence. token_ids_0 (`List[int]`): The first tokenized sequence.
token_ids_1 (:obj:`List[int]`, `optional`): The second tokenized sequence. token_ids_1 (`List[int]`, *optional*): The second tokenized sequence.
Returns: Returns:
:obj:`List[int]`: The token type ids. `List[int]`: The token type ids.
""" """
sep = [self.sep_token_id] sep = [self.sep_token_id]
cls = [self.cls_token_id] cls = [self.cls_token_id]
......
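The 0/1 segment layout being built here can be sketched as follows (BERT-style; Splinter's actual override additionally accounts for where the question token sits, so treat this only as the generic shape):

```python
from typing import List, Optional


def token_type_ids_sketch(ids_0: List[int], ids_1: Optional[List[int]] = None) -> List[int]:
    # segment 0 covers [CLS] + first sequence + [SEP]
    first = [0] * (1 + len(ids_0) + 1)
    if ids_1 is None:
        return first
    # segment 1 covers the second sequence + its trailing [SEP]
    return first + [1] * (len(ids_1) + 1)


print(token_type_ids_sketch([5, 6], [7, 8, 9]))  # [0, 0, 0, 0, 1, 1, 1, 1]
```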
...@@ -29,72 +29,74 @@ SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP = { ...@@ -29,72 +29,74 @@ SQUEEZEBERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class SqueezeBertConfig(PretrainedConfig): class SqueezeBertConfig(PretrainedConfig):
r""" r"""
This is the configuration class to store the configuration of a :class:`~transformers.SqueezeBertModel`. It is used This is the configuration class to store the configuration of a [`SqueezeBertModel`]. It is used
to instantiate a SqueezeBERT model according to the specified arguments, defining the model architecture. to instantiate a SqueezeBERT model according to the specified arguments, defining the model architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information. outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args: Args:
vocab_size (:obj:`int`, `optional`, defaults to 30522): vocab_size (`int`, *optional*, defaults to 30522):
Vocabulary size of the SqueezeBERT model. Defines the number of different tokens that can be represented by Vocabulary size of the SqueezeBERT model. Defines the number of different tokens that can be represented by
the :obj:`inputs_ids` passed when calling :class:`~transformers.SqueezeBertModel`. the `inputs_ids` passed when calling [`SqueezeBertModel`].
hidden_size (:obj:`int`, `optional`, defaults to 768): hidden_size (`int`, *optional*, defaults to 768):
Dimensionality of the encoder layers and the pooler layer. Dimensionality of the encoder layers and the pooler layer.
num_hidden_layers (:obj:`int`, `optional`, defaults to 12): num_hidden_layers (`int`, *optional*, defaults to 12):
Number of hidden layers in the Transformer encoder. Number of hidden layers in the Transformer encoder.
num_attention_heads (:obj:`int`, `optional`, defaults to 12): num_attention_heads (`int`, *optional*, defaults to 12):
Number of attention heads for each attention layer in the Transformer encoder. Number of attention heads for each attention layer in the Transformer encoder.
intermediate_size (:obj:`int`, `optional`, defaults to 3072): intermediate_size (`int`, *optional*, defaults to 3072):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder. Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu"`): hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, The non-linear activation function (function or string) in the encoder and pooler. If string,
:obj:`"gelu"`, :obj:`"relu"`, :obj:`"silu"` and :obj:`"gelu_new"` are supported. `"gelu"`, `"relu"`, `"silu"` and `"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0.1): hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0.1): attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
The dropout ratio for the attention probabilities. The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512): max_position_embeddings (`int`, *optional*, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048). just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (:obj:`int`, `optional`, defaults to 2): type_vocab_size (`int`, *optional*, defaults to 2):
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.BertModel` or The vocabulary size of the `token_type_ids` passed when calling [`BertModel`] or
:class:`~transformers.TFBertModel`. [`TFBertModel`].
initializer_range (:obj:`float`, `optional`, defaults to 0.02): initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices. The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-12): layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
pad_token_id (:obj:`int`, `optional`, defaults to 0): pad_token_id (`int`, *optional*, defaults to 0):
The ID of the token in the word embedding to use as padding. The ID of the token in the word embedding to use as padding.
embedding_size (:obj:`int`, `optional`, defaults to 768): embedding_size (`int`, *optional*, defaults to 768):
The dimension of the word embedding vectors. The dimension of the word embedding vectors.
q_groups (:obj:`int`, `optional`, defaults to 4): q_groups (`int`, *optional*, defaults to 4):
The number of groups in Q layer. The number of groups in Q layer.
k_groups (:obj:`int`, `optional`, defaults to 4): k_groups (`int`, *optional*, defaults to 4):
The number of groups in K layer. The number of groups in K layer.
v_groups (:obj:`int`, `optional`, defaults to 4): v_groups (`int`, *optional*, defaults to 4):
The number of groups in V layer. The number of groups in V layer.
post_attention_groups (:obj:`int`, `optional`, defaults to 1): post_attention_groups (`int`, *optional*, defaults to 1):
The number of groups in the first feed forward network layer. The number of groups in the first feed forward network layer.
intermediate_groups (:obj:`int`, `optional`, defaults to 4): intermediate_groups (`int`, *optional*, defaults to 4):
The number of groups in the second feed forward network layer. The number of groups in the second feed forward network layer.
output_groups (:obj:`int`, `optional`, defaults to 4): output_groups (`int`, *optional*, defaults to 4):
The number of groups in the third feed forward network layer. The number of groups in the third feed forward network layer.
Examples:: Examples:
```python
>>> from transformers import SqueezeBertModel, SqueezeBertConfig >>> from transformers import SqueezeBertModel, SqueezeBertConfig
>>> # Initializing a SqueezeBERT configuration >>> # Initializing a SqueezeBERT configuration
>>> configuration = SqueezeBertConfig() >>> configuration = SqueezeBertConfig()
>>> # Initializing a model from the configuration above >>> # Initializing a model from the configuration above
>>> model = SqueezeBertModel(configuration) >>> model = SqueezeBertModel(configuration)
>>> # Accessing the model configuration >>> # Accessing the model configuration
>>> configuration = model.config >>> configuration = model.config
```
Attributes: pretrained_config_archive_map (Dict[str, str]): A dictionary containing all the available pre-trained Attributes: pretrained_config_archive_map (Dict[str, str]): A dictionary containing all the available pre-trained
checkpoints. checkpoints.
......
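The various `*_groups` arguments control grouped pointwise convolutions, which is where SqueezeBERT saves compute and parameters: with `groups=g`, each output channel only mixes `1/g` of the input channels. A quick PyTorch sketch of the parameter saving (the numbers are illustrative, not read from a checkpoint):

```python
import torch.nn as nn

dense = nn.Conv1d(768, 768, kernel_size=1, groups=1)
grouped = nn.Conv1d(768, 768, kernel_size=1, groups=4)  # analogous to q_groups=4


def count(m):
    return sum(p.numel() for p in m.parameters())


print(count(dense))    # 590592 = 768*768 weights + 768 bias
print(count(grouped))  # 148224 = 768*768/4 weights + 768 bias
```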
...@@ -48,10 +48,10 @@ class SqueezeBertTokenizer(BertTokenizer): ...@@ -48,10 +48,10 @@ class SqueezeBertTokenizer(BertTokenizer):
r""" r"""
Constructs a SqueezeBert tokenizer. Constructs a SqueezeBert tokenizer.
:class:`~transformers.SqueezeBertTokenizer is identical to :class:`~transformers.BertTokenizer` and runs end-to-end [`SqueezeBertTokenizer`] is identical to [`BertTokenizer`] and runs end-to-end
tokenization: punctuation splitting + wordpiece. tokenization: punctuation splitting + wordpiece.
Refer to superclass :class:`~transformers.BertTokenizer` for usage examples and documentation concerning Refer to superclass [`BertTokenizer`] for usage examples and documentation concerning
parameters. parameters.
""" """
......
...@@ -52,12 +52,12 @@ PRETRAINED_INIT_CONFIGURATION = { ...@@ -52,12 +52,12 @@ PRETRAINED_INIT_CONFIGURATION = {
class SqueezeBertTokenizerFast(BertTokenizerFast): class SqueezeBertTokenizerFast(BertTokenizerFast):
r""" r"""
Constructs a "Fast" SqueezeBert tokenizer (backed by HuggingFace's `tokenizers` library). Constructs a "Fast" SqueezeBert tokenizer (backed by HuggingFace's *tokenizers* library).
:class:`~transformers.SqueezeBertTokenizerFast` is identical to :class:`~transformers.BertTokenizerFast` and runs [`SqueezeBertTokenizerFast`] is identical to [`BertTokenizerFast`] and runs
end-to-end tokenization: punctuation splitting + wordpiece. end-to-end tokenization: punctuation splitting + wordpiece.
Refer to superclass :class:`~transformers.BertTokenizerFast` for usage examples and documentation concerning Refer to superclass [`BertTokenizerFast`] for usage examples and documentation concerning
parameters. parameters.
""" """
......
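Since both classes reuse the BERT tokenization end to end, usage is identical to the BERT tokenizers; a short sketch (`squeezebert/squeezebert-uncased` is the published base checkpoint, named here for illustration):

```python
>>> from transformers import SqueezeBertTokenizerFast

>>> tokenizer = SqueezeBertTokenizerFast.from_pretrained("squeezebert/squeezebert-uncased")
>>> tokenizer("Grouped convolutions are efficient.")["input_ids"]
```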
...@@ -37,45 +37,44 @@ T5_PRETRAINED_CONFIG_ARCHIVE_MAP = { ...@@ -37,45 +37,44 @@ T5_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class T5Config(PretrainedConfig): class T5Config(PretrainedConfig):
r""" r"""
This is the configuration class to store the configuration of a :class:`~transformers.T5Model` or a This is the configuration class to store the configuration of a [`T5Model`] or a
:class:`~transformers.TFT5Model`. It is used to instantiate a T5 model according to the specified arguments, [`TFT5Model`]. It is used to instantiate a T5 model according to the specified arguments,
defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration
to that of the T5 `t5-small <https://huggingface.co/t5-small>`__ architecture. to that of the T5 [t5-small](https://huggingface.co/t5-small) architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information. outputs. Read the documentation from [`PretrainedConfig`] for more information.
Arguments: Arguments:
vocab_size (:obj:`int`, `optional`, defaults to 32128): vocab_size (`int`, *optional*, defaults to 32128):
Vocabulary size of the T5 model. Defines the number of different tokens that can be represented by the Vocabulary size of the T5 model. Defines the number of different tokens that can be represented by the
:obj:`inputs_ids` passed when calling :class:`~transformers.T5Model` or :class:`~transformers.TFT5Model`. `inputs_ids` passed when calling [`T5Model`] or [`TFT5Model`].
d_model (:obj:`int`, `optional`, defaults to 512): d_model (`int`, *optional*, defaults to 512):
Size of the encoder layers and the pooler layer. Size of the encoder layers and the pooler layer.
d_kv (:obj:`int`, `optional`, defaults to 64): d_kv (`int`, *optional*, defaults to 64):
Size of the key, query, value projections per attention head. :obj:`d_kv` has to be equal to :obj:`d_model Size of the key, query, value projections per attention head. `d_kv` has to be equal to `d_model // num_heads`.
// num_heads`.
d_ff (:obj:`int`, `optional`, defaults to 2048): d_ff (`int`, *optional*, defaults to 2048):
Size of the intermediate feed forward layer in each :obj:`T5Block`. Size of the intermediate feed forward layer in each `T5Block`.
num_layers (:obj:`int`, `optional`, defaults to 6): num_layers (`int`, *optional*, defaults to 6):
Number of hidden layers in the Transformer encoder. Number of hidden layers in the Transformer encoder.
num_decoder_layers (:obj:`int`, `optional`): num_decoder_layers (`int`, *optional*):
Number of hidden layers in the Transformer decoder. Will use the same value as :obj:`num_layers` if not Number of hidden layers in the Transformer decoder. Will use the same value as `num_layers` if not
set. set.
num_heads (:obj:`int`, `optional`, defaults to 8): num_heads (`int`, *optional*, defaults to 8):
Number of attention heads for each attention layer in the Transformer encoder. Number of attention heads for each attention layer in the Transformer encoder.
relative_attention_num_buckets (:obj:`int`, `optional`, defaults to 32): relative_attention_num_buckets (`int`, *optional*, defaults to 32):
The number of buckets to use for each attention layer. The number of buckets to use for each attention layer.
dropout_rate (:obj:`float`, `optional`, defaults to 0.1): dropout_rate (`float`, *optional*, defaults to 0.1):
The ratio for all dropout layers. The ratio for all dropout layers.
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-6): layer_norm_eps (`float`, *optional*, defaults to 1e-6):
The epsilon used by the layer normalization layers. The epsilon used by the layer normalization layers.
initializer_factor (:obj:`float`, `optional`, defaults to 1): initializer_factor (`float`, *optional*, defaults to 1):
A factor for initializing all weight matrices (should be kept to 1, used internally for initialization A factor for initializing all weight matrices (should be kept to 1, used internally for initialization
testing). testing).
feed_forward_proj (:obj:`string`, `optional`, defaults to :obj:`"relu"`): feed_forward_proj (`string`, *optional*, defaults to `"relu"`):
Type of feed forward layer to be used. Should be one of :obj:`"relu"` or :obj:`"gated-gelu"`. T5v1.1 uses Type of feed forward layer to be used. Should be one of `"relu"` or `"gated-gelu"`. T5v1.1 uses
the :obj:`"gated-gelu"` feed forward projection. Original T5 uses :obj:`"relu"`. the `"gated-gelu"` feed forward projection. Original T5 uses `"relu"`.
use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`): use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should return the last key/values attentions (not used by all models). Whether or not the model should return the last key/values attentions (not used by all models).
""" """
model_type = "t5" model_type = "t5"
......
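A short configuration sketch in the same style as the examples elsewhere in this diff; note that the documented `d_kv == d_model // num_heads` constraint holds for the defaults (512 // 8 == 64):

```python
>>> from transformers import T5Config, T5Model

>>> configuration = T5Config()  # defaults approximate the t5-small architecture
>>> configuration.d_kv == configuration.d_model // configuration.num_heads
True
>>> model = T5Model(configuration)  # randomly initialized model with this config
```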
...@@ -1054,17 +1054,18 @@ class FlaxT5PreTrainedModel(FlaxPreTrainedModel): ...@@ -1054,17 +1054,18 @@ class FlaxT5PreTrainedModel(FlaxPreTrainedModel):
r""" r"""
Returns: Returns:
Example:: Example:
```python
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration >>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small') >>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small') >>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> text = "My friends are cool but they eat too many carbs." >>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np') >>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs) >>> encoder_outputs = model.encode(**inputs)
""" ```"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
...@@ -1114,24 +1115,25 @@ class FlaxT5PreTrainedModel(FlaxPreTrainedModel): ...@@ -1114,24 +1115,25 @@ class FlaxT5PreTrainedModel(FlaxPreTrainedModel):
r""" r"""
Returns: Returns:
Example:: Example:
```python
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration >>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> import jax.numpy as jnp >>> import jax.numpy as jnp
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small') >>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small') >>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> text = "My friends are cool but they eat too many carbs." >>> text = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np') >>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs) >>> encoder_outputs = model.encode(**inputs)
>>> decoder_start_token_id = model.config.decoder_start_token_id >>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id >>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id
>>> outputs = model.decode(decoder_input_ids, encoder_outputs) >>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> logits = outputs.logits >>> logits = outputs.logits
""" ```"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
...@@ -1329,19 +1331,21 @@ append_call_sample_docstring( ...@@ -1329,19 +1331,21 @@ append_call_sample_docstring(
FLAX_T5_MODEL_DOCSTRING = """ FLAX_T5_MODEL_DOCSTRING = """
Returns: Returns:
Example:: Example:
```python
>>> from transformers import T5Tokenizer, FlaxT5Model >>> from transformers import T5Tokenizer, FlaxT5Model
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small') >>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5Model.from_pretrained('t5-small') >>> model = FlaxT5Model.from_pretrained('t5-small')
>>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="np").input_ids >>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="np").input_ids
>>> decoder_input_ids = tokenizer("Studies show that", return_tensors="np").input_ids >>> decoder_input_ids = tokenizer("Studies show that", return_tensors="np").input_ids
>>> # forward pass >>> # forward pass
>>> outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids) >>> outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
>>> last_hidden_states = outputs.last_hidden_state >>> last_hidden_states = outputs.last_hidden_state
```
""" """
...@@ -1476,24 +1480,25 @@ class FlaxT5ForConditionalGeneration(FlaxT5PreTrainedModel): ...@@ -1476,24 +1480,25 @@ class FlaxT5ForConditionalGeneration(FlaxT5PreTrainedModel):
r""" r"""
Returns: Returns:
Example:: Example:
```python
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration >>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> import jax.numpy as jnp >>> import jax.numpy as jnp
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small') >>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small') >>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> text = "summarize: My friends are cool but they eat too many carbs." >>> text = "summarize: My friends are cool but they eat too many carbs."
>>> inputs = tokenizer(text, return_tensors='np') >>> inputs = tokenizer(text, return_tensors='np')
>>> encoder_outputs = model.encode(**inputs) >>> encoder_outputs = model.encode(**inputs)
>>> decoder_start_token_id = model.config.decoder_start_token_id >>> decoder_start_token_id = model.config.decoder_start_token_id
>>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id >>> decoder_input_ids = jnp.ones((inputs.input_ids.shape[0], 1), dtype="i4") * decoder_start_token_id
>>> outputs = model.decode(decoder_input_ids, encoder_outputs) >>> outputs = model.decode(decoder_input_ids, encoder_outputs)
>>> logits = outputs.logits >>> logits = outputs.logits
""" ```"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = ( output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
...@@ -1624,19 +1629,21 @@ class FlaxT5ForConditionalGeneration(FlaxT5PreTrainedModel): ...@@ -1624,19 +1629,21 @@ class FlaxT5ForConditionalGeneration(FlaxT5PreTrainedModel):
FLAX_T5_CONDITIONAL_GENERATION_DOCSTRING = """ FLAX_T5_CONDITIONAL_GENERATION_DOCSTRING = """
Returns: Returns:
Example:: Example:
```python
>>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration >>> from transformers import T5Tokenizer, FlaxT5ForConditionalGeneration
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small') >>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small') >>> model = FlaxT5ForConditionalGeneration.from_pretrained('t5-small')
>>> ARTICLE_TO_SUMMARIZE = "summarize: My friends are cool but they eat too many carbs." >>> ARTICLE_TO_SUMMARIZE = "summarize: My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], return_tensors='np') >>> inputs = tokenizer([ARTICLE_TO_SUMMARIZE], return_tensors='np')
>>> # Generate Summary >>> # Generate Summary
>>> summary_ids = model.generate(inputs['input_ids']).sequences >>> summary_ids = model.generate(inputs['input_ids']).sequences
>>> print(tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)) >>> print(tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False))
```
""" """
......
...@@ -216,17 +216,19 @@ PARALLELIZE_DOCSTRING = r""" ...@@ -216,17 +216,19 @@ PARALLELIZE_DOCSTRING = r"""
DEPARALLELIZE_DOCSTRING = r""" DEPARALLELIZE_DOCSTRING = r"""
Moves the model to cpu from a model parallel state. Moves the model to cpu from a model parallel state.
Example:: Example:
```python
# On a 4 GPU machine with t5-3b: # On a 4 GPU machine with t5-3b:
model = T5ForConditionalGeneration.from_pretrained('t5-3b') model = T5ForConditionalGeneration.from_pretrained('t5-3b')
device_map = {0: [0, 1, 2], device_map = {0: [0, 1, 2],
1: [3, 4, 5, 6, 7, 8, 9], 1: [3, 4, 5, 6, 7, 8, 9],
2: [10, 11, 12, 13, 14, 15, 16], 2: [10, 11, 12, 13, 14, 15, 16],
3: [17, 18, 19, 20, 21, 22, 23]} 3: [17, 18, 19, 20, 21, 22, 23]}
model.parallelize(device_map) # Splits the model across several devices model.parallelize(device_map) # Splits the model across several devices
model.deparallelize() # Put the model back on cpu and cleans memory by calling torch.cuda.empty_cache() model.deparallelize() # Put the model back on cpu and cleans memory by calling torch.cuda.empty_cache()
```
""" """
...@@ -1339,20 +1341,21 @@ class T5Model(T5PreTrainedModel): ...@@ -1339,20 +1341,21 @@ class T5Model(T5PreTrainedModel):
r""" r"""
Returns: Returns:
Example:: Example:
```python
>>> from transformers import T5Tokenizer, T5Model >>> from transformers import T5Tokenizer, T5Model
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small') >>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = T5Model.from_pretrained('t5-small') >>> model = T5Model.from_pretrained('t5-small')
>>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt").input_ids # Batch size 1 >>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt").input_ids # Batch size 1
>>> decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids # Batch size 1 >>> decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids # Batch size 1
>>> # forward pass >>> # forward pass
>>> outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids) >>> outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
>>> last_hidden_states = outputs.last_hidden_state >>> last_hidden_states = outputs.last_hidden_state
""" ```"""
use_cache = use_cache if use_cache is not None else self.config.use_cache use_cache = use_cache if use_cache is not None else self.config.use_cache
return_dict = return_dict if return_dict is not None else self.config.use_return_dict return_dict = return_dict if return_dict is not None else self.config.use_return_dict
...@@ -1790,15 +1793,16 @@ class T5EncoderModel(T5PreTrainedModel): ...@@ -1790,15 +1793,16 @@ class T5EncoderModel(T5PreTrainedModel):
r""" r"""
Returns: Returns:
Example:: Example:
```python
>>> from transformers import T5Tokenizer, T5EncoderModel >>> from transformers import T5Tokenizer, T5EncoderModel
>>> tokenizer = T5Tokenizer.from_pretrained('t5-small') >>> tokenizer = T5Tokenizer.from_pretrained('t5-small')
>>> model = T5EncoderModel.from_pretrained('t5-small') >>> model = T5EncoderModel.from_pretrained('t5-small')
>>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt").input_ids # Batch size 1 >>> input_ids = tokenizer("Studies have been shown that owning a dog is good for you", return_tensors="pt").input_ids # Batch size 1
>>> outputs = model(input_ids=input_ids) >>> outputs = model(input_ids=input_ids)
>>> last_hidden_states = outputs.last_hidden_state >>> last_hidden_states = outputs.last_hidden_state
""" ```"""
return_dict = return_dict if return_dict is not None else self.config.use_return_dict return_dict = return_dict if return_dict is not None else self.config.use_return_dict
encoder_outputs = self.encoder( encoder_outputs = self.encoder(
......