chenpangpang / transformers · Commits

Unverified commit b29ebdf4, authored Oct 07, 2022 by h, committed by GitHub on Oct 07, 2022
removes prophet config dependencies from xlm-prophet (#19400)
Parent: e162cebf

Showing 1 changed file with 151 additions and 7 deletions

src/transformers/models/xlm_prophetnet/configuration_xlm_prophetnet.py (+151, -7)
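In practical terms, the diff below makes XLMProphetNetConfig subclass PretrainedConfig directly instead of ProphetNetConfig, declaring every hyperparameter itself. The snippet that follows is an illustrative sanity check written against the public transformers API, not code from this commit; it assumes a transformers version that already includes this change.

# Illustrative check of the now self-contained configuration (not part of the commit).
from transformers import ProphetNetConfig, XLMProphetNetConfig

xlm_cfg = XLMProphetNetConfig()
pn_cfg = ProphetNetConfig()

# The inheritance link is gone: the XLM config is no longer a ProphetNetConfig subclass...
assert not isinstance(xlm_cfg, ProphetNetConfig)

# ...but the two configurations are still expected to expose matching defaults.
assert xlm_cfg.num_encoder_layers == pn_cfg.num_encoder_layers
assert xlm_cfg.vocab_size == pn_cfg.vocab_size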
@@ -15,8 +15,10 @@
 """ XLM-ProphetNet model configuration"""


+from typing import Callable, Optional, Union
+
+from ...configuration_utils import PretrainedConfig
 from ...utils import logging
-from ..prophetnet.configuration_prophetnet import ProphetNetConfig


 logger = logging.get_logger(__name__)
@@ -28,13 +30,155 @@ XLM_PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP = {
 }


-class XLMProphetNetConfig(ProphetNetConfig):
-    """
-    This class overrides [`ProphetNetConfig`]. Please check the superclass for the appropriate documentation alongside
-    usage examples. Instantiating a configuration with the defaults will yield a similar configuration to that of the
-    XLMProphetNet
+class XLMProphetNetConfig(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`XLMProphetNetModel`]. It is used to instantiate a
+    XLMProphetNet model according to the specified arguments, defining the model architecture. Instantiating a
+    configuration with the defaults will yield a similar configuration to that of the XLMProphetNet
     [microsoft/xprophetnet-large-wiki100-cased](https://huggingface.co/microsoft/xprophetnet-large-wiki100-cased)
     architecture.
-    """

+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+
+    Args:
+        activation_dropout (`float`, *optional*, defaults to 0.1):
+            The dropout ratio for activations inside the fully connected layer.
+        activation_function (`str` or `function`, *optional*, defaults to `"gelu"`):
+            The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+            `"relu"`, `"silu"` and `"gelu_new"` are supported.
+        vocab_size (`int`, *optional*, defaults to 30522):
+            Vocabulary size of the ProphetNET model. Defines the number of different tokens that can be represented by
+            the `inputs_ids` passed when calling [`XLMProphetNetModel`].
+        hidden_size (`int`, *optional*, defaults to 1024):
+            Dimensionality of the layers and the pooler layer.
+        encoder_ffn_dim (`int`, *optional*, defaults to 4096):
+            Dimensionality of the "intermediate" (often named feed-forward) layer in decoder.
+        num_encoder_layers (`int`, *optional*, defaults to 12):
+            Number of encoder layers.
+        num_encoder_attention_heads (`int`, *optional*, defaults to 16):
+            Number of attention heads for each attention layer in the Transformer encoder.
+        decoder_ffn_dim (`int`, *optional*, defaults to 4096):
+            Dimensionality of the `intermediate` (often named feed-forward) layer in decoder.
+        num_decoder_layers (`int`, *optional*, defaults to 12):
+            Number of decoder layers.
+        num_decoder_attention_heads (`int`, *optional*, defaults to 16):
+            Number of attention heads for each attention layer in the Transformer decoder.
+        attention_dropout (`float`, *optional*, defaults to 0.1):
+            The dropout ratio for the attention probabilities.
+        dropout (`float`, *optional*, defaults to 0.1):
+            The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+        max_position_embeddings (`int`, *optional*, defaults to 512):
+            The maximum sequence length that this model might ever be used with. Typically set this to something large
+            just in case (e.g., 512 or 1024 or 2048).
+        init_std (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        add_cross_attention (`bool`, *optional*, defaults to `True`):
+            Whether cross-attention layers should be added to the model.
+        is_encoder_decoder (`bool`, *optional*, defaults to `True`):
+            Whether this is an encoder/decoder model.
+        pad_token_id (`int`, *optional*, defaults to 1)
+            Padding token id.
+        bos_token_id (`int`, *optional*, defaults to 0)
+            Beginning of stream token id.
+        eos_token_id (`int`, *optional*, defaults to 2)
+            End of stream token id.
+        ngram (`int`, *optional*, defaults to 2)
+            Number of future tokens to predict. Set to 1 to be same as traditional Language model to predict next first
+            token.
+        num_buckets (`int`, *optional*, defaults to 32)
+            The number of buckets to use for each attention layer. This is for relative position calculation. See the
+            [T5 paper](see https://arxiv.org/abs/1910.10683) for more details.
+        relative_max_distance (`int`, *optional*, defaults to 128)
+            Relative distances greater than this number will be put into the last same bucket. This is for relative
+            position calculation. See the [T5 paper](see https://arxiv.org/abs/1910.10683) for more details.
+        disable_ngram_loss (`bool`, *optional*, defaults to `False`):
+            Whether be trained predicting only the next first token.
+        eps (`float`, *optional*, defaults to 0.0):
+            Controls the `epsilon` parameter value for label smoothing in the loss calculation. If set to 0, no label
+            smoothing is performed.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models).
+    """
+
     model_type = "xlm-prophetnet"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    attribute_map = {
+        "num_attention_heads": "num_encoder_attention_heads",
+    }
+
+    def __init__(
+        self,
+        activation_dropout: Optional[float] = 0.1,
+        activation_function: Optional[Union[str, Callable]] = "gelu",
+        vocab_size: Optional[int] = 30522,
+        hidden_size: Optional[int] = 1024,
+        encoder_ffn_dim: Optional[int] = 4096,
+        num_encoder_layers: Optional[int] = 12,
+        num_encoder_attention_heads: Optional[int] = 16,
+        decoder_ffn_dim: Optional[int] = 4096,
+        num_decoder_layers: Optional[int] = 12,
+        num_decoder_attention_heads: Optional[int] = 16,
+        attention_dropout: Optional[float] = 0.1,
+        dropout: Optional[float] = 0.1,
+        max_position_embeddings: Optional[int] = 512,
+        init_std: Optional[float] = 0.02,
+        is_encoder_decoder: Optional[bool] = True,
+        add_cross_attention: Optional[bool] = True,
+        decoder_start_token_id: Optional[int] = 0,
+        ngram: Optional[int] = 2,
+        num_buckets: Optional[int] = 32,
+        relative_max_distance: Optional[int] = 128,
+        disable_ngram_loss: Optional[bool] = False,
+        eps: Optional[float] = 0.0,
+        use_cache: Optional[bool] = True,
+        pad_token_id: Optional[int] = 0,
+        bos_token_id: Optional[int] = 1,
+        eos_token_id: Optional[int] = 2,
+        **kwargs
+    ):
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        self.encoder_ffn_dim = encoder_ffn_dim
+        self.num_encoder_layers = num_encoder_layers
+        self.num_encoder_attention_heads = num_encoder_attention_heads
+        self.decoder_ffn_dim = decoder_ffn_dim
+        self.num_decoder_layers = num_decoder_layers
+        self.num_decoder_attention_heads = num_decoder_attention_heads
+        self.max_position_embeddings = max_position_embeddings
+        self.init_std = init_std  # Normal(0, this parameter)
+        self.activation_function = activation_function
+
+        # parameters for xlmprophetnet
+        self.ngram = ngram
+        self.num_buckets = num_buckets
+        self.relative_max_distance = relative_max_distance
+        self.disable_ngram_loss = disable_ngram_loss
+        self.eps = eps
+
+        # 3 Types of Dropout
+        self.attention_dropout = attention_dropout
+        self.activation_dropout = activation_dropout
+        self.dropout = dropout
+
+        self.use_cache = use_cache
+
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            is_encoder_decoder=is_encoder_decoder,
+            add_cross_attention=add_cross_attention,
+            decoder_start_token_id=decoder_start_token_id,
+            **kwargs,
+        )
+
+    @property
+    def num_hidden_layers(self) -> int:
+        return self.num_encoder_layers + self.num_decoder_layers
+
+    @num_hidden_layers.setter
+    def num_hidden_layers(self, value):
+        raise NotImplementedError(
+            "This model does not support the setting of `num_hidden_layers`. Please set `num_encoder_layers` and"
+            " `num_decoder_layers`."
+        )
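For reference, a minimal usage sketch of the rewritten class. The hyperparameter values below are arbitrary small numbers chosen for illustration, not values from the commit, and the model is randomly initialized rather than the pretrained microsoft/xprophetnet-large-wiki100-cased checkpoint.

from transformers import XLMProphetNetConfig, XLMProphetNetModel

# Toy-sized configuration; unspecified fields keep the wiki100-cased-style defaults above.
config = XLMProphetNetConfig(
    vocab_size=1000,
    hidden_size=256,
    encoder_ffn_dim=512,
    decoder_ffn_dim=512,
    num_encoder_layers=3,
    num_decoder_layers=3,
    num_encoder_attention_heads=8,
    num_decoder_attention_heads=8,
)

# attribute_map aliases num_attention_heads to num_encoder_attention_heads.
assert config.num_attention_heads == 8

# num_hidden_layers is a read-only property: encoder layers + decoder layers.
assert config.num_hidden_layers == 6
try:
    config.num_hidden_layers = 12
except NotImplementedError:
    pass  # set num_encoder_layers and num_decoder_layers individually instead

model = XLMProphetNetModel(config)  # randomly initialized weights
print(model.config.model_type)      # "xlm-prophetnet"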