chenpangpang / transformers · Commits · a78565b7

Unverified commit a78565b7, authored Mar 15, 2022 by Suraj Patil; committed by GitHub on Mar 15, 2022.

[MT5Config] add relative_attention_max_distance in config (#16170)

Parent: 4f4e5ddb

Showing 1 changed file with 4 additions and 0 deletions:

src/transformers/models/mt5/configuration_mt5.py (+4 / -0)
@@ -50,6 +50,8 @@ class MT5Config(PretrainedConfig):
             Number of attention heads for each attention layer in the Transformer encoder.
         relative_attention_num_buckets (`int`, *optional*, defaults to 32):
             The number of buckets to use for each attention layer.
+        relative_attention_max_distance (`int`, *optional*, defaults to 128):
+            The maximum distance of the longer sequences for the bucket separation.
         dropout_rate (`float`, *optional*, defaults to 0.1):
             The ratio for all dropout layers.
         layer_norm_eps (`float`, *optional*, defaults to 1e-6):
@@ -75,6 +77,7 @@ class MT5Config(PretrainedConfig):
         num_decoder_layers=None,
         num_heads=6,
         relative_attention_num_buckets=32,
+        relative_attention_max_distance=128,
         dropout_rate=0.1,
         layer_norm_epsilon=1e-6,
         initializer_factor=1.0,
@@ -107,6 +110,7 @@ class MT5Config(PretrainedConfig):
         )  # default = symmetry
         self.num_heads = num_heads
         self.relative_attention_num_buckets = relative_attention_num_buckets
+        self.relative_attention_max_distance = relative_attention_max_distance
         self.dropout_rate = dropout_rate
         self.layer_norm_epsilon = layer_norm_epsilon
         self.initializer_factor = initializer_factor
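For context, after this change the new field can be read or overridden directly on an MT5Config instance. A minimal sketch, assuming a transformers version that includes this commit:

    from transformers import MT5Config

    # Default configuration: the new argument defaults to 128,
    # matching the value added in this commit.
    config = MT5Config()
    print(config.relative_attention_max_distance)  # 128

    # It can also be overridden, e.g. to allow a larger maximum
    # distance for the relative-position buckets.
    config = MT5Config(relative_attention_max_distance=256)
    print(config.relative_attention_max_distance)  # 256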