chenpangpang / transformers · Commits · 4f78bcb2

Unverified commit 4f78bcb2, authored Dec 06, 2022 by Steven Liu, committed by GitHub on Dec 06, 2022
Parent: 7586a1a3

add missing is_decoder param (#20631)

Showing 15 changed files with 30 additions and 0 deletions (+30, -0)
src/transformers/models/bert/configuration_bert.py (+2, -0)
src/transformers/models/bert_generation/configuration_bert_generation.py (+2, -0)
src/transformers/models/big_bird/configuration_big_bird.py (+2, -0)
src/transformers/models/camembert/configuration_camembert.py (+2, -0)
src/transformers/models/data2vec/configuration_data2vec_text.py (+2, -0)
src/transformers/models/ernie/configuration_ernie.py (+2, -0)
src/transformers/models/esm/configuration_esm.py (+2, -0)
src/transformers/models/megatron_bert/configuration_megatron_bert.py (+2, -0)
src/transformers/models/nezha/configuration_nezha.py (+2, -0)
src/transformers/models/qdqbert/configuration_qdqbert.py (+2, -0)
src/transformers/models/rembert/configuration_rembert.py (+2, -0)
src/transformers/models/roberta/configuration_roberta.py (+2, -0)
src/transformers/models/roc_bert/configuration_roc_bert.py (+2, -0)
src/transformers/models/roformer/configuration_roformer.py (+2, -0)
src/transformers/models/xlm_roberta/configuration_xlm_roberta.py (+2, -0)
src/transformers/models/bert/configuration_bert.py

@@ -114,6 +114,8 @@ class BertConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
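For context, the parameter this commit documents is the standard `PretrainedConfig` switch between encoder and decoder behavior, and it is also what makes `use_cache` relevant. A minimal usage sketch (the checkpoint name is illustrative, not part of this commit):

```python
from transformers import BertConfig, BertLMHeadModel

# is_decoder=True makes the model apply a causal attention mask so it can
# be used for left-to-right generation; with the default False, the same
# weights behave as a bidirectional encoder.
config = BertConfig.from_pretrained("bert-base-uncased", is_decoder=True)
model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)

# use_cache (True by default) is only consulted because is_decoder is set:
# past key/values are returned and reused across generation steps.
```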
src/transformers/models/bert_generation/configuration_bert_generation.py

@@ -60,6 +60,8 @@ class BertGenerationConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
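For BertGeneration in particular, this flag is what distinguishes the decoder half of an encoder-decoder pair from the encoder half. A sketch under the same caveat that nothing here beyond `is_decoder` is part of this commit:

```python
from transformers import BertGenerationConfig, BertGenerationDecoder

# A decoder-side config: without is_decoder=True the model would be
# configured as an encoder. add_cross_attention is only needed when the
# decoder attends to encoder outputs in an encoder-decoder setup.
config = BertGenerationConfig(is_decoder=True, add_cross_attention=True)
decoder = BertGenerationDecoder(config)
```

The remaining hunks below make the identical documentation addition to each model's config docstring.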
src/transformers/models/big_bird/configuration_big_bird.py

@@ -70,6 +70,8 @@ class BigBirdConfig(PretrainedConfig):
         The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
     layer_norm_eps (`float`, *optional*, defaults to 1e-12):
         The epsilon used by the layer normalization layers.
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/camembert/configuration_camembert.py

@@ -81,6 +81,8 @@ class CamembertConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/data2vec/configuration_data2vec_text.py

@@ -73,6 +73,8 @@ class Data2VecTextConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/ernie/configuration_ernie.py

@@ -87,6 +87,8 @@ class ErnieConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/esm/configuration_esm.py

@@ -79,6 +79,8 @@ class EsmConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/megatron_bert/configuration_megatron_bert.py

@@ -70,6 +70,8 @@ class MegatronBertConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/nezha/configuration_nezha.py

@@ -48,6 +48,8 @@ class NezhaConfig(PretrainedConfig):
         The epsilon used by the layer normalization layers.
     classifier_dropout (`float`, optional, defaults to 0.1):
         The dropout ratio for attached classifiers.
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     Example:
src/transformers/models/qdqbert/configuration_qdqbert.py

@@ -65,6 +65,8 @@ class QDQBertConfig(PretrainedConfig):
         The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
     layer_norm_eps (`float`, *optional*, defaults to 1e-12):
         The epsilon used by the layer normalization layers.
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/rembert/configuration_rembert.py

@@ -76,6 +76,8 @@ class RemBertConfig(PretrainedConfig):
         The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
     layer_norm_eps (`float`, *optional*, defaults to 1e-12):
         The epsilon used by the layer normalization layers.
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/roberta/configuration_roberta.py

@@ -79,6 +79,8 @@ class RobertaConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/roc_bert/configuration_roc_bert.py

@@ -64,6 +64,8 @@ class RoCBertConfig(PretrainedConfig):
         The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
     layer_norm_eps (`float`, *optional*, defaults to 1e-12):
         The epsilon used by the layer normalization layers.
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/roformer/configuration_roformer.py

@@ -84,6 +84,8 @@ class RoFormerConfig(PretrainedConfig):
         The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
     layer_norm_eps (`float`, *optional*, defaults to 1e-12):
         The epsilon used by the layer normalization layers.
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.
src/transformers/models/xlm_roberta/configuration_xlm_roberta.py

@@ -88,6 +88,8 @@ class XLMRobertaConfig(PretrainedConfig):
         [Self-Attention with Relative Position Representations (Shaw et al.)](https://arxiv.org/abs/1803.02155).
         For more information on `"relative_key_query"`, please refer to *Method 4* in [Improve Transformer Models
         with Better Relative Position Embeddings (Huang et al.)](https://arxiv.org/abs/2009.13658).
+    is_decoder (`bool`, *optional*, defaults to `False`):
+        Whether the model is used as a decoder or not. If `False`, the model is used as an encoder.
     use_cache (`bool`, *optional*, defaults to `True`):
         Whether or not the model should return the last key/values attentions (not used by all models). Only
         relevant if `config.is_decoder=True`.