ModelZoo / ResNet50_tensorflow

Commit 977ba7ab
Authored Feb 16, 2021 by Mark Daoust
Committed by A. Unique TensorFlower on Feb 16, 2021
Don't line wrap `code` spans.
PiperOrigin-RevId: 357794895
Parent: a86917df
Showing 1 changed file with 4 additions and 4 deletions.

official/nlp/modeling/layers/gated_feedforward.py (+4, -4)
@@ -33,8 +33,8 @@ class GatedFeedforward(tf.keras.layers.Layer):
     intermediate_activation: Activation for the intermediate layer.
     dropout: Dropout probability for the output dropout.
     use_gate: Whether to use gated linear units. If True, assuming `GELU` as the
-      activation and omitting bias, will apply `GEGLU(x, W, V, W_2) = (GEGLU(xW)
-      * xV)W2`; if False, will follow
+      activation and omitting bias, will apply
+      `GEGLU(x, W, V, W_2) = (GEGLU(xW) * xV)W2`; if False, will follow
       "Attention Is All You Need" (https://arxiv.org/abs/1706.03762) paper and
       apply `FFN(x, W, W_2) = GELU(xW_1)W_2.`
     num_blocks: The number of feedforward blocks to stack. Each block contains a
@@ -43,8 +43,8 @@ class GatedFeedforward(tf.keras.layers.Layer):
     dropout_position: Where to apply the dropout, the value can be either
       `before_residual` or `after_residual`. If `before_residual`, will apply
       `layer_output = layer_norm(dropout(layer_output) + layer_input)`; if
-      `after residual`, will apply `layer_output =
-      dropout(layer_norm(layer_output + layer_input))`.
+      `after residual`, will apply
+      `layer_output = dropout(layer_norm(layer_output + layer_input))`.
     kernel_initializer: Initializer for dense layer kernels.
     bias_initializer: Initializer for dense layer biases.
     kernel_regularizer: Regularizer for dense layer kernels.
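For context on the reformatted `use_gate` docstring text: it contrasts the gated variant (`GEGLU`, a GELU-gated linear unit) with the plain Transformer feedforward from "Attention Is All You Need". The following is a minimal sketch of the two formulas, assuming a `GELU` activation and no bias terms; the kernel names `w`, `v`, `w_1`, and `w_2` are illustrative stand-ins, not the layer's actual variables.

```python
import tensorflow as tf

def gated_feedforward(x, w, v, w_2):
  # use_gate=True path: GEGLU(x, W, V, W_2) = (GELU(xW) * xV) W_2
  # x: [batch, seq, d_model]; w, v: [d_model, d_ff]; w_2: [d_ff, d_model]
  return tf.matmul(tf.nn.gelu(tf.matmul(x, w)) * tf.matmul(x, v), w_2)

def plain_feedforward(x, w_1, w_2):
  # use_gate=False path: FFN(x, W_1, W_2) = GELU(x W_1) W_2,
  # i.e. the standard Transformer feedforward with GELU in place of ReLU.
  return tf.matmul(tf.nn.gelu(tf.matmul(x, w_1)), w_2)
```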
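Likewise, the `dropout_position` text describes where dropout sits relative to the residual connection and layer normalization. A small sketch of the two options follows, using standalone Keras sublayers as stand-ins for the layer's internals; the names `layer_norm`, `dropout`, and `apply_residual` are assumptions for illustration, not attributes of the actual module.

```python
import tensorflow as tf

layer_norm = tf.keras.layers.LayerNormalization()
dropout = tf.keras.layers.Dropout(0.1)

def apply_residual(layer_output, layer_input, dropout_position="before_residual"):
  if dropout_position == "before_residual":
    # layer_output = layer_norm(dropout(layer_output) + layer_input)
    return layer_norm(dropout(layer_output) + layer_input)
  else:  # "after_residual"
    # layer_output = dropout(layer_norm(layer_output + layer_input))
    return dropout(layer_norm(layer_output + layer_input))
```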