Commit 977ba7ab authored by Mark Daoust, committed by A. Unique TensorFlower

Don't line wrap `code` spans.

PiperOrigin-RevId: 357794895
parent a86917df
@@ -33,8 +33,8 @@ class GatedFeedforward(tf.keras.layers.Layer):
     intermediate_activation: Activation for the intermediate layer.
     dropout: Dropout probability for the output dropout.
     use_gate: Whether to use gated linear units. If True, assuming `GELU` as the
-      activation and omitting bias, will apply `GEGLU(x, W, V, W_2) = (GEGLU(xW)
-      * xV)W2`; if False, will follow
+      activation and omitting bias, will apply
+      `GEGLU(x, W, V, W_2) = (GEGLU(xW) * xV)W2`; if False, will follow
       "Attention Is All You Need" (https://arxiv.org/abs/1706.03762) paper and
       apply `FFN(x, W, W_2) = GELU(xW_1)W_2.`
     num_blocks: The number of feedforward blocks to stack. Each block contains a
@@ -43,8 +43,8 @@ class GatedFeedforward(tf.keras.layers.Layer):
     dropout_position: Where to apply the dropout, the value can be either
       `before_residual` or `after_residual`. If `before_residual`, will apply
       `layer_output = layer_norm(dropout(layer_output) + layer_input)`; if
-      `after residual`, will apply `layer_output =
-      dropout(layer_norm(layer_output + layer_input))`.
+      `after residual`, will apply
+      `layer_output = dropout(layer_norm(layer_output + layer_input))`.
     kernel_initializer: Initializer for dense layer kernels.
     bias_initializer: Initializer for dense layer biases.
     kernel_regularizer: Regularizer for dense layer kernels.
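For reference, here is a minimal sketch of the two feedforward variants whose formulas are re-wrapped above. This is not the Model Garden implementation of `GatedFeedforward`; the function and weight names (`w`, `v`, `w_1`, `w_2`) are illustrative, and biases are omitted as the docstring assumes.

```python
import tensorflow as tf

def gated_ffn(x, w, v, w_2):
  # use_gate=True: GEGLU(x, W, V, W_2) = (GELU(xW) * xV)W_2
  return tf.matmul(tf.nn.gelu(tf.matmul(x, w)) * tf.matmul(x, v), w_2)

def plain_ffn(x, w_1, w_2):
  # use_gate=False: FFN(x, W_1, W_2) = GELU(xW_1)W_2
  return tf.matmul(tf.nn.gelu(tf.matmul(x, w_1)), w_2)
```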