Commit 088281ed authored by Hongkun Yu, committed by A. Unique TensorFlower

Fix doc string.

PiperOrigin-RevId: 333314981
parent 47c77112
@@ -390,7 +390,12 @@ class BigBirdMasks(tf.keras.layers.Layer):
 @tf.keras.utils.register_keras_serializable(package="Text")
 class BigBirdAttention(tf.keras.layers.MultiHeadAttention):
-  """Attention layer with cache used for auto-agressive decoding.
+  """BigBird, a sparse attention mechanism.
+  This layer follows the paper "Big Bird: Transformers for Longer Sequences"
+  (https://arxiv.org/abs/2007.14062).
+  It reduces this quadratic dependency of attention
+  computation to linear.
+  Arguments are the same as `MultiHeadAttention` layer.
   """
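The new docstring says the layer reduces the quadratic dependency of attention to linear. As a rough illustration only, the sketch below counts (query, key) pairs under a token-level pattern that combines a sliding window, a few global tokens, and a few random keys, which are the three ingredients the Big Bird paper describes. It is not the code of `BigBirdAttention`, which works on blocks of tokens; the function name, parameters, and defaults here are all hypothetical.

```python
import random


def sparse_attention_pairs(seq_len, window=3, num_global=2, num_random=2, seed=0):
    """Count (query, key) pairs kept by a simplified BigBird-style pattern.

    Hypothetical token-level sketch; the real layer uses block-sparse masks.
    """
    rng = random.Random(seed)
    half = window // 2
    total = 0
    for q in range(seq_len):
        keys = set()
        # Sliding window: each query sees its nearest neighbours.
        keys.update(range(max(0, q - half), min(seq_len, q + half + 1)))
        # Global tokens: every query sees the first `num_global` tokens.
        keys.update(range(min(num_global, seq_len)))
        # Global queries see the whole sequence.
        if q < num_global:
            keys.update(range(seq_len))
        # Random keys: a few extra positions sampled per query.
        keys.update(rng.sample(range(seq_len), min(num_random, seq_len)))
        total += len(keys)
    return total


if __name__ == "__main__":
    for n in (64, 128, 256, 512):
        # Full attention keeps n * n pairs; the sparse count grows with n.
        print(n, sparse_attention_pairs(n), n * n)
```

With these assumed defaults, each non-global query keeps at most `window + num_global + num_random` keys, so the total is bounded by `seq_len * (window + 2 * num_global + num_random)`, which is linear in the sequence length, while full attention keeps `seq_len ** 2` pairs.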