# Layers

Layers are the fundamental building blocks for NLP models. They can be used to
assemble new layers, networks, or models.

*   [DenseEinsum](dense_einsum.py) implements a feedforward network using
    `tf.einsum`. This layer contains the einsum op, the associated weight, and
    the logic required to generate the einsum expression for the given
    initialization parameters.

*   [MultiHeadAttention](attention.py) implements optionally masked multi-head
    attention between two tensors, `from_tensor` and `to_tensor`, as described
    in ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762). If
    `from_tensor` and `to_tensor` are the same, this is self-attention.

*   [CachedAttention](attention.py) implements an attention layer with a cache,
    used for autoregressive decoding.

*   [TalkingHeadsAttention](talking_heads_attention.py) implements talking-heads
    attention, as described in
    ["Talking-Heads Attention"](https://arxiv.org/abs/2003.02436).

*   [Transformer](transformer.py) implements an optionally masked Transformer
    layer, as described in
    ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762).

*   [ReZeroTransformer](rezero_transformer.py) implements a Transformer layer
    with ReZero, as described in
    ["ReZero is All You Need: Fast Convergence at Large Depth"](https://arxiv.org/abs/2003.04887).

*   [OnDeviceEmbedding](on_device_embedding.py) implements efficient embedding
    lookups designed for TPU-based models.

*   [PositionalEmbedding](position_embedding.py) creates a positional embedding
    as described in ["BERT: Pre-training of Deep Bidirectional Transformers for
    Language Understanding"](https://arxiv.org/abs/1810.04805).

*   [SelfAttentionMask](self_attention_mask.py) creates a 3D attention mask from
    a 2D tensor mask.

*   [MaskedSoftmax](masked_softmax.py) implements a softmax with an optional
    masking input. If no mask is provided to this layer, it performs a standard
    softmax; however, if a mask tensor is applied (which should be 1 in
    positions where the data should be allowed through, and 0 where the data
    should be masked), the output will have masked positions set to
    approximately zero.

*   [ClassificationHead](cls_head.py) implements a pooling head over a sequence
    of embeddings, commonly used for classification tasks.

*   [GatedFeedforward](gated_feedforward.py) implements the gated linear unit
    (GLU) feedforward network, as described in
    ["GLU Variants Improve Transformer"](https://arxiv.org/abs/2002.05202).
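
The DenseEinsum entry in the list above describes a layer that derives its einsum expression from its configuration. As a rough illustration, here is a minimal NumPy sketch (not the TensorFlow layer's actual API) that assumes a single kernel contracting only the last input axis; `dense_einsum` is a hypothetical helper name.

```python
import numpy as np


def dense_einsum(inputs, kernel, bias=None):
    """Hypothetical NumPy sketch: project the last axis of `inputs` with `kernel`.

    inputs: [..., in_dim]; kernel: [in_dim, out_dim]. The einsum expression is
    generated from the input rank (assumption: exactly one summed dimension).
    """
    num_free = inputs.ndim - 1
    if num_free > 8:
        raise ValueError("this sketch supports inputs of rank <= 9")
    free = "abcdefgh"[:num_free]          # letters for batch/sequence axes
    expr = f"{free}x,xy->{free}y"         # e.g. "abx,xy->aby" for rank-3 inputs
    out = np.einsum(expr, inputs, kernel)
    if bias is not None:
        out = out + bias
    return out, expr


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4))   # [batch, seq, hidden]
w = rng.normal(size=(4, 5))      # [hidden, output]
y, expr = dense_einsum(x, w)
```

Generating the expression from the input rank lets the same kernel apply to `[batch, hidden]` or `[batch, seq, hidden]` inputs without reshaping.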
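
For the MultiHeadAttention entry, the sketch below shows scaled dot-product attention split across heads, written in NumPy rather than TensorFlow. The weight names (`wq`, `wk`, `wv`, `wo`) and the 0/1 mask of shape `[batch, from_seq, to_seq]` are illustrative assumptions, not the layer's real signature; dropout and biases are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(from_tensor, to_tensor, wq, wk, wv, wo, num_heads, mask=None):
    """Hypothetical sketch. from_tensor: [B, F, D]; to_tensor: [B, T, D]; mask: [B, F, T]."""
    b, f, d = from_tensor.shape
    t = to_tensor.shape[1]
    h = d // num_heads  # per-head size; assumes D is divisible by num_heads
    # Project, then split the hidden axis into heads: [B, N, seq, H].
    q = (from_tensor @ wq).reshape(b, f, num_heads, h).transpose(0, 2, 1, 3)
    k = (to_tensor @ wk).reshape(b, t, num_heads, h).transpose(0, 2, 1, 3)
    v = (to_tensor @ wv).reshape(b, t, num_heads, h).transpose(0, 2, 1, 3)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(h)  # [B, N, F, T]
    if mask is not None:
        # Positions where the mask is 0 get a very large negative score.
        scores = np.where(mask[:, None, :, :] == 1, scores, -1e9)
    probs = softmax(scores, axis=-1)
    context = (probs @ v).transpose(0, 2, 1, 3).reshape(b, f, d)  # merge heads
    return context @ wo, probs


rng = np.random.default_rng(0)
d, n = 8, 2
wq, wk, wv, wo = (rng.normal(size=(d, d)) for _ in range(4))
x = rng.normal(size=(2, 3, d))   # from_tensor
y = rng.normal(size=(2, 4, d))   # to_tensor
mask = np.ones((2, 3, 4))
mask[:, :, -1] = 0               # hide the last to_tensor position
out, probs = multi_head_attention(x, y, wq, wk, wv, wo, n, mask)
```

Passing the same tensor as both `from_tensor` and `to_tensor` turns this into self-attention.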
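
To make the CachedAttention entry concrete, this sketch assumes a single head and a cache holding the keys and values of already decoded positions (the `"key"`/`"value"` dict layout and the helper names are hypothetical). Each step projects only the newest token, appends its key and value, and attends over the cache; the result should match causal attention computed over the full sequence in one pass.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attend(q, k, v):
    # q: [F, H]; k, v: [T, H] -> [F, H]
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def decode_step(x_t, wq, wk, wv, cache):
    """Hypothetical single decoding step; x_t is the newest token, shape [1, D]."""
    q = x_t @ wq
    # Append this token's key/value so later steps never recompute them.
    cache["key"] = np.concatenate([cache["key"], x_t @ wk], axis=0)
    cache["value"] = np.concatenate([cache["value"], x_t @ wv], axis=0)
    return attend(q, cache["key"], cache["value"])


rng = np.random.default_rng(0)
d, s = 4, 5
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
seq = rng.normal(size=(s, d))

cache = {"key": np.zeros((0, d)), "value": np.zeros((0, d))}
incremental = np.concatenate([decode_step(seq[i:i + 1], wq, wk, wv, cache) for i in range(s)])

# Reference: full causal attention (position i sees positions 0..i).
q, k, v = seq @ wq, seq @ wk, seq @ wv
scores = np.where(np.tril(np.ones((s, s))) == 1, q @ k.T / np.sqrt(d), -1e9)
full = softmax(scores) @ v
```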
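
The talking-heads idea from the paper cited in the TalkingHeadsAttention entry is to linearly mix attention logits across heads before the softmax, and to mix the resulting weights across heads again after it. The NumPy sketch below assumes the same number of heads throughout and already-projected per-head `q`, `k`, `v`; the function name is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def talking_heads_attention(q, k, v, pre_softmax_proj, post_softmax_proj):
    """Hypothetical sketch. q: [N, F, H]; k, v: [N, T, H]; projections: [N, N]."""
    logits = np.einsum("nfh,nth->nft", q, k) / np.sqrt(q.shape[-1])
    # Mix attention logits across the head axis before the softmax ...
    logits = np.einsum("nft,nm->mft", logits, pre_softmax_proj)
    weights = softmax(logits, axis=-1)
    # ... then mix the attention weights across heads again after it.
    weights = np.einsum("nft,nm->mft", weights, post_softmax_proj)
    return np.einsum("nft,nth->nfh", weights, v)


rng = np.random.default_rng(0)
n, f, t, h = 3, 2, 4, 5
q = rng.normal(size=(n, f, h))
k = rng.normal(size=(n, t, h))
v = rng.normal(size=(n, t, h))
eye = np.eye(n)
baseline = softmax(np.einsum("nfh,nth->nft", q, k) / np.sqrt(h)) @ v  # plain multi-head
out_identity = talking_heads_attention(q, k, v, eye, eye)
out_mixed = talking_heads_attention(q, k, v, rng.normal(size=(n, n)), rng.normal(size=(n, n)))
```

With identity mixing matrices the computation reduces to ordinary multi-head attention, which makes a quick sanity check.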
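
As a rough picture of what a single layer in the Transformer entry contains, here is a NumPy sketch of one post-layer-norm block: self-attention, a residual connection with layer normalization, then a two-layer feedforward network with its own residual and normalization. It simplifies on purpose (assumptions, not the layer's behaviour): one attention head, ReLU as the activation, and no dropout, biases, or learned layer-norm scale and offset. The parameter names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-12):
    # Normalize each position over the hidden axis (no learned scale/offset here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def transformer_layer(x, params, mask=None):
    """Hypothetical sketch. x: [S, D]; mask: optional [S, S] of 0/1."""
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    scores = q @ k.T / np.sqrt(x.shape[-1])
    if mask is not None:
        scores = np.where(mask == 1, scores, -1e9)
    attn = softmax(scores) @ v @ params["wo"]
    x = layer_norm(x + attn)                       # residual + norm after attention
    hidden = np.maximum(0.0, x @ params["w1"])     # intermediate layer (ReLU)
    return layer_norm(x + hidden @ params["w2"])   # residual + norm after feedforward


rng = np.random.default_rng(0)
d, d_ff, s = 8, 16, 5
shapes = [("wq", (d, d)), ("wk", (d, d)), ("wv", (d, d)), ("wo", (d, d)),
          ("w1", (d, d_ff)), ("w2", (d_ff, d))]
params = {name: rng.normal(size=shape) * 0.1 for name, shape in shapes}
x = rng.normal(size=(s, d))
y = transformer_layer(x, params)
```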
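
The ReZero paper cited in the ReZeroTransformer entry replaces normalized residual connections with `x + alpha * F(x)`, where `alpha` is a learnable scalar initialized to zero, so each layer starts out as the identity function. A minimal sketch of that residual rule follows; `ReZeroResidual` is a hypothetical class, and `alpha` is a plain float rather than a trained variable.

```python
import numpy as np


class ReZeroResidual:
    """Hypothetical wrapper computing x + alpha * sublayer(x), alpha initialized to 0."""

    def __init__(self, sublayer, alpha=0.0):
        self.sublayer = sublayer
        self.alpha = alpha  # a trainable scalar in a real model; a float in this sketch

    def __call__(self, x):
        return x + self.alpha * self.sublayer(x)


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
block = ReZeroResidual(lambda x: np.tanh(x @ w))
x = rng.normal(size=(3, 4))
out_at_init = block(x)   # equal to x, because alpha == 0
block.alpha = 0.5        # pretend training has moved alpha away from zero
out_trained = block(x)
```

Because no layer normalization is applied, the residual stream passes through unchanged at initialization, which the paper credits for faster convergence in deep stacks.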
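
The OnDeviceEmbedding entry does not say how its lookups are made efficient. One technique often used on TPUs, offered here as an assumption rather than a description of this layer, is to express the lookup as a one-hot matrix multiplication instead of a gather. The NumPy sketch below shows that both forms produce the same embeddings; the function names are hypothetical.

```python
import numpy as np


def embed_gather(ids, table):
    # Standard lookup: index rows of the [vocab_size, width] table.
    return table[ids]


def embed_one_hot(ids, table):
    """Hypothetical alternative: the same lookup as one_hot(ids) @ table."""
    one_hot = np.eye(table.shape[0], dtype=table.dtype)[ids]  # [..., vocab_size]
    return one_hot @ table                                     # [..., width]


rng = np.random.default_rng(0)
table = rng.normal(size=(4, 5))     # vocab_size=4, width=5
ids = np.array([[0, 2, 1], [3, 3, 0]])
by_gather = embed_gather(ids, table)
by_one_hot = embed_one_hot(ids, table)
```

The one-hot form materializes a `[..., vocab_size]` intermediate, so its memory cost grows with the vocabulary; it is usually considered only for small vocabularies.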
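
BERT, cited in the PositionalEmbedding entry, uses learned absolute position embeddings: a `[max_length, hidden]` table whose first `seq_length` rows are added to the token embeddings. The sketch below assumes that layout and assumes the layer returns the position embeddings for the caller to add; `position_embedding` is a hypothetical helper, not the layer's API.

```python
import numpy as np


def position_embedding(inputs, table):
    """Hypothetical sketch. inputs: [B, S, D]; table: [max_length, D] of learned vectors."""
    seq_len = inputs.shape[1]
    if seq_len > table.shape[0]:
        raise ValueError("sequence is longer than the position table")
    # Same position vectors for every example in the batch.
    return np.broadcast_to(table[:seq_len], inputs.shape)


rng = np.random.default_rng(0)
word_embeddings = rng.normal(size=(2, 3, 4))   # [batch, seq, hidden]
table = rng.normal(size=(10, 4))               # max_length=10
pos = position_embedding(word_embeddings, table)
embeddings = word_embeddings + pos
```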
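
The 2D-to-3D conversion described in the SelfAttentionMask entry amounts to broadcasting a per-key padding mask over every query position. In this NumPy sketch (hypothetical function name), `to_mask` is `[batch, to_seq]` with 1 for real tokens and 0 for padding, and only the first two dimensions of `inputs` are read.

```python
import numpy as np


def self_attention_mask(inputs, to_mask):
    """Hypothetical sketch. inputs: [B, F, ...]; to_mask: [B, T] -> returns [B, F, T]."""
    batch, from_seq = inputs.shape[:2]
    to_mask = to_mask.reshape(batch, 1, -1).astype(np.float32)     # [B, 1, T]
    # Every query (from) position receives the same key (to) mask.
    return np.ones((batch, from_seq, 1), dtype=np.float32) * to_mask


inputs = np.zeros((2, 3, 4))            # [batch, seq, hidden]
to_mask = np.array([[1, 1, 0],          # example 0: last token is padding
                    [1, 0, 0]])         # example 1: two padding tokens
mask = self_attention_mask(inputs, to_mask)
```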
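
One common way to obtain the behaviour in the MaskedSoftmax entry, used in the original BERT code and assumed here for illustration, is to add a large negative bias (`-10000.0`) to masked scores before a standard softmax. That drives their probabilities to approximately zero. `masked_softmax` is a hypothetical NumPy helper.

```python
import numpy as np


def masked_softmax(scores, mask=None):
    """Hypothetical sketch; mask is 1 where data passes through and 0 where it is masked."""
    if mask is not None:
        # Masked positions get a large negative bias before exponentiation.
        scores = scores + (1.0 - mask) * -10000.0
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)


scores = np.array([[1.0, 2.0, 3.0]])
plain = masked_softmax(scores)                                # standard softmax
masked = masked_softmax(scores, np.array([[1.0, 1.0, 0.0]]))  # hide the third position
```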
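
The ClassificationHead entry does not spell out its pooling, so the sketch below assumes a common BERT-style choice: take the vector at one token position (index 0, typically `[CLS]`), pass it through a dense layer with `tanh`, and project to class logits. Weight names are hypothetical and dropout is omitted.

```python
import numpy as np


def classification_head(sequence, w_dense, b_dense, w_out, b_out, cls_index=0):
    """Hypothetical sketch. sequence: [B, S, D] -> logits: [B, num_classes]."""
    pooled = sequence[:, cls_index, :]              # pool by selecting one token
    hidden = np.tanh(pooled @ w_dense + b_dense)    # dense layer with tanh
    return hidden @ w_out + b_out                   # class logits


rng = np.random.default_rng(0)
seq = rng.normal(size=(2, 4, 6))                    # [batch, seq, hidden]
w_dense, b_dense = rng.normal(size=(6, 6)), np.zeros(6)
w_out, b_out = rng.normal(size=(6, 3)), np.zeros(3)
logits = classification_head(seq, w_dense, b_dense, w_out, b_out)

# Only the pooled position matters: zeroing the other tokens leaves logits unchanged.
seq_other_zeroed = seq.copy()
seq_other_zeroed[:, 1:, :] = 0.0
logits_same = classification_head(seq_other_zeroed, w_dense, b_dense, w_out, b_out)
```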
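
The paper cited in the GatedFeedforward entry defines its feedforward variants as `(activation(x W) * (x V)) W2`, with elementwise multiplication and biases omitted: a sigmoid gives the original GLU and GELU gives GEGLU. A NumPy sketch under those definitions follows; argument names are hypothetical and GELU uses the common tanh approximation.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_feedforward(x, w_gate, w_linear, w_out, activation=gelu):
    """Hypothetical sketch: (activation(x @ w_gate) * (x @ w_linear)) @ w_out."""
    gate = activation(x @ w_gate)      # gating branch
    linear = x @ w_linear              # linear branch
    return (gate * linear) @ w_out     # elementwise product, then output projection


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))            # [tokens, hidden]
w_gate = rng.normal(size=(4, 8))       # hidden -> inner
w_linear = rng.normal(size=(4, 8))
w_out = rng.normal(size=(8, 4))        # inner -> hidden
geglu_out = gated_feedforward(x, w_gate, w_linear, w_out)                    # GEGLU
glu_out = gated_feedforward(x, w_gate, w_linear, w_out, activation=sigmoid)  # GLU
```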