* [OnDeviceEmbedding](on_device_embedding.py) implements efficient embedding
lookups designed for TPU-based models (see the lookup sketch after this list).
* [PositionalEmbedding](position_embedding.py) creates a positional embedding
as described in ["BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding"](https://arxiv.org/abs/1810.04805) (see the sketch
after this list).
* [SelfAttentionMask](self_attention_mask.py) creates a 3D attention mask from
a 2D tensor mask (see the broadcasting sketch after this list).
* [MaskedSoftmax](masked_softmax.py) implements a softmax with an optional
masking input. If no mask is provided to this layer, it performs a standard
softmax; however, if a mask tensor is applied (which should be 1 in
positions where the data should be allowed through, and 0 where the data
should be masked), the output will have masked positions set to
approximately zero (see the sketch after this list).
* [ClassificationHead](cls_head.py) implements a pooling head over a sequence
of embeddings, commonly used by classification tasks (see the sketch after
this list).
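
One TPU-friendly way to implement an embedding lookup is to replace a
`tf.gather` with a one-hot matmul, which maps well to TPU matrix units. The
sketch below shows that technique in plain TensorFlow; the `VOCAB_SIZE` and
`EMBED_WIDTH` names and sizes are illustrative, not the layer's actual
signature or defaults.

```python
import tensorflow as tf

VOCAB_SIZE, EMBED_WIDTH = 30522, 128  # illustrative sizes
table = tf.Variable(tf.random.normal([VOCAB_SIZE, EMBED_WIDTH]))

def embedding_lookup(ids, use_one_hot=True):
  # ids: int32 [batch, seq_len] token ids.
  if use_one_hot:
    # One-hot matmul: often faster than a gather on TPUs.
    one_hot = tf.one_hot(ids, depth=VOCAB_SIZE, dtype=table.dtype)
    return tf.einsum("bsv,vw->bsw", one_hot, table)  # [batch, seq_len, width]
  return tf.gather(table, ids)  # same result via a gather
```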
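A learned positional embedding in the BERT style is a trainable
`[max_length, width]` table, sliced to the current sequence length and added
to the word embeddings. A minimal plain-TensorFlow sketch, with illustrative
sizes and names rather than the layer's exact interface:

```python
import tensorflow as tf

MAX_LEN, WIDTH = 512, 128  # illustrative sizes
position_table = tf.Variable(tf.random.normal([MAX_LEN, WIDTH]) * 0.02)

def add_position_embedding(word_embeddings):
  # word_embeddings: [batch, seq_len, width], with seq_len <= MAX_LEN.
  seq_len = tf.shape(word_embeddings)[1]
  # Slice the learned table to the current length; broadcast over the batch.
  return word_embeddings + position_table[tf.newaxis, :seq_len, :]
```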
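Turning a 2D padding mask of shape `[batch, to_seq]` into a 3D attention mask
of shape `[batch, from_seq, to_seq]` amounts to broadcasting the key-side mask
across every query position. A plain-TensorFlow sketch of that computation
(the function name and arguments here are illustrative, not the layer's exact
call signature):

```python
import tensorflow as tf

def self_attention_mask(from_tensor, to_mask):
  # from_tensor: [batch, from_seq, ...]; only its sequence length is used.
  # to_mask:     [batch, to_seq], 1 = may attend to, 0 = ignore.
  from_seq = tf.shape(from_tensor)[1]
  to_mask = tf.cast(to_mask[:, tf.newaxis, :], from_tensor.dtype)
  # Repeat the key mask once per query position.
  return tf.tile(to_mask, [1, from_seq, 1])  # [batch, from_seq, to_seq]
```

The resulting 3D mask is the shape attention-score tensors expect, so it can
feed directly into a masked softmax like the one sketched next.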
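The masked softmax described above can be expressed by adding a large negative
constant to the masked logits before normalizing, which drives those positions
to approximately zero. A minimal sketch; the `-10000.0` constant is a common
choice assumed here for illustration:

```python
import tensorflow as tf

def masked_softmax(scores, mask=None):
  # scores: attention logits, e.g. [batch, from_seq, to_seq].
  # mask:   broadcastable to scores, 1 = keep, 0 = mask out.
  if mask is None:
    return tf.nn.softmax(scores)  # standard softmax, no masking
  adder = (1.0 - tf.cast(mask, scores.dtype)) * -10000.0
  # Masked logits become very negative, so after normalization the
  # corresponding outputs are approximately zero.
  return tf.nn.softmax(scores + adder)
```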
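A typical pooling head takes one token's embedding (usually the leading
`[CLS]` token), passes it through an inner projection, and emits class
logits. The Keras layer below is an illustrative sketch, not
`ClassificationHead`'s actual implementation; `inner_dim`, `num_classes`,
and `cls_token_idx` are assumed parameter names.

```python
import tensorflow as tf

class PoolingHead(tf.keras.layers.Layer):
  """Illustrative pooling head: pool one token, project, classify."""

  def __init__(self, inner_dim, num_classes, cls_token_idx=0, **kwargs):
    super().__init__(**kwargs)
    self.cls_token_idx = cls_token_idx
    self.dense = tf.keras.layers.Dense(inner_dim, activation="tanh")
    self.out_proj = tf.keras.layers.Dense(num_classes)

  def call(self, features):
    # features: [batch, seq_len, hidden] sequence of embeddings.
    x = features[:, self.cls_token_idx, :]  # pool: take one token
    return self.out_proj(self.dense(x))     # [batch, num_classes] logits

# Usage: logits = PoolingHead(inner_dim=768, num_classes=3)(encoder_outputs)
```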