"...git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "f55873b783e2739124360d869955f9218817c211"
Commit f5e6e291 authored by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 289743636
parent 242ad38d
@@ -43,10 +43,11 @@ class AlbertTransformerEncoder(network.Network):
   Attributes:
     vocab_size: The size of the token vocabulary.
-    embedding_width: The width of the word embeddings. Embedding parameters will
-      be factorized into two matrices in the shape of ['vocab_size',
-      'embedding_width'] and ['embedding_width', 'hidden_size']
-      ('embedding_width' is usually much smaller than 'hidden_size').
+    embedding_width: The width of the word embeddings. If the embedding width
+      is not equal to hidden size, embedding parameters will be factorized into
+      two matrices in the shape of ['vocab_size', 'embedding_width'] and
+      ['embedding_width', 'hidden_size'] ('embedding_width' is usually much
+      smaller than 'hidden_size').
     hidden_size: The size of the transformer hidden layers.
     num_layers: The number of transformer layers.
     num_attention_heads: The number of attention heads for each transformer. The
@@ -149,10 +150,14 @@ class AlbertTransformerEncoder(network.Network):
     embeddings = (
         tf.keras.layers.Dropout(rate=dropout_rate,
                                 dtype=tf.float32)(embeddings))
-    # The width of final 'embedding' should be always 'hidden_size'.
-    embeddings = layers.DenseEinsum(
-        output_shape=hidden_size, name='embedding_projection')(
-            embeddings)
+    # We project the 'embedding' output to 'hidden_size' if it is not already
+    # 'hidden_size'.
+    if embedding_width != hidden_size:
+      embeddings = layers.DenseEinsum(
+          output_shape=hidden_size,
+          kernel_initializer=initializer,
+          name='embedding_projection')(
+              embeddings)
     if float_dtype == 'float16':
       embeddings = tf.cast(embeddings, tf.float16)
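For context, the embedding factorization described in the updated docstring can be sketched in plain tf.keras as below. This is a minimal illustration of the idea, not the model-garden code: `layers.DenseEinsum` is replaced by an ordinary `Dense` projection (equivalent for this rank-3 input), and `vocab_size`, `embedding_width`, and `hidden_size` are assumed example values.

```python
import tensorflow as tf

vocab_size = 30000      # assumed example value
embedding_width = 128   # usually much smaller than hidden_size
hidden_size = 768       # assumed example value

token_ids = tf.keras.Input(shape=(None,), dtype=tf.int32)

# First factor: a ['vocab_size', 'embedding_width'] lookup table.
embeddings = tf.keras.layers.Embedding(vocab_size, embedding_width)(token_ids)

# Second factor: an ['embedding_width', 'hidden_size'] projection, applied
# only when the widths differ, mirroring the conditional this commit adds.
if embedding_width != hidden_size:
  embeddings = tf.keras.layers.Dense(
      hidden_size, name='embedding_projection')(embeddings)

model = tf.keras.Model(token_ids, embeddings)
model.summary()
```

With these sizes the two factors hold 30000*128 + 128*768 (plus bias) parameters, about 3.9M, versus roughly 23M for a direct ['vocab_size', 'hidden_size'] table, which is the point of the factorization; skipping the projection when the widths already match, as this commit does, avoids a redundant layer.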