Unverified commit 7a45b513, authored Oct 25, 2021 by Vishnu Banna, committed by GitHub Oct 25, 2021

Merge branch 'tensorflow:master' into exp_pr2

Parents: 54115e16 12bbefce
Changes: 111. Showing 20 changed files with 2128 additions and 1123 deletions (+2128, -1123).
official/nlp/keras_nlp/README.md (+0 / -37)
official/nlp/keras_nlp/__init__.py (+0 / -18)
official/nlp/keras_nlp/contributing.md (+0 / -21)
official/nlp/keras_nlp/encoders/bert_encoder_test.py (+0 / -227)
official/nlp/keras_nlp/layers/__init__.py (+0 / -20)
official/nlp/keras_nlp/layers/on_device_embedding_test.py (+0 / -213)
official/nlp/keras_nlp/layers/position_embedding_test.py (+0 / -132)
official/nlp/keras_nlp/layers/transformer_encoder_block.py (+0 / -20)
official/nlp/keras_nlp/layers/transformer_encoder_block_test.py (+0 / -324)
official/nlp/keras_nlp/requirements.txt (+0 / -1)
official/nlp/keras_nlp/setup.py (+0 / -69)
official/nlp/modeling/models/README.md (+9 / -0)
official/nlp/modeling/models/__init__.py (+2 / -0)
official/nlp/modeling/models/t5.py (+1430 / -0)
official/nlp/modeling/models/t5_test.py (+505 / -0)
official/nlp/modeling/networks/funnel_transformer.py (+168 / -39)
official/nlp/modeling/networks/funnel_transformer_test.py (+11 / -2)
official/nlp/projects/teams/experiments/base/glue_mnli.yaml (+1 / -0)
official/nlp/projects/teams/experiments/base/squad_v1.yaml (+1 / -0)
official/nlp/projects/teams/experiments/base/squad_v2.yaml (+1 / -0)
official/nlp/keras_nlp/README.md (deleted, 100644 → 0)
# keras-nlp

## Layers

Layers are the fundamental building blocks for NLP models. They can be used to
assemble new layers, networks, or models.

*   [TransformerEncoderBlock](layers/transformer_encoder_block.py) implements
    an optionally masked transformer as described in
    ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762).

*   [OnDeviceEmbedding](layers/on_device_embedding.py) implements efficient
    embedding lookups designed for TPU-based models.

*   [PositionalEmbedding](layers/position_embedding.py) creates a positional
    embedding as described in ["BERT: Pre-training of Deep Bidirectional
    Transformers for Language Understanding"](https://arxiv.org/abs/1810.04805).

*   [SelfAttentionMask](layers/self_attention_mask.py) creates a 3D attention
    mask from a 2D tensor mask.

*   [MaskedLM](layers/masked_lm.py) implements a masked language model. It
    assumes the embedding table variable is passed to it.
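
These layers compose like ordinary Keras layers. The following is a minimal sketch (not part of the original README); the constructor arguments mirror the ones exercised by the package's own tests later in this diff, and the sizes are toy values:

```python
import tensorflow as tf
from official.nlp.keras_nlp import layers

# Token ids for sequences of length 8 (batch dimension is implicit).
word_ids = tf.keras.Input(shape=(8,), dtype=tf.int32)

# Embedding lookup plus learned position embeddings.
embeddings = layers.OnDeviceEmbedding(vocab_size=100, embedding_width=32)(word_ids)
embeddings = embeddings + layers.PositionEmbedding(max_length=8)(embeddings)

# One optionally masked transformer block over the embedded sequence.
outputs = layers.TransformerEncoderBlock(
    num_attention_heads=2, inner_dim=64, inner_activation="relu")(embeddings)

model = tf.keras.Model(word_ids, outputs)  # output shape: (batch, 8, 32)
```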
## Encoders

Encoders are combinations of layers (and possibly other encoders). They are
sub-units of models that would not be trained alone. They encapsulate common
network structures like a classification head or a transformer encoder into an
easily handled object with a standardized configuration.

*   [BertEncoder](encoders/bert_encoder.py) implements a bi-directional
    Transformer-based encoder as described in ["BERT: Pre-training of Deep
    Bidirectional Transformers for Language
    Understanding"](https://arxiv.org/abs/1810.04805). It includes the embedding
    lookups, transformer layers and pooling layer.
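
As a usage sketch (again not part of the original README, but consistent with `bert_encoder_test.py` below), the encoder is called on `[word_ids, mask, type_ids]` and returns a dictionary of outputs:

```python
import tensorflow as tf
from official.nlp.keras_nlp.encoders import bert_encoder

encoder = bert_encoder.BertEncoder(
    vocab_size=100, hidden_size=32, num_attention_heads=2, num_layers=3)

seq_len = 16
word_ids = tf.keras.Input(shape=(seq_len,), dtype=tf.int32)
mask = tf.keras.Input(shape=(seq_len,), dtype=tf.int32)
type_ids = tf.keras.Input(shape=(seq_len,), dtype=tf.int32)

outputs = encoder([word_ids, mask, type_ids])
sequence_output = outputs["sequence_output"]  # (batch, seq_len, hidden_size)
pooled_output = outputs["pooled_output"]      # (batch, hidden_size)
```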
official/nlp/keras_nlp/__init__.py (deleted, 100644 → 0)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Keras-NLP package definition."""
# pylint: disable=wildcard-import
from official.nlp.keras_nlp import encoders
from official.nlp.keras_nlp import layers
official/nlp/keras_nlp/contributing.md (deleted, 100644 → 0)
## Contributing to KerasNLP

Patches to KerasNLP are welcome!

The source-of-truth repository lives under
[TF Model Garden NLP](https://github.com/tensorflow/models/tree/master/official/nlp/keras_nlp),
and is mirrored as a read-only repository under
[keras-team/keras-nlp](https://github.com/keras-team/keras-nlp).

Contributions should be made as PRs to the TF Model Garden repository. This is
to ensure the codebase is rigorously tested with state-of-the-art models on
different accelerators. In the long run, we will move development to the
`keras-team/keras-nlp` repository.

## :heavy_check_mark: Contributor checklist

1.  Ensure you have signed the
    [Contributor License Agreement](https://cla.developers.google.com/about/google-individual?csw=1).
    *   All code contributors are required to sign a Contributor License
        Agreement.
    *   Please read this
        [troubleshooting guide](Contributor-License-Agreements#troubleshooting-clas)
        if you encounter an issue.
2.  Please review the
    [contribution guidelines](https://github.com/tensorflow/models/wiki/How-to-contribute).
3.  Check if your changes are consistent with the
    [TensorFlow coding style](https://www.tensorflow.org/community/contribute/code_style).
official/nlp/keras_nlp/encoders/bert_encoder_test.py (deleted, 100644 → 0)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for transformer-based bert encoder network."""
from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from tensorflow.python.keras import keras_parameterized  # pylint: disable=g-direct-tensorflow-import
from official.nlp.keras_nlp.encoders import bert_encoder


# This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It
# guarantees forward compatibility of this code for the V2 switchover.
@keras_parameterized.run_all_keras_modes
class BertEncoderTest(keras_parameterized.TestCase):

  def tearDown(self):
    super(BertEncoderTest, self).tearDown()
    tf.keras.mixed_precision.set_global_policy("float32")

  def test_network_creation(self):
    hidden_size = 32
    sequence_length = 21
    # Create a small BertEncoder for testing.
    test_network = bert_encoder.BertEncoder(
        vocab_size=100,
        hidden_size=hidden_size,
        num_attention_heads=2,
        num_layers=3)
    # Create the inputs (note that the first dimension is implicit).
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]

    self.assertIsInstance(test_network.transformer_layers, list)
    self.assertLen(test_network.transformer_layers, 3)
    self.assertIsInstance(test_network.pooler_layer, tf.keras.layers.Dense)

    expected_data_shape = [None, sequence_length, hidden_size]
    expected_pooled_shape = [None, hidden_size]
    self.assertAllEqual(expected_data_shape, data.shape.as_list())
    self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list())

    # The default output dtype is float32.
    self.assertAllEqual(tf.float32, data.dtype)
    self.assertAllEqual(tf.float32, pooled.dtype)

  def test_all_encoder_outputs_network_creation(self):
    hidden_size = 32
    sequence_length = 21
    # Create a small BertEncoder for testing.
    test_network = bert_encoder.BertEncoder(
        vocab_size=100,
        hidden_size=hidden_size,
        num_attention_heads=2,
        num_layers=3)
    # Create the inputs (note that the first dimension is implicit).
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    dict_outputs = test_network([word_ids, mask, type_ids])
    all_encoder_outputs = dict_outputs["encoder_outputs"]
    pooled = dict_outputs["pooled_output"]

    expected_data_shape = [None, sequence_length, hidden_size]
    expected_pooled_shape = [None, hidden_size]
    self.assertLen(all_encoder_outputs, 3)
    for data in all_encoder_outputs:
      self.assertAllEqual(expected_data_shape, data.shape.as_list())
    self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list())

    # The default output dtype is float32.
    self.assertAllEqual(tf.float32, all_encoder_outputs[-1].dtype)
    self.assertAllEqual(tf.float32, pooled.dtype)

  def test_network_creation_with_float16_dtype(self):
    hidden_size = 32
    sequence_length = 21
    tf.keras.mixed_precision.set_global_policy("mixed_float16")
    # Create a small BertEncoder for testing.
    test_network = bert_encoder.BertEncoder(
        vocab_size=100,
        hidden_size=hidden_size,
        num_attention_heads=2,
        num_layers=3)
    # Create the inputs (note that the first dimension is implicit).
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]

    expected_data_shape = [None, sequence_length, hidden_size]
    expected_pooled_shape = [None, hidden_size]
    self.assertAllEqual(expected_data_shape, data.shape.as_list())
    self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list())

    # If float_dtype is set to float16, the data output is float32 (from a layer
    # norm) and pool output should be float16.
    self.assertAllEqual(tf.float32, data.dtype)
    self.assertAllEqual(tf.float16, pooled.dtype)

  @parameterized.named_parameters(
      ("all_sequence", None, 21),
      ("output_range", 1, 1),
  )
  def test_network_invocation(self, output_range, out_seq_len):
    hidden_size = 32
    sequence_length = 21
    vocab_size = 57
    num_types = 7
    # Create a small BertEncoder for testing.
    test_network = bert_encoder.BertEncoder(
        vocab_size=vocab_size,
        hidden_size=hidden_size,
        num_attention_heads=2,
        num_layers=3,
        type_vocab_size=num_types,
        output_range=output_range)
    # Create the inputs (note that the first dimension is implicit).
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]

    # Create a model based off of this network:
    model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled])

    # Invoke the model. We can't validate the output data here (the model is too
    # complex) but this will catch structural runtime errors.
    batch_size = 3
    word_id_data = np.random.randint(
        vocab_size, size=(batch_size, sequence_length))
    mask_data = np.random.randint(2, size=(batch_size, sequence_length))
    type_id_data = np.random.randint(
        num_types, size=(batch_size, sequence_length))
    outputs = model.predict([word_id_data, mask_data, type_id_data])
    self.assertEqual(outputs[0].shape[1], out_seq_len)

    # Creates a BertEncoder with max_sequence_length != sequence_length
    max_sequence_length = 128
    test_network = bert_encoder.BertEncoder(
        vocab_size=vocab_size,
        hidden_size=hidden_size,
        max_sequence_length=max_sequence_length,
        num_attention_heads=2,
        num_layers=3,
        type_vocab_size=num_types)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]
    model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled])
    outputs = model.predict([word_id_data, mask_data, type_id_data])
    self.assertEqual(outputs[0].shape[1], sequence_length)

    # Creates a BertEncoder with embedding_width != hidden_size
    test_network = bert_encoder.BertEncoder(
        vocab_size=vocab_size,
        hidden_size=hidden_size,
        max_sequence_length=max_sequence_length,
        num_attention_heads=2,
        num_layers=3,
        type_vocab_size=num_types,
        embedding_width=16)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]
    model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled])
    outputs = model.predict([word_id_data, mask_data, type_id_data])
    self.assertEqual(outputs[0].shape[-1], hidden_size)
    self.assertTrue(hasattr(test_network, "_embedding_projection"))

  def test_serialize_deserialize(self):
    # Create a network object that sets all of its config options.
    kwargs = dict(
        vocab_size=100,
        hidden_size=32,
        num_layers=3,
        num_attention_heads=2,
        max_sequence_length=21,
        type_vocab_size=12,
        inner_dim=1223,
        inner_activation="relu",
        output_dropout=0.05,
        attention_dropout=0.22,
        initializer="glorot_uniform",
        output_range=-1,
        embedding_width=16,
        embedding_layer=None,
        norm_first=False)
    network = bert_encoder.BertEncoder(**kwargs)

    expected_config = dict(kwargs)
    expected_config["inner_activation"] = tf.keras.activations.serialize(
        tf.keras.activations.get(expected_config["inner_activation"]))
    expected_config["initializer"] = tf.keras.initializers.serialize(
        tf.keras.initializers.get(expected_config["initializer"]))

    # Validate that the config can be forced to JSON.
    _ = network.to_json()

    # Tests model saving/loading.
    model_path = self.get_temp_dir() + "/model"
    network.save(model_path)
    _ = tf.keras.models.load_model(model_path)


if __name__ == "__main__":
  tf.test.main()
official/nlp/keras_nlp/layers/__init__.py (deleted, 100644 → 0)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Keras-NLP layers package definition."""
from official.nlp.keras_nlp.layers.masked_lm import MaskedLM
from official.nlp.keras_nlp.layers.on_device_embedding import OnDeviceEmbedding
from official.nlp.keras_nlp.layers.position_embedding import PositionEmbedding
from official.nlp.keras_nlp.layers.self_attention_mask import SelfAttentionMask
from official.nlp.keras_nlp.layers.transformer_encoder_block import TransformerEncoderBlock
official/nlp/keras_nlp/layers/on_device_embedding_test.py (deleted, 100644 → 0)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for Keras-based one-hot embedding layer."""
import numpy as np
import tensorflow as tf

from tensorflow.python.keras import keras_parameterized  # pylint: disable=g-direct-tensorflow-import
from official.nlp.keras_nlp.layers import on_device_embedding


# This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It
# guarantees forward compatibility of this code for the V2 switchover.
@keras_parameterized.run_all_keras_modes
class OnDeviceEmbeddingTest(keras_parameterized.TestCase):

  def test_layer_creation(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size, embedding_width=embedding_width)
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # The output should be the same as the input, save that it has an extra
    # embedding_width dimension on the end.
    expected_output_shape = [None, sequence_length, embedding_width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())
    self.assertEqual(output_tensor.dtype, tf.float32)

  def test_layer_creation_with_mixed_precision(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size,
        embedding_width=embedding_width,
        dtype="mixed_float16")
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # The output should be the same as the input, save that it has an extra
    # embedding_width dimension on the end.
    expected_output_shape = [None, sequence_length, embedding_width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())
    self.assertEqual(output_tensor.dtype, tf.float16)

  def test_layer_invocation(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size, embedding_width=embedding_width)
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # Create a model from the test layer.
    model = tf.keras.Model(input_tensor, output_tensor)

    # Invoke the model on test data. We can't validate the output data itself
    # (the NN is too complex) but this will rule out structural runtime errors.
    batch_size = 3
    input_data = np.random.randint(
        vocab_size, size=(batch_size, sequence_length))
    output = model.predict(input_data)
    self.assertEqual(tf.float32, output.dtype)

  def test_layer_invocation_with_mixed_precision(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size,
        embedding_width=embedding_width,
        dtype="mixed_float16")
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # Create a model from the test layer.
    model = tf.keras.Model(input_tensor, output_tensor)

    # Invoke the model on test data. We can't validate the output data itself
    # (the NN is too complex) but this will rule out structural runtime errors.
    batch_size = 3
    input_data = np.random.randint(
        vocab_size, size=(batch_size, sequence_length))
    output = model.predict(input_data)
    self.assertEqual(tf.float16, output.dtype)

  def test_one_hot_layer_creation(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size,
        embedding_width=embedding_width,
        use_one_hot=True)
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # The output should be the same as the input, save that it has an extra
    # embedding_width dimension on the end.
    expected_output_shape = [None, sequence_length, embedding_width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())
    self.assertEqual(output_tensor.dtype, tf.float32)

  def test_one_hot_layer_creation_with_mixed_precision(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size,
        embedding_width=embedding_width,
        dtype="mixed_float16",
        use_one_hot=True)
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # The output should be the same as the input, save that it has an extra
    # embedding_width dimension on the end.
    expected_output_shape = [None, sequence_length, embedding_width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())
    self.assertEqual(output_tensor.dtype, tf.float16)

  def test_one_hot_layer_invocation(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size,
        embedding_width=embedding_width,
        use_one_hot=True)
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # Create a model from the test layer.
    model = tf.keras.Model(input_tensor, output_tensor)

    # Invoke the model on test data. We can't validate the output data itself
    # (the NN is too complex) but this will rule out structural runtime errors.
    batch_size = 3
    input_data = np.random.randint(
        vocab_size, size=(batch_size, sequence_length))
    output = model.predict(input_data)
    self.assertEqual(tf.float32, output.dtype)

  def test_one_hot_layer_invocation_with_mixed_precision(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size,
        embedding_width=embedding_width,
        dtype="mixed_float16",
        use_one_hot=True)
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # Create a model from the test layer.
    model = tf.keras.Model(input_tensor, output_tensor)

    # Invoke the model on test data. We can't validate the output data itself
    # (the NN is too complex) but this will rule out structural runtime errors.
    batch_size = 3
    input_data = np.random.randint(
        vocab_size, size=(batch_size, sequence_length))
    output = model.predict(input_data)
    self.assertEqual(tf.float16, output.dtype)

  def test_use_scale_layer_invocation(self):
    vocab_size = 31
    embedding_width = 27
    test_layer = on_device_embedding.OnDeviceEmbedding(
        vocab_size=vocab_size,
        embedding_width=embedding_width,
        scale_factor=embedding_width**0.5)
    # Create a 2-dimensional input (the first dimension is implicit).
    sequence_length = 23
    input_tensor = tf.keras.Input(shape=(sequence_length), dtype=tf.int32)
    output_tensor = test_layer(input_tensor)

    # Create a model from the test layer.
    model = tf.keras.Model(input_tensor, output_tensor)

    # Invoke the model on test data. We can't validate the output data itself
    # (the NN is too complex) but this will rule out structural runtime errors.
    batch_size = 3
    input_data = np.random.randint(
        vocab_size, size=(batch_size, sequence_length))
    output = model.predict(input_data)
    self.assertEqual(tf.float32, output.dtype)


if __name__ == "__main__":
  tf.test.main()
official/nlp/keras_nlp/layers/position_embedding_test.py (deleted, 100644 → 0)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for Keras-based positional embedding layer."""
import numpy as np
import tensorflow as tf

from tensorflow.python.keras import keras_parameterized  # pylint: disable=g-direct-tensorflow-import
from official.nlp.keras_nlp.layers import position_embedding


# This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It
# guarantees forward compatibility of this code for the V2 switchover.
@keras_parameterized.run_all_keras_modes
class PositionEmbeddingLayerTest(keras_parameterized.TestCase):

  def test_static_layer_output_shape(self):
    # Create a 3-dimensional input (the first dimension is implicit).
    sequence_length = 21
    test_layer = position_embedding.PositionEmbedding(
        max_length=sequence_length)
    width = 30
    input_tensor = tf.keras.Input(shape=(sequence_length, width))
    output_tensor = test_layer(input_tensor)

    # When using static positional embedding shapes, the output is expected
    # to be the same as the input shape in all dimensions save batch.
    expected_output_shape = [None, sequence_length, width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())
    # The default output dtype for this layer should be tf.float32.
    self.assertEqual(tf.float32, output_tensor.dtype)

  def test_non_default_axis_static(self):
    # Create a 3-dimensional input (the first dimension is implicit).
    sequence_length = 21
    test_layer = position_embedding.PositionEmbedding(
        max_length=sequence_length, seq_axis=2)
    width = 30
    input_tensor = tf.keras.Input(shape=(width, sequence_length, width))
    output_tensor = test_layer(input_tensor)

    # When using static positional embedding shapes, the output is expected
    # to be the same as the input shape in all dimensions save batch.
    expected_output_shape = [None, width, sequence_length, width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())
    # The default output dtype for this layer should be tf.float32.
    self.assertEqual(tf.float32, output_tensor.dtype)

  def test_float16_dtype(self):
    # Create a 3-dimensional input (the first dimension is implicit).
    sequence_length = 21
    test_layer = position_embedding.PositionEmbedding(
        max_length=sequence_length, dtype="float16")
    width = 30
    input_tensor = tf.keras.Input(shape=(sequence_length, width))
    output_tensor = test_layer(input_tensor)

    # When using static positional embedding shapes, the output is expected
    # to be the same as the input shape in all dimensions save batch.
    expected_output_shape = [None, sequence_length, width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())
    self.assertEqual(tf.float16, output_tensor.dtype)

  def test_dynamic_layer_output_shape(self):
    max_sequence_length = 40
    test_layer = position_embedding.PositionEmbedding(
        max_length=max_sequence_length)
    # Create a 3-dimensional input (the first dimension is implicit).
    width = 30
    input_tensor = tf.keras.Input(shape=(None, width))
    output_tensor = test_layer(input_tensor)

    # When using dynamic positional embedding shapes, the output is expected
    # to be the same as the input shape in all dimensions - but may be None if
    # the input shape is None there.
    expected_output_shape = [None, None, width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())

  def test_non_default_axis_dynamic(self):
    max_sequence_length = 60
    test_layer = position_embedding.PositionEmbedding(
        max_length=max_sequence_length, seq_axis=2)
    # Create a 3-dimensional input (the first dimension is implicit).
    width = 30
    input_tensor = tf.keras.Input(shape=(None, None, width))
    output_tensor = test_layer(input_tensor)

    # When using dynamic positional embedding shapes, the output is expected
    # to be the same as the input shape in all dimensions - but may be None if
    # the input shape is None there.
    expected_output_shape = [None, None, None, width]
    self.assertEqual(expected_output_shape, output_tensor.shape.as_list())

  def test_dynamic_layer_slicing(self):
    max_sequence_length = 40
    test_layer = position_embedding.PositionEmbedding(
        max_length=max_sequence_length)
    # Create a 3-dimensional input (the first dimension is implicit).
    width = 30
    input_tensor = tf.keras.Input(shape=(None, width))
    output_tensor = test_layer(input_tensor)

    model = tf.keras.Model(input_tensor, output_tensor)

    # Create input data that is shorter than max_sequence_length, which should
    # trigger a down-slice.
    input_length = 17
    # Note: This test explicitly uses a batch size of 1. This is to get around
    # Keras' restriction on Model invocations: inputs are expected to have the
    # same batch cardinality as outputs. In practice, this layer should be used
    # inside a model, where it can be projected when added to another tensor.
    input_data = np.ones((1, input_length, width))
    output_data = model.predict(input_data)

    self.assertAllEqual([1, input_length, width], output_data.shape)


if __name__ == "__main__":
  tf.test.main()
official/nlp/keras_nlp/layers/transformer_encoder_block.py (deleted, 100644 → 0)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Keras-based TransformerEncoder block layer."""
from official.nlp.modeling import layers

TransformerEncoderBlock = layers.TransformerEncoderBlock
official/nlp/keras_nlp/layers/transformer_encoder_block_test.py (deleted, 100644 → 0)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for Keras-based transformer block layer."""
from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from tensorflow.python.keras import keras_parameterized  # pylint: disable=g-direct-tensorflow-import
from official.nlp.keras_nlp.layers.transformer_encoder_block import TransformerEncoderBlock


@keras_parameterized.run_all_keras_modes
@parameterized.named_parameters(('base', TransformerEncoderBlock))
class TransformerEncoderBlockLayerTest(keras_parameterized.TestCase):

  def tearDown(self):
    super(TransformerEncoderBlockLayerTest, self).tearDown()
    tf.keras.mixed_precision.set_global_policy('float32')

  def test_layer_creation(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10, inner_dim=2048, inner_activation='relu')
    sequence_length = 21
    width = 80
    # Create a 3-dimensional input (the first dimension is implicit).
    data_tensor = tf.keras.Input(shape=(sequence_length, width))
    output_tensor = test_layer(data_tensor)
    # The default output of a transformer layer should be the same as the input.
    self.assertEqual(data_tensor.shape.as_list(), output_tensor.shape.as_list())

  def test_layer_creation_with_mask(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10, inner_dim=2048, inner_activation='relu')
    sequence_length = 21
    width = 80
    # Create a 3-dimensional input (the first dimension is implicit).
    data_tensor = tf.keras.Input(shape=(sequence_length, width))
    # Create a 2-dimensional input (the first dimension is implicit).
    mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length))
    output_tensor = test_layer([data_tensor, mask_tensor])
    # The default output of a transformer layer should be the same as the input.
    self.assertEqual(data_tensor.shape.as_list(), output_tensor.shape.as_list())

  def test_layer_invocation(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10, inner_dim=2048, inner_activation='relu')
    sequence_length = 21
    width = 80
    # Create a 3-dimensional input (the first dimension is implicit).
    data_tensor = tf.keras.Input(shape=(sequence_length, width))
    output_tensor = test_layer(data_tensor)

    # Create a model from the test layer.
    model = tf.keras.Model(data_tensor, output_tensor)

    # Invoke the model on test data. We can't validate the output data itself
    # (the NN is too complex) but this will rule out structural runtime errors.
    batch_size = 6
    input_data = 10 * np.random.random_sample(
        (batch_size, sequence_length, width))
    _ = model.predict(input_data)

  def test_layer_invocation_with_mask(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10, inner_dim=2048, inner_activation='relu')
    sequence_length = 21
    width = 80
    # Create a 3-dimensional input (the first dimension is implicit).
    data_tensor = tf.keras.Input(shape=(sequence_length, width))
    # Create a 2-dimensional input (the first dimension is implicit).
    mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length))
    output_tensor = test_layer([data_tensor, mask_tensor])

    # Create a model from the test layer.
    model = tf.keras.Model([data_tensor, mask_tensor], output_tensor)

    # Invoke the model on test data. We can't validate the output data itself
    # (the NN is too complex) but this will rule out structural runtime errors.
    batch_size = 6
    input_data = 10 * np.random.random_sample(
        (batch_size, sequence_length, width))
    # The attention mask should be of shape (batch, from_seq_len, to_seq_len),
    # which here is (batch, sequence_length, sequence_length)
    mask_data = np.random.randint(
        2, size=(batch_size, sequence_length, sequence_length))
    _ = model.predict([input_data, mask_data])

  def test_layer_output_range(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10, inner_dim=2048, inner_activation='relu')
    sequence_length = 21
    width = 80

    batch_size = 6
    input_data = 10 * np.random.random_sample(
        (batch_size, sequence_length, width))
    mask_data = np.random.randint(
        2, size=(batch_size, sequence_length, sequence_length))
    output_tensor = test_layer([input_data, mask_data])

    # The layer only attends to the first token and outputs the first token
    # embedding.
    new_layer = transformer_cls(
        num_attention_heads=10,
        inner_dim=2048,
        inner_activation='relu',
        output_range=1)
    _ = new_layer([input_data, mask_data])
    new_layer.set_weights(test_layer.get_weights())
    new_output_tensor = new_layer([input_data, mask_data])
    self.assertAllClose(
        new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003)

  def test_layer_output_range_without_mask(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10,
        inner_dim=2048,
        inner_activation='relu',
        norm_first=True)
    sequence_length = 21
    width = 80

    batch_size = 6
    input_data = 10 * np.random.random_sample(
        (batch_size, sequence_length, width))
    output_tensor = test_layer(input_data)

    # The layer only attends to the first token and outputs the first token
    # embedding.
    new_layer = transformer_cls(
        num_attention_heads=10,
        inner_dim=2048,
        inner_activation='relu',
        output_range=1,
        norm_first=True)
    _ = new_layer(input_data)
    new_layer.set_weights(test_layer.get_weights())
    new_output_tensor = new_layer(input_data)
    self.assertAllClose(
        new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003)

  def test_layer_output_range_with_pre_norm(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10,
        inner_dim=2048,
        inner_activation='relu',
        norm_first=True)
    sequence_length = 21
    width = 80

    batch_size = 6
    input_data = 10 * np.random.random_sample(
        (batch_size, sequence_length, width))
    mask_data = np.random.randint(
        2, size=(batch_size, sequence_length, sequence_length))
    output_tensor = test_layer([input_data, mask_data])

    # The layer only attends to the first token and outputs the first token
    # embedding.
    new_layer = transformer_cls(
        num_attention_heads=10,
        inner_dim=2048,
        inner_activation='relu',
        output_range=1,
        norm_first=True)
    _ = new_layer([input_data, mask_data])
    new_layer.set_weights(test_layer.get_weights())
    new_output_tensor = new_layer([input_data, mask_data])
    self.assertAllClose(
        new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003)

  def test_layer_invocation_with_float16_dtype(self, transformer_cls):
    tf.keras.mixed_precision.set_global_policy('mixed_float16')
    test_layer = transformer_cls(
        num_attention_heads=10, inner_dim=2048, inner_activation='relu')
    sequence_length = 21
    width = 80
    # Create a 3-dimensional input (the first dimension is implicit).
    data_tensor = tf.keras.Input(shape=(sequence_length, width))
    # Create a 2-dimensional input (the first dimension is implicit).
    mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length))
    output_tensor = test_layer([data_tensor, mask_tensor])

    # Create a model from the test layer.
    model = tf.keras.Model([data_tensor, mask_tensor], output_tensor)

    # Invoke the model on test data. We can't validate the output data itself
    # (the NN is too complex) but this will rule out structural runtime errors.
    batch_size = 6
    input_data = (10 * np.random.random_sample(
        (batch_size, sequence_length, width)))
    # The attention mask should be of shape (batch, from_seq_len, to_seq_len),
    # which here is (batch, sequence_length, sequence_length)
    mask_data = np.random.randint(
        2, size=(batch_size, sequence_length, sequence_length))
    _ = model.predict([input_data, mask_data])

  def test_transform_with_initializer(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10,
        inner_dim=2048,
        inner_activation='relu',
        kernel_initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02))
    sequence_length = 21
    width = 80
    # Create a 3-dimensional input (the first dimension is implicit).
    data_tensor = tf.keras.Input(shape=(sequence_length, width))
    output = test_layer(data_tensor)
    # The default output of a transformer layer should be the same as the input.
    self.assertEqual(data_tensor.shape.as_list(), output.shape.as_list())

  def test_dynamic_layer_sequence(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=10,
        inner_dim=2048,
        inner_activation='relu',
        kernel_initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02))
    # Create a 3-dimensional input (the first dimension is implicit).
    width = 30
    input_tensor = tf.keras.Input(shape=(None, width))
    output_tensor = test_layer(input_tensor)
    model = tf.keras.Model(input_tensor, output_tensor)

    input_length = 17
    input_data = np.ones((1, input_length, width))
    output_data = model.predict(input_data)

    self.assertAllEqual([1, input_length, width], output_data.shape)

  def test_separate_qkv(self, transformer_cls):
    test_layer = transformer_cls(
        num_attention_heads=2,
        inner_dim=128,
        inner_activation='relu',
        kernel_initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02))
    # Forward path.
    q_tensor = tf.zeros([2, 4, 16], dtype=tf.float32)
    kv_tensor = tf.zeros([2, 8, 16], dtype=tf.float32)
    dummy_mask = tf.zeros([2, 4, 8], dtype=tf.float32)
    inputs = [q_tensor, kv_tensor, dummy_mask]
    output = test_layer(inputs)
    self.assertEqual(output.shape, q_tensor.shape)


@keras_parameterized.run_all_keras_modes
class TransformerArgumentTest(keras_parameterized.TestCase):

  def test_use_bias_norm_first(self):
    num_attention_heads = 2
    hidden_size = 16
    encoder_block = TransformerEncoderBlock(
        num_attention_heads=num_attention_heads,
        inner_dim=32,
        inner_activation='relu',
        output_dropout=0.1,
        attention_dropout=0.1,
        use_bias=False,
        norm_first=True,
        norm_epsilon=1e-6,
        inner_dropout=0.1,
        attention_initializer=tf.keras.initializers.RandomUniform(
            minval=0., maxval=1.))
    # Forward path.
    dummy_tensor = tf.zeros([2, 4, 16], dtype=tf.float32)
    dummy_mask = tf.zeros([2, 4, 4], dtype=tf.float32)
    inputs = [dummy_tensor, dummy_mask]
    output = encoder_block(inputs)
    self.assertEqual(output.shape, (2, 4, hidden_size))

  def test_get_config(self):
    num_attention_heads = 2
    encoder_block = TransformerEncoderBlock(
        num_attention_heads=num_attention_heads,
        inner_dim=32,
        inner_activation='relu',
        output_dropout=0.1,
        attention_dropout=0.1,
        use_bias=False,
        norm_first=True,
        norm_epsilon=1e-6,
        inner_dropout=0.1,
        attention_initializer=tf.keras.initializers.RandomUniform(
            minval=0., maxval=1.))
    encoder_block_config = encoder_block.get_config()
    new_encoder_block = TransformerEncoderBlock.from_config(
        encoder_block_config)
    self.assertEqual(encoder_block_config, new_encoder_block.get_config())

  @parameterized.parameters({'attention_axes': None}, {'attention_axes': [1]},
                            {'attention_axes': [2]}, {'attention_axes': [1, 2]})
  def test_several_attention_axes(self, attention_axes):
    test_layer = TransformerEncoderBlock(
        inner_dim=32,
        inner_activation='relu',
        output_dropout=0.1,
        attention_dropout=0.1,
        use_bias=False,
        norm_first=True,
        norm_epsilon=1e-6,
        inner_dropout=0.1,
        num_attention_heads=10,
        attention_axes=attention_axes)
    num_rows = 21
    num_cols = 13
    width = 80
    # Create a 3-dimensional input (the first dimension is implicit).
    data_tensor = tf.keras.Input(shape=(num_rows, num_cols, width))
    output_tensor = test_layer(data_tensor)
    # The default output of a transformer layer should be the same as the input.
    self.assertEqual(data_tensor.shape.as_list(), output_tensor.shape.as_list())


if __name__ == '__main__':
  tf.test.main()
official/nlp/keras_nlp/requirements.txt (deleted, 100644 → 0)
numpy>=1.15.4
official/nlp/keras_nlp/setup.py (deleted, 100644 → 0)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Setup script."""
import os

from setuptools import find_packages
from setuptools import setup

version = '0.0.1'


def _get_requirements():
  """Parses requirements.txt file."""
  install_requires_tmp = []
  dependency_links_tmp = []
  with open(
      os.path.join(os.path.dirname(__file__), './requirements.txt'),
      'r') as f:
    for line in f:
      package_name = line.strip()
      # Skip empty line or comments starting with "#".
      if not package_name or package_name[0] == '#':
        continue
      if package_name.startswith('-e '):
        dependency_links_tmp.append(package_name[3:].strip())
      else:
        install_requires_tmp.append(package_name)
  return install_requires_tmp, dependency_links_tmp


install_requires, dependency_links = _get_requirements()

install_requires.append('tf-nightly')

setup(
    name='keras-nlp',
    version=version,
    description='Keras Natural Language Processing Library',
    url='https://github.com/keras-team/keras-nlp',
    author='The Keras authors',
    author_email='keras-team@google.com',
    license='Apache License 2.0',
    install_requires=install_requires,
    classifiers=[
        'Programming Language :: Python',
        'Programming Language :: Python :: 3.6',
        'Operating System :: Unix',
        'Operating System :: Microsoft :: Windows',
        'Operating System :: MacOS',
        'Intended Audience :: Science/Research',
        'Topic :: Scientific/Engineering',
        'Topic :: Software Development'
    ],
    packages=find_packages(exclude=('tests',)),
    exclude_package_data={'': ['*_test.py',],},
    dependency_links=dependency_links,
    python_requires='>=3.6',
)
official/nlp/modeling/models/README.md
@@ -23,3 +23,12 @@ respectively.
*   [`DualEncoder`](dual_encoder.py) implements a dual encoder model, suitable
    for retrieval tasks.
*   [`Seq2SeqTransformer`](seq2seq_transformer.py) implements the original
    Transformer model for seq-to-seq tasks.
*   [`T5Transformer`](t5.py) implements a standalone T5 model for seq-to-seq
    tasks. The models are compatible with the released T5 architecture and
    converted checkpoints. The modules are implemented as `tf.Module`. To use
    with Keras, users can wrap them within Keras customized layers, i.e. we can
    define the modules inside the `__init__` of a Keras layer and call the
    modules in `call`.
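
A minimal sketch of that wrapping pattern follows. Because `t5.py` is collapsed in this diff, the `T5TransformerParams` usage and the call signature below are illustrative assumptions rather than the actual API:

```python
import tensorflow as tf
from official.nlp.modeling import models


class T5KerasLayer(tf.keras.layers.Layer):
  """Wraps the T5Transformer tf.Module so it can be used inside a Keras model."""

  def __init__(self, params, **kwargs):
    super().__init__(**kwargs)
    # Define the module inside __init__, as suggested above.
    # `params` is assumed to be a models.T5TransformerParams instance.
    self.t5 = models.T5Transformer(params)

  def call(self, inputs):
    # Delegate to the wrapped module in call(); the exact input structure
    # expected by T5Transformer is an assumption here.
    return self.t5(**inputs)
```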
official/nlp/modeling/models/__init__.py
@@ -24,6 +24,8 @@ from official.nlp.modeling.models.bert_token_classifier import BertTokenClassifi
from official.nlp.modeling.models.dual_encoder import DualEncoder
from official.nlp.modeling.models.electra_pretrainer import ElectraPretrainer
from official.nlp.modeling.models.seq2seq_transformer import *
from official.nlp.modeling.models.t5 import T5Transformer
from official.nlp.modeling.models.t5 import T5TransformerParams
from official.nlp.modeling.models.xlnet import XLNetClassifier
from official.nlp.modeling.models.xlnet import XLNetPretrainer
from official.nlp.modeling.models.xlnet import XLNetSpanLabeler
official/nlp/modeling/models/t5.py (new file, 0 → 100644)
This diff is collapsed.
official/nlp/modeling/models/t5_test.py (new file, 0 → 100644)
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for t5."""
from
absl.testing
import
parameterized
import
numpy
as
np
import
tensorflow
as
tf
from
tensorflow.python.distribute
import
combinations
from
tensorflow.python.distribute
import
strategy_combinations
from
official.nlp.modeling.models
import
t5
def
_create_cache
(
batch_size
,
init_decode_length
,
num_heads
,
head_size
,
dtype
=
tf
.
float32
):
if
num_heads
is
None
:
kv_shape
=
[
batch_size
,
init_decode_length
,
head_size
]
else
:
kv_shape
=
[
batch_size
,
init_decode_length
,
num_heads
,
head_size
]
return
{
"key"
:
tf
.
zeros
(
kv_shape
,
dtype
=
dtype
),
"value"
:
tf
.
zeros
(
kv_shape
,
dtype
=
dtype
)
}
class
ModulesTest
(
tf
.
test
.
TestCase
,
parameterized
.
TestCase
):
@
parameterized
.
named_parameters
((
"bfloat16"
,
tf
.
bfloat16
),
(
"float32"
,
tf
.
float32
))
def
test_embed
(
self
,
dtype
):
l
=
t5
.
Embed
(
vocab_size
=
5
,
features
=
4
,
compute_dtype
=
dtype
,
name
=
"foo"
)
inputs
=
np
.
array
([[
2
,
3
],
[
1
,
2
]],
dtype
=
np
.
int32
)
inputs
=
tf
.
convert_to_tensor
(
inputs
)
one_hot_outputs
=
l
(
inputs
,
one_hot
=
True
)
gather_outputs
=
l
(
inputs
,
one_hot
=
False
)
self
.
assertEqual
(
one_hot_outputs
.
shape
,
(
2
,
2
,
4
))
self
.
assertLen
(
l
.
trainable_variables
,
1
)
self
.
assertAllClose
(
one_hot_outputs
,
gather_outputs
)
outputs
=
l
.
attend
(
query
=
tf
.
zeros
((
2
,
2
,
4
),
dtype
))
self
.
assertEqual
(
outputs
.
shape
,
(
2
,
2
,
5
))
# Test initializers.
l
=
t5
.
Embed
(
vocab_size
=
5
,
features
=
4
,
compute_dtype
=
dtype
,
name
=
"foo"
,
embeddings_initializer
=
tf
.
keras
.
initializers
.
Zeros
())
self
.
assertAllClose
(
l
(
inputs
),
tf
.
zeros
((
2
,
2
,
4
),
dtype
))
@
parameterized
.
named_parameters
((
"bfloat16"
,
tf
.
bfloat16
),
(
"float32"
,
tf
.
float32
))
def
test_rms_norm
(
self
,
dtype
):
l
=
t5
.
RMSNorm
(
hidden_size
=
4
,
epsilon
=
0.0
,
name
=
"foo"
)
inputs
=
tf
.
ones
((
2
,
4
),
dtype
=
dtype
)
outputs
=
l
(
inputs
)
self
.
assertAllEqual
(
l
(
inputs
),
inputs
)
self
.
assertEqual
(
outputs
.
dtype
,
dtype
)
self
.
assertLen
(
l
.
trainable_variables
,
1
)
self
.
assertIn
(
"foo/scale"
,
l
.
trainable_variables
[
0
].
name
)
@
parameterized
.
named_parameters
((
"bfloat16"
,
tf
.
bfloat16
),
(
"float32"
,
tf
.
float32
))
def
test_linear
(
self
,
dtype
):
l
=
t5
.
Linear
(
in_features
=
4
,
out_features
=
4
,
w_init
=
tf
.
keras
.
initializers
.
Ones
(),
name
=
"foo"
)
inputs
=
tf
.
ones
((
2
,
4
),
dtype
=
dtype
)
outputs
=
l
(
inputs
)
self
.
assertEqual
(
outputs
.
shape
,
inputs
.
shape
)
self
.
assertEqual
(
outputs
.
dtype
,
dtype
)
self
.
assertLen
(
l
.
trainable_variables
,
2
)
def
test_linear3d
(
self
):
batch_size
=
2
l
=
t5
.
Linear3D
(
in_features
=
4
,
out_features
=
4
,
num_heads
=
2
,
to_3d
=
True
,
w_init
=
tf
.
keras
.
initializers
.
Ones
(),
name
=
"foo"
)
inputs
=
np
.
ones
((
batch_size
,
2
,
4
),
dtype
=
np
.
float32
)
self
.
assertEqual
(
l
(
inputs
).
shape
,
(
batch_size
,
2
,
2
,
4
))
l
=
t5
.
Linear3D
(
in_features
=
2
,
out_features
=
4
,
num_heads
=
2
,
to_3d
=
False
,
w_init
=
tf
.
keras
.
initializers
.
Ones
(),
name
=
"foo"
)
inputs
=
np
.
ones
((
batch_size
,
2
,
2
,
2
),
dtype
=
np
.
float32
)
self
.
assertEqual
(
l
(
inputs
).
shape
,
(
batch_size
,
2
,
4
))
def
test_ffn
(
self
):
inputs
=
np
.
ones
((
2
,
4
),
dtype
=
np
.
float32
)
for
activation
in
[
"relu"
,
"linear"
,
"gelu"
,
"swish"
]:
l
=
t5
.
FFN
(
d_model
=
4
,
d_ff
=
8
,
use_bias
=
True
,
dropout_rate
=
0.1
,
activations
=
[
activation
],
name
=
"foo"
)
self
.
assertEqual
(
l
(
inputs
).
shape
,
inputs
.
shape
)
self
.
assertLen
(
l
.
trainable_variables
,
4
)
l
=
t5
.
FFN
(
d_model
=
4
,
d_ff
=
8
,
dropout_rate
=
0.1
,
activations
=
[
"linear"
,
"gelu"
],
name
=
"bar"
)
self
.
assertLen
(
l
.
trainable_variables
,
3
)
self
.
assertEqual
(
l
(
inputs
).
shape
,
inputs
.
shape
)
@
parameterized
.
named_parameters
((
"bfloat16"
,
tf
.
bfloat16
),
(
"float32"
,
tf
.
float32
))
def
test_relative_position
(
self
,
dtype
):
l
=
t5
.
RelativePositionEmbedding
(
num_heads
=
4
,
bidirectional
=
False
,
embeddings_initializer
=
tf
.
keras
.
initializers
.
Ones
(),
compute_dtype
=
dtype
,
name
=
"foo"
)
self
.
assertEqual
(
l
(
4
,
2
).
shape
,
(
1
,
4
,
4
,
2
))
l
=
t5
.
RelativePositionEmbedding
(
num_heads
=
4
,
bidirectional
=
True
,
embeddings_initializer
=
tf
.
keras
.
initializers
.
Ones
(),
compute_dtype
=
dtype
,
name
=
"bar"
)
outputs
=
l
(
4
,
2
)
self
.
assertEqual
(
outputs
.
shape
,
(
1
,
4
,
4
,
2
))
self
.
assertEqual
(
outputs
.
dtype
,
dtype
)
def
test_masks
(
self
):
causal_mask
=
t5
.
make_causal_mask
(
np
.
zeros
((
2
,
5
)))
self
.
assertEqual
(
causal_mask
.
shape
,
(
2
,
1
,
5
,
5
))
@
combinations
.
generate
(
combinations
.
combine
(
distribution
=
[
strategy_combinations
.
default_strategy
,
strategy_combinations
.
cloud_tpu_strategy
,
],
mode
=
"eager"
))
def
test_attention
(
self
,
distribution
):
num_heads
,
head_size
=
2
,
4
from_seq_length
,
to_seq_length
=
4
,
6
batch_size
=
2
pos_embed
=
t5
.
RelativePositionEmbedding
(
num_heads
=
4
,
bidirectional
=
False
,
embeddings_initializer
=
tf
.
keras
.
initializers
.
Ones
(),
name
=
"pos_embed"
)
position_bias
=
pos_embed
(
from_seq_length
,
from_seq_length
)
l
=
t5
.
MultiHeadAttention
(
d_model
=
4
,
d_kv
=
2
,
num_heads
=
4
,
dropout_rate
=
0.1
)
query
=
tf
.
convert_to_tensor
(
np
.
ones
((
batch_size
,
from_seq_length
,
4
),
dtype
=
np
.
float32
))
self
.
assertEqual
(
l
(
query
,
position_bias
=
position_bias
)[
"context"
].
shape
,
query
.
shape
)
kv
=
tf
.
convert_to_tensor
(
np
.
ones
((
batch_size
,
to_seq_length
,
4
),
dtype
=
np
.
float32
))
position_bias
=
pos_embed
(
from_seq_length
,
to_seq_length
)
outputs
=
l
(
query
,
kv
=
kv
,
position_bias
=
position_bias
)
self
.
assertEqual
(
outputs
[
"context"
].
shape
,
query
.
shape
)
with
distribution
.
scope
():
l
=
t5
.
MultiHeadAttention
(
d_model
=
4
,
d_kv
=
head_size
,
num_heads
=
num_heads
,
dropout_rate
=
0.1
)
@
tf
.
function
def
step
(
inputs
):
def
_step_fn
(
inputs
):
cache
=
_create_cache
(
batch_size
,
from_seq_length
,
num_heads
,
head_size
)
mask
=
t5
.
make_causal_mask
(
tf
.
ones
((
batch_size
,
1
)))
return
l
(
query
=
inputs
,
mask
=
mask
,
cache
=
cache
,
decode_position
=
decode_position
)
outputs
=
distribution
.
run
(
_step_fn
,
args
=
(
inputs
,))
return
tf
.
nest
.
map_structure
(
distribution
.
experimental_local_results
,
outputs
)
decode_position
=
2
query
=
tf
.
convert_to_tensor
(
np
.
ones
((
2
,
1
,
4
),
dtype
=
np
.
float32
))
local_outputs
=
step
(
query
)
self
.
assertEqual
(
local_outputs
[
"context"
][
0
].
shape
,
(
2
,
1
,
4
))
self
.
assertNotEqual
(
np
.
sum
(
local_outputs
[
"cache"
][
"key"
][
0
][:,
decode_position
,
...].
numpy
()),
0.0
)
class T5Test(tf.test.TestCase, parameterized.TestCase):

  @combinations.generate(
      combinations.combine(
          distribution=[
              strategy_combinations.default_strategy,
              strategy_combinations.cloud_tpu_strategy,
          ],
          mode="eager"))
  def test_attention_layers(self, distribution):
    num_heads, head_size = 2, 2
    from_seq_length = 4
    # TPU decoding should pre-allocate the entire sequence.
    batch_size = 2
    with distribution.scope():
      pos_embed = t5.RelativePositionEmbedding(
          num_heads=head_size,
          bidirectional=False,
          embeddings_initializer=tf.keras.initializers.Ones(),
          name="pos_embed")
      l = t5.SelfAttention(
          d_model=4, d_kv=head_size, num_heads=num_heads, dropout_rate=0.1)
      decode_position = 2

      @tf.function
      def step(inputs):

        def _step_fn(inputs):
          cache = _create_cache(batch_size, from_seq_length, num_heads,
                                head_size)
          mask = t5.make_causal_mask(tf.ones((batch_size, 1)))
          position_bias = pos_embed(from_seq_length, from_seq_length)
          return l(
              hidden_states=inputs,
              cache=cache,
              attention_mask=mask,
              decode_position=decode_position,
              position_bias=position_bias)

        outputs = distribution.run(_step_fn, args=(inputs,))
        return tf.nest.map_structure(distribution.experimental_local_results,
                                     outputs)

      query = tf.convert_to_tensor(np.ones((2, 1, 4), dtype=np.float32))
      local_outputs = step(query)
      self.assertEqual(local_outputs["layer_output"][0].shape, (2, 1, 4))
      self.assertNotEqual(
          np.sum(local_outputs["cache"]["key"][0][:, decode_position, :,
                                                  :].numpy()), 0.0)

      l = t5.CrossAttention(
          d_model=4, d_kv=head_size, num_heads=num_heads, dropout_rate=0.1)
      to_seq_length = 6
      query = tf.convert_to_tensor(
          np.ones((2, from_seq_length, 4), dtype=np.float32))
      kv = tf.convert_to_tensor(
          np.ones((2, to_seq_length, 4), dtype=np.float32))

      @tf.function
      def step_cross_attn(inputs):

        def _step_fn(inputs):
          query, kv = inputs
          mask = t5.make_attention_mask(
              tf.ones((batch_size, from_seq_length)),
              tf.ones((batch_size, to_seq_length)))
          return l(hidden_states=query, kv=kv, attention_mask=mask)

        outputs = distribution.run(_step_fn, args=(inputs,))
        return tf.nest.map_structure(distribution.experimental_local_results,
                                     outputs)

      local_outputs = step_cross_attn((query, kv))
      self.assertEqual(local_outputs["layer_output"][0].shape,
                       (2, from_seq_length, 4))
  def test_encoder_block(self):
    batch_size = 2
    from_seq_length = 5
    d_model = 4
    l = t5.EncoderBlock(d_model=4, d_kv=3, num_heads=2, d_ff=8, name="foo")
    pos_embed = t5.RelativePositionEmbedding(
        num_heads=2,
        bidirectional=True,
        embeddings_initializer=tf.keras.initializers.Ones(),
        name="bar")
    attention_mask = t5.make_attention_mask(
        tf.ones((batch_size, from_seq_length)),
        tf.ones((batch_size, from_seq_length)))
    position_bias = pos_embed(from_seq_length, from_seq_length)
    inputs = tf.ones((batch_size, from_seq_length, d_model), dtype=tf.float32)
    outputs = l(
        inputs, attention_mask=attention_mask, position_bias=position_bias)
    self.assertEqual(outputs.shape, (batch_size, from_seq_length, d_model))
  def test_encdec_block(self):
    batch_size = 2
    from_seq_length = 5
    to_seq_length = 3
    d_model = 4
    l = t5.EncDecoderBlock(d_model=4, d_kv=3, num_heads=2, d_ff=8, name="foo")
    pos_embed = t5.RelativePositionEmbedding(
        num_heads=2,
        bidirectional=True,
        embeddings_initializer=tf.keras.initializers.Ones(),
        name="bar")
    encoder_decoder_mask = t5.make_attention_mask(
        tf.ones((batch_size, from_seq_length)),
        tf.ones((batch_size, to_seq_length)))
    position_bias = pos_embed(from_seq_length, from_seq_length)
    inputs = tf.ones((batch_size, from_seq_length, d_model), dtype=tf.float32)
    encoder_hidden_states = tf.ones((batch_size, to_seq_length, d_model),
                                    dtype=tf.float32)
    outputs = l(
        inputs,
        encoder_hidden_states,
        encoder_decoder_mask=encoder_decoder_mask,
        position_bias=position_bias)
    self.assertEqual(outputs[0].shape, (batch_size, from_seq_length, d_model))
  @parameterized.named_parameters(("bfloat16", tf.bfloat16),
                                  ("float32", tf.float32))
  def test_encoder(self, dtype):
    config = t5.T5TransformerParams(
        num_layers=2,
        d_model=4,
        d_kv=3,
        num_heads=4,
        d_ff=16,
        vocab_size=10,
        vocab_embeddings_initializer=tf.keras.initializers.Ones(),
        relative_embeddings_initializer=tf.keras.initializers.Ones())
    encoder = t5.Encoder(config, compute_dtype=dtype)
    encoded = encoder(tf.zeros((4, 8), dtype=tf.int32))
    self.assertEqual(encoded.shape, (4, 8, config.d_model))
  def test_decoder(self):
    max_decode_len = 10
    config = t5.T5TransformerParams(
        num_layers=2,
        d_model=4,
        d_kv=3,
        num_heads=4,
        d_ff=16,
        vocab_size=10,
        vocab_embeddings_initializer=tf.keras.initializers.Ones(),
        relative_embeddings_initializer=tf.keras.initializers.Ones())
    decoder = t5.Decoder(config)
    batch_size = 4
    targets = tf.zeros((4, 8), dtype=tf.int32)
    encoded = tf.zeros((4, 8, config.d_model), dtype=tf.float32)
    logits, cache = decoder(targets, encoded)
    self.assertEqual(logits.shape, (4, 8, config.vocab_size))

    cache = {}
    cache[0] = _create_cache(batch_size, max_decode_len, config.num_heads,
                             config.d_kv)
    cache[1] = _create_cache(batch_size, max_decode_len, config.num_heads,
                             config.d_kv)
    targets = tf.zeros((4, 1), dtype=tf.int32)
    logits, cache = decoder(
        targets,
        encoded,
        decode_position=2,
        cache=cache,
        decode=True,
        max_decode_len=max_decode_len)
    self.assertEqual(logits.shape, (batch_size, 1, config.vocab_size))
    for entry in cache.values():
      for tensor in entry.values():
        self.assertNotAllEqual(tensor.numpy()[:, 2, :, :], 0.0)
  @parameterized.named_parameters(
      ("t5_10", ("relu",), True, 26, False, tf.float32),
      ("t5_11", ("gelu", "linear"), False, 29, False, tf.float32),
      ("t5_10_bfloat16", ("relu",), True, 26, False, tf.bfloat16),
      ("t5_11_bfloat16", ("gelu", "linear"), False, 29, False, tf.bfloat16),
      ("t5_10_layer_sharing", ("relu",), True, 26, True, tf.float32),
      ("t5_11_layer_sharing", ("gelu", "linear"), False, 29, True, tf.float32),
      ("t5_10_bfloat16_layer_sharing", ("relu",), True, 26, True, tf.bfloat16),
      ("t5_11_bfloat16_layer_sharing", ("gelu", "linear"), False, 29, True,
       tf.bfloat16))
  def test_transformer(self, ffn_activations, logits_via_embedding,
                       expect_num_variables, layer_sharing, dtype):
    max_decode_len = 10
    config = t5.T5TransformerParams(
        num_layers=1,
        d_model=8,
        d_kv=4,
        num_heads=4,
        d_ff=32,
        vocab_size=10,
        shared_embedding=True,
        layer_sharing=layer_sharing,
        ffn_activations=ffn_activations,
        logits_via_embedding=logits_via_embedding)
    transformer = t5.T5Transformer(config, compute_dtype=dtype)
    self.assertLen(transformer.trainable_variables, expect_num_variables)
    inputs = tf.convert_to_tensor(
        np.array([[2, 2, 1, 3, 1, 0], [3, 3, 1, 2, 2, 1]]))
    segments = tf.convert_to_tensor(
        np.array([[1, 1, 1, 2, 2, 0], [1, 1, 1, 2, 2, 2]]))
    outputs = transformer(
        encoder_input_tokens=inputs,
        decoder_input_tokens=inputs,
        decoder_target_tokens=inputs,
        encoder_segment_ids=segments,
        decoder_segment_ids=segments)

    cache = {}
    batch_size = 2
    cache[0] = _create_cache(
        batch_size, max_decode_len, config.num_heads, config.d_kv, dtype=dtype)
    outputs = transformer.decode(
        encoder_input_tokens=inputs,
        encoded=outputs["encoded"],
        decoder_target_tokens=tf.ones((batch_size, 1), dtype=tf.int32),
        decode_position=1,
        decode=True,
        max_decode_len=max_decode_len,
        cache=cache)
    self.assertEqual(outputs["logits"].shape,
                     (batch_size, 1, config.vocab_size))
    for v in transformer.trainable_variables:
      print(v.name, v.shape)
      self.assertEqual(v.dtype, tf.float32)
  @parameterized.named_parameters(
      ("t5_10", ("relu",), True, 39, tf.float32, 2),
      ("t5_10_bfloat16", ("relu",), True, 39, tf.bfloat16, 2))
  def test_transformer_different_num_decoder_layers(self, ffn_activations,
                                                    logits_via_embedding,
                                                    expect_num_variables,
                                                    dtype, num_decoder_layers):
    max_decode_len = 10
    config = t5.T5TransformerParams(
        num_decoder_layers=num_decoder_layers,
        num_layers=1,
        d_model=8,
        d_kv=4,
        num_heads=4,
        d_ff=32,
        vocab_size=10,
        shared_embedding=True,
        ffn_activations=ffn_activations,
        logits_via_embedding=logits_via_embedding)
    transformer = t5.T5Transformer(config, compute_dtype=dtype)
    self.assertLen(transformer.trainable_variables, expect_num_variables)
    inputs = tf.convert_to_tensor(
        np.array([[2, 2, 1, 3, 1, 0], [3, 3, 1, 2, 2, 1]]))
    segments = tf.convert_to_tensor(
        np.array([[1, 1, 1, 2, 2, 0], [1, 1, 1, 2, 2, 2]]))
    outputs = transformer(
        encoder_input_tokens=inputs,
        decoder_input_tokens=inputs,
        decoder_target_tokens=inputs,
        encoder_segment_ids=segments,
        decoder_segment_ids=segments)

    cache = {}
    batch_size = 2
    for i in range(num_decoder_layers):
      cache[i] = _create_cache(
          batch_size, max_decode_len, config.num_heads, config.d_kv,
          dtype=dtype)
    outputs = transformer.decode(
        encoder_input_tokens=inputs,
        encoded=outputs["encoded"],
        decoder_target_tokens=tf.ones((batch_size, 1), dtype=tf.int32),
        decode_position=1,
        decode=True,
        max_decode_len=max_decode_len,
        cache=cache)
    self.assertEqual(outputs["logits"].shape,
                     (batch_size, 1, config.vocab_size))
    for v in transformer.trainable_variables:
      print(v.name, v.shape)
      self.assertEqual(v.dtype, tf.float32)
if __name__ == "__main__":
  tf.test.main()
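The decode tests above index a pre-allocated key/value cache as cache["key"][:, decode_position, :, :], built by a `_create_cache` helper defined earlier in t5_test.py (outside the hunks shown here). As a hedged sketch of what such a helper plausibly looks like, inferred only from how the cache is shaped and indexed in these tests:

import tensorflow as tf

# Hypothetical sketch; the real helper lives earlier in t5_test.py.
def _create_cache(batch_size, init_decode_length, num_heads, head_size,
                  dtype=tf.float32):
  # Pre-allocates zeroed key/value buffers of a fixed shape so that TPU
  # decoding can write one position at a time at decode_position.
  shape = [batch_size, init_decode_length, num_heads, head_size]
  return {
      "key": tf.zeros(shape, dtype=dtype),
      "value": tf.zeros(shape, dtype=dtype),
  }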
official/nlp/modeling/networks/funnel_transformer.py View file @ 7a45b513
...
@@ -14,6 +14,7 @@
"""Funnel Transformer network."""
# pylint: disable=g-classes-have-attributes
from typing import Union, Sequence

from absl import logging
import numpy as np
...
@@ -21,6 +22,10 @@ import tensorflow as tf
from official.nlp.modeling import layers

_MAX = 'max'
_AVG = 'avg'
_TRUNCATED_AVG = 'truncated_avg'
def _pool_and_concat(mask, unpool_length: int,
                     strides: Union[Sequence[int], int],
...
@@ -63,6 +68,94 @@ def _pool_and_concat(mask, unpool_length: int, strides: Union[Sequence[int],
  return mask
def _create_truncated_avg_transforms(seq_length: int,
                                     pool_strides: Sequence[int]):
  """Computes pooling transforms.

  The pooling_transform is of shape [seq_length, seq_length//pool_stride] and
    pooling_transform[i, j] = 1.0/pool_stride if i//pool_stride == j
                              0.0              otherwise.
  It is essentially average pooling, except that the final window is truncated
  when seq_length % pool_stride != 0.

  For seq_length==6 and pool_stride==2, it is
  [[ 0.5, 0.0, 0.0 ],
   [ 0.5, 0.0, 0.0 ],
   [ 0.0, 0.5, 0.0 ],
   [ 0.0, 0.5, 0.0 ],
   [ 0.0, 0.0, 0.5 ],
   [ 0.0, 0.0, 0.5 ]]

  Args:
    seq_length: int, sequence length.
    pool_strides: Sequence of pooling strides for each layer.

  Returns:
    pooling_transforms: Sequence of pooling transforms (Tensors) for each
      layer.
  """
  pooling_transforms = []
  for pool_stride in pool_strides:
    if pool_stride == 1:
      pooling_transforms.append(None)
    else:
      pooled_seq_length = seq_length // pool_stride
      pfac, sl, psl = pool_stride, seq_length, pooled_seq_length
      transform = [[1.0 if (i // pfac) == j else 0.0
                    for j in range(psl)]
                   for i in range(sl)]
      transform = tf.constant(
          transform,
          dtype=tf.keras.mixed_precision.global_policy().compute_dtype)
      pooling_transforms.append(transform / pool_stride)
      seq_length = pooled_seq_length

  return pooling_transforms
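To see the truncation concretely, a hedged sketch (not part of the file) evaluates the helper above for a length that is not divisible by the stride; the final position ends up with an all-zero row, i.e. it is dropped rather than averaged into a partial window:

import tensorflow as tf

# Illustrative only: seq_length=5, pool_stride=2 -> pooled length 2.
transforms = _create_truncated_avg_transforms(seq_length=5, pool_strides=[2])
print(transforms[0].numpy())
# [[0.5 0. ]
#  [0.5 0. ]
#  [0.  0.5]
#  [0.  0.5]
#  [0.  0. ]]   <- position 4 falls outside the last full window.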
def _create_truncated_avg_masks(input_mask: tf.Tensor,
                                pool_strides: Sequence[int],
                                transforms: Sequence[tf.Tensor]):
  """Computes attention masks.

  For [1,1,1,0,0]

  Args:
    input_mask: Tensor of shape [batch_size, seq_length].
    pool_strides: Sequence of pooling strides for each layer.
    transforms: Sequence of off-diagonal matrices filled with 0.0 and
      1/pool_stride.

  Returns:
    attention_masks: Sequence of attention masks for each layer.
  """

  def create_2d_mask(from_length, mask):
    return tf.einsum('F,BT->BFT',
                     tf.ones([from_length], dtype=mask.dtype),
                     mask)

  attention_masks = []
  seq_length = tf.shape(input_mask)[-1]
  layer_mask = tf.cast(
      input_mask, dtype=tf.keras.mixed_precision.global_policy().compute_dtype)
  for pool_stride, transform in zip(pool_strides, transforms):
    if pool_stride == 1:
      attention_masks.append(create_2d_mask(seq_length, layer_mask))
    else:
      pooled_seq_length = seq_length // pool_stride
      attention_masks.append(create_2d_mask(pooled_seq_length, layer_mask))
      layer_mask = tf.cast(
          tf.einsum('BF,FT->BT', layer_mask, transform) > 0.0,
          dtype=layer_mask.dtype)
      seq_length = pooled_seq_length

  del seq_length
  return attention_masks
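Continuing the [1,1,1,0,0] example from the docstring, a hedged sketch under the assumption of a single stride-2 layer: the mask for that layer has the pooled query length on the "from" axis while the keys keep the original padding pattern.

import tensorflow as tf

# Illustrative only: one layer, stride 2, over the mask [1, 1, 1, 0, 0].
mask = tf.constant([[1, 1, 1, 0, 0]], dtype=tf.float32)
transforms = _create_truncated_avg_transforms(seq_length=5, pool_strides=[2])
masks = _create_truncated_avg_masks(mask, pool_strides=[2],
                                    transforms=transforms)
# masks[0] has shape [1, 2, 5]: two pooled query positions attend over the
# original five key positions, still masked by [1, 1, 1, 0, 0].
print(masks[0].shape)  # (1, 2, 5)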
@tf.keras.utils.register_keras_serializable(package='Text')
class FunnelTransformerEncoder(tf.keras.layers.Layer):
  """Funnel Transformer-based encoder network.
...
@@ -90,7 +183,7 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer):
      dropout.
    attention_dropout: The dropout rate to use for the attention layers within
      the transformer layers.
    pool_type: Pooling type. Choose from ['max', 'avg'].
    pool_type: Pooling type. Choose from ['max', 'avg', 'truncated_avg'].
    pool_stride: An int or a list of ints. Pooling stride(s) to compress the
      sequence length. If set to int, each layer will have the same stride
      size. If set to list, the number of elements needs to match num_layers.
...
@@ -124,7 +217,7 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer):
               inner_activation=lambda x: tf.keras.activations.gelu(
                   x, approximate=True),
               output_dropout=0.1,
               attention_dropout=0.1,
               pool_type='max',
               pool_type=_MAX,
               pool_stride=2,
               unpool_length=0,
               initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02),
...
@@ -207,23 +300,33 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer):
        raise ValueError('Lengths of pool_stride and num_layers are not equal.')
      pool_strides = pool_stride
    # TODO(crickwu): explore tf.keras.layers.serialize method.
    if pool_type == 'max':
    if pool_type == _MAX:
      pool_cls = tf.keras.layers.MaxPooling1D
    elif pool_type == 'avg':
    elif pool_type == _AVG:
      pool_cls = tf.keras.layers.AveragePooling1D
    elif pool_type == _TRUNCATED_AVG:
      # TODO(b/203665205): unpool_length should be implemented.
      if unpool_length != 0:
        raise ValueError('unpool_length is not supported by truncated_avg now.')
      # Compute the attention masks and pooling transforms.
      self._pooling_transforms = _create_truncated_avg_transforms(
          max_sequence_length, pool_strides)
    else:
      raise ValueError('pool_type not supported.')
    self._att_input_pool_layers = []
    for layer_pool_stride in pool_strides:
      att_input_pool_layer = pool_cls(
          pool_size=layer_pool_stride,
          strides=layer_pool_stride,
          padding='same',
          name='att_input_pool_layer')
      self._att_input_pool_layers.append(att_input_pool_layer)
    if pool_type in (_MAX, _AVG):
      self._att_input_pool_layers = []
      for layer_pool_stride in pool_strides:
        att_input_pool_layer = pool_cls(
            pool_size=layer_pool_stride,
            strides=layer_pool_stride,
            padding='same',
            name='att_input_pool_layer')
        self._att_input_pool_layers.append(att_input_pool_layer)

    self._pool_strides = pool_strides  # This is a list here.
    self._unpool_length = unpool_length
    self._pool_type = pool_type
    self._config = {
        'vocab_size': vocab_size,
...
@@ -280,39 +383,65 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer):
    encoder_outputs = []
    x = embeddings
    # TODO(b/195972228): attention_mask can be co-generated with pooling.
    attention_mask = _pool_and_concat(
        attention_mask,
        unpool_length=self._unpool_length,
        strides=self._pool_strides[0],
        axes=[1])
    for i, layer in enumerate(self._transformer_layers):
      # Bypass no pooling cases.
      if self._pool_strides[i] == 1:
        x = layer([x, x, attention_mask])
      else:
        # Pools layer for compressing the query length.
        pooled_inputs = self._att_input_pool_layers[i](
            x[:, self._unpool_length:, :])
        query_inputs = tf.concat(
            values=(tf.cast(
                x[:, :self._unpool_length, :],
                dtype=pooled_inputs.dtype), pooled_inputs),
            axis=1)
        x = layer([query_inputs, x, attention_mask])
      # Pools the corresponding attention_mask.
      if i < len(self._transformer_layers) - 1:
        attention_mask = _pool_and_concat(
            attention_mask,
            unpool_length=self._unpool_length,
            strides=[self._pool_strides[i + 1], self._pool_strides[i]],
            axes=[1, 2])
      encoder_outputs.append(x)

    if self._pool_type in (_MAX, _AVG):
      attention_mask = _pool_and_concat(
          attention_mask,
          unpool_length=self._unpool_length,
          strides=self._pool_strides[0],
          axes=[1])
      for i, layer in enumerate(self._transformer_layers):
        # Bypass no pooling cases.
        if self._pool_strides[i] == 1:
          x = layer([x, x, attention_mask])
        else:
          # Pools layer for compressing the query length.
          pooled_inputs = self._att_input_pool_layers[i](
              x[:, self._unpool_length:, :])
          query_inputs = tf.concat(
              values=(tf.cast(
                  x[:, :self._unpool_length, :],
                  dtype=pooled_inputs.dtype), pooled_inputs),
              axis=1)
          x = layer([query_inputs, x, attention_mask])
        # Pools the corresponding attention_mask.
        if i < len(self._transformer_layers) - 1:
          attention_mask = _pool_and_concat(
              attention_mask,
              unpool_length=self._unpool_length,
              strides=[self._pool_strides[i + 1], self._pool_strides[i]],
              axes=[1, 2])
        encoder_outputs.append(x)
    elif self._pool_type == _TRUNCATED_AVG:
      attention_masks = _create_truncated_avg_masks(mask, self._pool_strides,
                                                    self._pooling_transforms)
      for i, layer in enumerate(self._transformer_layers):
        attention_mask = attention_masks[i]
        # Bypass no pooling cases.
        if self._pool_strides[i] == 1:
          x = layer([x, x, attention_mask])
        else:
          pooled_inputs = tf.einsum(
              'BFD,FT->BTD',
              tf.cast(x[:, self._unpool_length:, :],
                      tf.keras.mixed_precision.global_policy().compute_dtype
                     ),  # extra casting for faster mixed computation.
              self._pooling_transforms[i])
          query_inputs = tf.concat(
              values=(tf.cast(
                  x[:, :self._unpool_length, :],
                  dtype=pooled_inputs.dtype), pooled_inputs),
              axis=1)
          x = layer([query_inputs, x, attention_mask])
        encoder_outputs.append(x)

    last_encoder_output = encoder_outputs[-1]
    first_token_tensor = last_encoder_output[:, 0, :]
    pooled_output = self._pooler_layer(first_token_tensor)

    return dict(
        word_embeddings=word_embeddings,
        embedding_output=embeddings,
        sequence_output=encoder_outputs[-1],
        pooled_output=pooled_output,
        encoder_outputs=encoder_outputs)
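The truncated_avg branch replaces the Keras pooling layers with a single einsum contraction against the precomputed transform. A hedged, standalone sketch of that contraction on toy shapes (not taken from the file):

import tensorflow as tf

# Illustrative only: pool a [batch=1, from=4, dim=2] tensor with stride 2.
x = tf.reshape(tf.range(8, dtype=tf.float32), [1, 4, 2])
transform = tf.constant([[0.5, 0.0],
                         [0.5, 0.0],
                         [0.0, 0.5],
                         [0.0, 0.5]])  # [from=4, to=2], as built above.
pooled = tf.einsum('BFD,FT->BTD', x, transform)
# pooled[0] == [[1., 2.], [5., 6.]]: each output row is the average of two
# consecutive input rows, i.e. truncated average pooling along the sequence.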
...
...
official/nlp/modeling/networks/funnel_transformer_test.py View file @ 7a45b513
...
@@ -38,6 +38,8 @@ class FunnelTransformerEncoderTest(parameterized.TestCase, tf.test.TestCase):
    tf.keras.mixed_precision.set_global_policy("float32")

  @parameterized.named_parameters(
      ("mix_truncated_avg", "mixed_float16", tf.float16, "truncated_avg"),
      ("float32_truncated_avg", "float32", tf.float32, "truncated_avg"),
      ("mix_max", "mixed_float16", tf.float16, "max"),
      ("float32_max", "float32", tf.float32, "max"),
      ("mix_avg", "mixed_float16", tf.float16, "avg"),
...
@@ -57,6 +59,7 @@ class FunnelTransformerEncoderTest(parameterized.TestCase, tf.test.TestCase):
        num_layers=num_layers,
        pool_stride=pool_stride,
        pool_type=pool_type,
        max_sequence_length=sequence_length,
        unpool_length=0)
    # Create the inputs (note that the first dimension is implicit).
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
...
@@ -71,8 +74,14 @@ class FunnelTransformerEncoderTest(parameterized.TestCase, tf.test.TestCase):
    self.assertIsInstance(test_network.pooler_layer, tf.keras.layers.Dense)

    # Stride=2 compresses sequence length to half the size at each layer.
    # This configuration gives each layer of seq length: 21->11->6->3.
    expected_data_shape = [None, 3, hidden_size]
    # For pool_type = max or avg,
    # this configuration gives each layer of seq length: 21->11->6->3.
    # For pool_type = truncated_avg,
    # seq length: 21->10->5->2.
    if pool_type in ["max", "avg"]:
      expected_data_shape = [None, 3, hidden_size]
    else:
      expected_data_shape = [None, 2, hidden_size]
    expected_pooled_shape = [None, hidden_size]
    self.assertAllEqual(expected_data_shape, data.shape.as_list())
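The two length progressions in the comments follow from the padding behaviour: 'max'/'avg' use Keras pooling with padding='same' (ceiling division per layer), while 'truncated_avg' drops the partial window (floor division). A quick hedged check of that arithmetic:

import math

# Illustrative only: sequence length over three stride-2 pooling layers.
length_same, length_trunc = 21, 21
for _ in range(3):
  length_same = math.ceil(length_same / 2)  # 'same' padding: 21 -> 11 -> 6 -> 3
  length_trunc = length_trunc // 2          # truncated:      21 -> 10 -> 5 -> 2
print(length_same, length_trunc)  # 3 2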
...
...
official/nlp/projects/teams/experiments/base/glue_mnli.yaml View file @ 7a45b513
...
@@ -16,6 +16,7 @@ task:
    seq_length: 128
trainer:
  checkpoint_interval: 1000
  continuous_eval_timeout: 7200
  optimizer_config:
    learning_rate:
      polynomial:
...
...
official/nlp/projects/teams/experiments/base/squad_v1.yaml View file @ 7a45b513
...
@@ -23,6 +23,7 @@ task:
    vocab_file: ''
trainer:
  checkpoint_interval: 500
  continuous_eval_timeout: 7200
  max_to_keep: 5
  optimizer_config:
    learning_rate:
...
...
official/nlp/projects/teams/experiments/base/squad_v2.yaml View file @ 7a45b513
...
@@ -23,6 +23,7 @@ task:
    vocab_file: ''
trainer:
  checkpoint_interval: 500
  continuous_eval_timeout: 7200
  max_to_keep: 5
  optimizer_config:
    learning_rate:
...
...