ModelZoo / ResNet50_tensorflow · Commits

Commit 1f8b5b27 (unverified)
Authored Sep 03, 2021 by Simon Geisler; committed by GitHub on Sep 03, 2021

    Merge branch 'master' into master

Parents: 0eeeaf98, 8fcf177e
Changes: 99 files in total; showing 20 changed files with 1543 additions and 43 deletions.
CODEOWNERS                                                 +3   -2
official/README.md                                         +7   -2
official/modeling/grad_utils.py                            +0   -0
official/modeling/hyperparams/base_config.py               +5   -7
official/nlp/bert/model_training_utils.py                  +1   -1
official/nlp/data/squad_lib_sp.py                          +1   -1
official/nlp/modeling/layers/cls_head.py                   +35  -6
official/nlp/modeling/layers/cls_head_test.py              +5   -0
official/nlp/modeling/networks/classification.py           +3   -0
official/nlp/modeling/networks/encoder_scaffold.py         +29  -6
official/nlp/modeling/networks/encoder_scaffold_test.py    +92  -0
official/nlp/modeling/networks/funnel_transformer.py       +311 -0
official/nlp/modeling/networks/funnel_transformer_test.py  +260 -0
official/nlp/projects/teams/teams.py                       +104 -0
official/nlp/projects/teams/teams_pretrainer.py            +463 -0
official/nlp/projects/teams/teams_pretrainer_test.py       +188 -0
official/nlp/projects/triviaqa/inputs.py                   +5   -5
official/nlp/serving/serving_modules.py                    +24  -10
official/pip_package/setup.py                              +1   -0
official/recommendation/ranking/README.md                  +6   -3
CODEOWNERS

-* @tensorflow/tf-garden-team @tensorflow/tf-model-garden-team
+* @tensorflow/tf-model-garden-team
 /official/ @rachellj218 @saberkun @jaeyounkim
 /official/nlp/ @saberkun @lehougoogle @rachellj218 @jaeyounkim
 /official/recommendation/ranking/ @gagika
 /official/vision/ @xianzhidu @yeqingli @arashwan @saberkun @rachellj218 @jaeyounkim
-/official/vision/beta/projects/assemblenet/ @mryoo
+/official/vision/beta/projects/assemblenet/ @mryoo @yeqingli
 /official/vision/beta/projects/deepmac_maskrcnn/ @vighneshbirodkar
 /official/vision/beta/projects/movinet/ @hyperparticle @yuanliangzhe @yeqingli
 /official/vision/beta/projects/simclr/ @luotigerlsx @chentingpc @saxenasaurabh
+/official/vision/beta/projects/video_ssl/ @richardaecn @yeqingli
 /research/adversarial_text/ @rsepassi @a-dai
 /research/attention_ocr/ @xavigibert
 /research/audioset/ @plakal @dpwe
 ...
official/README.md

@@ -82,9 +82,9 @@ built from the
 as tagged branches or [downloadable releases](https://github.com/tensorflow/models/releases).
 * Model repository version numbers match the target TensorFlow release, such that
-  [release v2.2.0](https://github.com/tensorflow/models/releases/tag/v2.2.0)
-  are compatible with [TensorFlow v2.2.0](https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0).
+  [release v2.5.0](https://github.com/tensorflow/models/releases/tag/v2.5.0)
+  are compatible with [TensorFlow v2.5.0](https://github.com/tensorflow/tensorflow/releases/tag/v2.5.0).
 
 Please follow the below steps before running models in this repository.

@@ -98,6 +98,11 @@ upgrade your TensorFlow to [the latest TensorFlow 2](https://www.tensorflow.org/
 pip3 install tf-nightly
 ```
 
+* Python 3.7+
+
+  Our integration tests run with Python 3.7. Although Python 3.6 should work, we
+  don't recommend earlier versions.
+
 ### Installation
 
 #### Method 1: Install the TensorFlow Model Garden pip package
official/staging/training/grad_utils.py → official/modeling/grad_utils.py

File moved.
official/modeling/hyperparams/base_config.py

@@ -49,6 +49,11 @@ class Config(params_dict.ParamsDict):
   default_params: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None
   restrictions: dataclasses.InitVar[Optional[List[str]]] = None
 
+  def __post_init__(self, default_params, restrictions):
+    super().__init__(
+        default_params=default_params, restrictions=restrictions)
+
   @classmethod
   def _isvalidsequence(cls, v):
     """Check if the input values are valid sequences.

@@ -140,13 +145,6 @@ class Config(params_dict.ParamsDict):
                        else subconfig_type)
     return subconfig_type
 
-  def __post_init__(self, default_params, restrictions, *args, **kwargs):
-    super().__init__(
-        default_params=default_params,
-        restrictions=restrictions,
-        *args,
-        **kwargs)
-
   def _set(self, k, v):
     """Overrides same method in ParamsDict.
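For reference, the `dataclasses.InitVar` pattern this change relies on: InitVar fields are passed to `__post_init__` rather than stored as instance attributes, which is why `__post_init__` must accept them explicitly. A minimal standalone sketch, independent of `params_dict` (the `Demo` class is hypothetical):

import dataclasses
from typing import Any, Mapping, Optional


@dataclasses.dataclass
class Demo:
  name: str = 'demo'
  # InitVar fields are init-only: they are forwarded to __post_init__ and
  # never become attributes of the instance.
  default_params: dataclasses.InitVar[Optional[Mapping[str, Any]]] = None

  def __post_init__(self, default_params):
    # Consume the init-only value here.
    self.resolved = dict(default_params or {})


d = Demo(default_params={'lr': 0.1})
print(d.resolved)                    # {'lr': 0.1}
print(hasattr(d, 'default_params'))  # False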
official/nlp/bert/model_training_utils.py

@@ -22,7 +22,7 @@ from absl import logging
 import tensorflow as tf
 from tensorflow.python.util import deprecation
 from official.common import distribute_utils
-from official.staging.training import grad_utils
+from official.modeling import grad_utils
 
 _SUMMARY_TXT = 'training_summary.txt'
 _MIN_SUMMARY_STEPS = 10
official/nlp/data/squad_lib_sp.py

@@ -175,7 +175,7 @@ def _convert_index(index, pos, m=None, is_start=True):
       front -= 1
   assert index[front] is not None or index[rear] is not None
   if index[front] is None:
-    if index[rear] >= 1:
+    if index[rear] >= 1:  # pytype: disable=unsupported-operands
       if is_start:
         return 0
       else:
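The added pragma silences pytype rather than changing behavior: the elements of `index` are typed `Optional[int]`, and pytype flags `index[rear] >= 1` as unsupported-operands because it cannot narrow the element type from the preceding `assert` across two different subscripts. A standalone illustration of the narrowing that would avoid the pragma (hypothetical function, not part of the diff):

from typing import List, Optional


def rear_is_positive(index: List[Optional[int]], rear: int) -> bool:
  value = index[rear]
  # Binding to a local and testing it lets the type checker narrow
  # Optional[int] to int, so no pragma is needed here.
  return value is not None and value >= 1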
official/nlp/modeling/layers/cls_head.py

@@ -59,19 +59,33 @@ class ClassificationHead(tf.keras.layers.Layer):
         activation=self.activation,
         kernel_initializer=self.initializer,
         name="pooler_dense")
     self.dropout = tf.keras.layers.Dropout(rate=self.dropout_rate)
     self.out_proj = tf.keras.layers.Dense(
         units=num_classes, kernel_initializer=self.initializer, name="logits")
 
-  def call(self, features):
+  def call(self, features: tf.Tensor, only_project: bool = False):
+    """Implements call().
+
+    Args:
+      features: a rank-3 Tensor when self.inner_dim is specified, otherwise
+        it is a rank-2 Tensor.
+      only_project: a boolean. If True, we return the intermediate Tensor
+        before projecting to class logits.
+
+    Returns:
+      a Tensor, if only_project is True, shape= [batch size, hidden size].
+      If only_project is False, shape= [batch size, num classes].
+    """
     if not self.inner_dim:
       x = features
     else:
       x = features[:, self.cls_token_idx, :]  # take <CLS> token.
       x = self.dense(x)
 
-    x = self.dropout(x)
+    if only_project:
+      return x
+    x = self.dropout(x)
     x = self.out_proj(x)
     return x

@@ -134,7 +148,7 @@ class MultiClsHeads(tf.keras.layers.Layer):
         activation=self.activation,
         kernel_initializer=self.initializer,
         name="pooler_dense")
     self.dropout = tf.keras.layers.Dropout(rate=self.dropout_rate)
     self.out_projs = []
     for name, num_classes in cls_list:
       self.out_projs.append(
@@ -142,13 +156,28 @@ class MultiClsHeads(tf.keras.layers.Layer):
               units=num_classes,
               kernel_initializer=self.initializer,
               name=name))
 
-  def call(self, features):
+  def call(self, features: tf.Tensor, only_project: bool = False):
+    """Implements call().
+
+    Args:
+      features: a rank-3 Tensor when self.inner_dim is specified, otherwise
+        it is a rank-2 Tensor.
+      only_project: a boolean. If True, we return the intermediate Tensor
+        before projecting to class logits.
+
+    Returns:
+      If only_project is True, a Tensor with shape= [batch size, hidden size].
+      If only_project is False, a dictionary of Tensors.
+    """
     if not self.inner_dim:
       x = features
     else:
       x = features[:, self.cls_token_idx, :]  # take <CLS> token.
       x = self.dense(x)
 
-    x = self.dropout(x)
+    if only_project:
+      return x
+    x = self.dropout(x)
     outputs = {}
     for proj_layer in self.out_projs:
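A quick sketch of the new flag, mirroring the shapes asserted in cls_head_test.py below. The constructor arguments follow the test's `ClassificationHead(10, 2)` call pattern but the values here are illustrative assumptions, not prescriptive:

import numpy as np
import tensorflow as tf

from official.nlp.modeling.layers import cls_head

head = cls_head.ClassificationHead(inner_dim=5, num_classes=2)
# Rank-3 features because inner_dim is set: (batch, seq_len, width).
features = tf.constant(np.random.rand(2, 10, 5), dtype=tf.float32)

logits = head(features)                     # shape (2, 2): class logits
pooled = head(features, only_project=True)  # shape (2, 5): pre-logit features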
official/nlp/modeling/layers/cls_head_test.py

@@ -39,6 +39,8 @@ class ClassificationHeadTest(tf.test.TestCase, parameterized.TestCase):
     self.assertAllClose(output, [[0., 0.], [0., 0.]])
     self.assertSameElements(test_layer.checkpoint_items.keys(),
                             ["pooler_dense"])
+    outputs = test_layer(features, only_project=True)
+    self.assertEqual(outputs.shape, (2, 5))
 
   def test_layer_serialization(self):
     layer = cls_head.ClassificationHead(10, 2)

@@ -71,6 +73,9 @@ class MultiClsHeadsTest(tf.test.TestCase, parameterized.TestCase):
     self.assertSameElements(test_layer.checkpoint_items.keys(),
                             ["pooler_dense", "foo", "bar"])
+    outputs = test_layer(features, only_project=True)
+    self.assertEqual(outputs.shape, (2, 5))
 
   def test_layer_serialization(self):
     cls_list = [("foo", 2), ("bar", 3)]
     test_layer = cls_head.MultiClsHeads(inner_dim=5, cls_list=cls_list)
official/nlp/modeling/networks/classification.py

@@ -16,6 +16,7 @@
 # pylint: disable=g-classes-have-attributes
 import collections
 import tensorflow as tf
+from tensorflow.python.util import deprecation
 
 
 @tf.keras.utils.register_keras_serializable(package='Text')

@@ -39,6 +40,8 @@ class Classification(tf.keras.Model):
       `predictions`.
   """
 
+  @deprecation.deprecated(None, 'Classification as a network is deprecated. '
+                          'Please use the layers.ClassificationHead instead.')
   def __init__(self,
                input_width,
                num_classes,
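For context, the decorator applied above warns at call time; `date=None` means "deprecated as of now" and the second argument is appended to the warning. A minimal sketch of that usage (note `tensorflow.python.util` is a TF-internal namespace, used here only because the diff itself imports it):

from tensorflow.python.util import deprecation


@deprecation.deprecated(None, 'Use layers.ClassificationHead instead.')
def legacy_classifier(x):
  return x


legacy_classifier(1)  # the first call logs a deprecation warning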
official/nlp/modeling/networks/encoder_scaffold.py

@@ -74,9 +74,12 @@ class EncoderScaffold(tf.keras.Model):
       standard pretraining.
     num_hidden_instances: The number of times to instantiate and/or invoke the
       hidden_cls.
-    hidden_cls: The class or instance to encode the input data. If `hidden_cls`
-      is not set, a KerasBERT transformer layer will be used as the encoder
-      class.
+    hidden_cls: Three types of input are supported: (1) class (2) instance
+      (3) list of classes or instances, to encode the input data. If
+      `hidden_cls` is not set, a KerasBERT transformer layer will be used as the
+      encoder class. If `hidden_cls` is a list of classes or instances, these
+      classes (instances) are sequentially instantiated (invoked) on top of
+      embedding layer. Mixing classes and instances in the list is allowed.
     hidden_cfg: A dict of kwargs to pass to the hidden_cls, if it needs to be
       instantiated. If hidden_cls is not set, a config dict must be passed to
       `hidden_cfg` with the following values:

@@ -192,15 +195,26 @@ class EncoderScaffold(tf.keras.Model):
     layer_output_data = []
     hidden_layers = []
     hidden_cfg = hidden_cfg if hidden_cfg else {}
 
+    if isinstance(hidden_cls,
+                  list) and len(hidden_cls) != num_hidden_instances:
+      raise RuntimeError(
+          ('When input hidden_cls to EncoderScaffold %s is a list, it must '
+           'contain classes or instances with size specified by '
+           'num_hidden_instances, got %d vs %d.') % self.name,
+          len(hidden_cls), num_hidden_instances)
+
     for i in range(num_hidden_instances):
-      if inspect.isclass(hidden_cls):
+      if isinstance(hidden_cls, list):
+        cur_hidden_cls = hidden_cls[i]
+      else:
+        cur_hidden_cls = hidden_cls
+
+      if inspect.isclass(cur_hidden_cls):
         if hidden_cfg and 'attention_cfg' in hidden_cfg and (
             layer_idx_as_attention_seed):
           hidden_cfg = copy.deepcopy(hidden_cfg)
           hidden_cfg['attention_cfg']['seed'] = i
-        layer = hidden_cls(**hidden_cfg)
+        layer = cur_hidden_cls(**hidden_cfg)
       else:
-        layer = hidden_cls
+        layer = cur_hidden_cls
       data = layer([data, attention_mask])
       layer_output_data.append(data)
       hidden_layers.append(layer)

@@ -347,6 +361,15 @@ class EncoderScaffold(tf.keras.Model):
     else:
       return self._embedding_data
 
+  @property
+  def embedding_network(self):
+    if self._embedding_network is None:
+      raise RuntimeError(
+          ('The EncoderScaffold %s does not have a reference '
+           'to the embedding network. This is required when you '
+           'pass a custom embedding network to the scaffold.') % self.name)
+    return self._embedding_network
+
   @property
   def hidden_layers(self):
     """List of hidden layers in the encoder."""
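A hedged construction sketch of the new list form, assembled from the same building blocks that teams.py later in this commit uses (`PackedSequenceEmbedding` plus `layers.Transformer`); all hyperparameter values below are illustrative assumptions:

import tensorflow as tf

from official.nlp.modeling import layers
from official.nlp.modeling import networks

init = tf.keras.initializers.TruncatedNormal(stddev=0.02)
embedding_cfg = dict(
    vocab_size=100, type_vocab_size=2, hidden_size=32, embedding_width=32,
    max_seq_length=16, initializer=init, dropout_rate=0.1)
embedding_network = networks.PackedSequenceEmbedding(**embedding_cfg)

# Mixing classes and instances in the list is allowed; here we use instances.
blocks = [
    layers.Transformer(
        num_attention_heads=2, intermediate_size=64,
        intermediate_activation=tf.keras.activations.gelu, dropout_rate=0.1,
        attention_dropout_rate=0.1, kernel_initializer=init)
    for _ in range(3)
]
encoder = networks.encoder_scaffold.EncoderScaffold(
    embedding_cfg=embedding_cfg,
    embedding_cls=embedding_network,
    hidden_cls=blocks,            # list length must equal num_hidden_instances
    num_hidden_instances=3,
    pooled_output_dim=32,
    pooler_layer_initializer=init,
    dict_outputs=True)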
official/nlp/modeling/networks/encoder_scaffold_test.py

@@ -605,6 +605,98 @@ class EncoderScaffoldHiddenInstanceTest(keras_parameterized.TestCase):
     self.assertNotEmpty(call_list)
     self.assertTrue(call_list[0], "The passed layer class wasn't instantiated.")
 
+  def test_hidden_cls_list(self):
+    hidden_size = 32
+    sequence_length = 10
+    vocab_size = 57
+
+    embedding_network = Embeddings(vocab_size, hidden_size)
+
+    call_list = []
+    hidden_cfg = {
+        "num_attention_heads": 2,
+        "intermediate_size": 3072,
+        "intermediate_activation": activations.gelu,
+        "dropout_rate": 0.1,
+        "attention_dropout_rate": 0.1,
+        "kernel_initializer": tf.keras.initializers.TruncatedNormal(stddev=0.02),
+        "call_list": call_list
+    }
+    mask_call_list = []
+    mask_cfg = {"call_list": mask_call_list}
+    # Create a small EncoderScaffold for testing. This time, we pass an
+    # already-instantiated layer object.
+    xformer = ValidatedTransformerLayer(**hidden_cfg)
+    xmask = ValidatedMaskLayer(**mask_cfg)
+
+    test_network_a = encoder_scaffold.EncoderScaffold(
+        num_hidden_instances=3,
+        pooled_output_dim=hidden_size,
+        pooler_layer_initializer=tf.keras.initializers.TruncatedNormal(
+            stddev=0.02),
+        hidden_cls=xformer,
+        mask_cls=xmask,
+        embedding_cls=embedding_network)
+    # Create a network b with same embedding and hidden layers as network a.
+    test_network_b = encoder_scaffold.EncoderScaffold(
+        num_hidden_instances=3,
+        pooled_output_dim=hidden_size,
+        pooler_layer_initializer=tf.keras.initializers.TruncatedNormal(
+            stddev=0.02),
+        mask_cls=xmask,
+        embedding_cls=test_network_a.embedding_network,
+        hidden_cls=test_network_a.hidden_layers)
+    # Create a network c with same embedding but fewer hidden layers compared
+    # to network a and b.
+    hidden_layers = test_network_a.hidden_layers
+    hidden_layers.pop()
+    test_network_c = encoder_scaffold.EncoderScaffold(
+        num_hidden_instances=2,
+        pooled_output_dim=hidden_size,
+        pooler_layer_initializer=tf.keras.initializers.TruncatedNormal(
+            stddev=0.02),
+        mask_cls=xmask,
+        embedding_cls=test_network_a.embedding_network,
+        hidden_cls=hidden_layers)
+
+    # Create the inputs (note that the first dimension is implicit).
+    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
+    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
+    # Create model based off of network a:
+    data_a, pooled_a = test_network_a([word_ids, mask])
+    model_a = tf.keras.Model([word_ids, mask], [data_a, pooled_a])
+    # Create model based off of network b:
+    data_b, pooled_b = test_network_b([word_ids, mask])
+    model_b = tf.keras.Model([word_ids, mask], [data_b, pooled_b])
+    # Create model based off of network c:
+    data_c, pooled_c = test_network_c([word_ids, mask])
+    model_c = tf.keras.Model([word_ids, mask], [data_c, pooled_c])
+
+    batch_size = 3
+    word_id_data = np.random.randint(
+        vocab_size, size=(batch_size, sequence_length))
+    mask_data = np.random.randint(2, size=(batch_size, sequence_length))
+    output_a, _ = model_a.predict([word_id_data, mask_data])
+    output_b, _ = model_b.predict([word_id_data, mask_data])
+    output_c, _ = model_c.predict([word_id_data, mask_data])
+
+    # Outputs from model a and b should be the same since they share the same
+    # embedding and hidden layers.
+    self.assertAllEqual(output_a, output_b)
+    # Outputs from model a and c shouldn't be the same since they share the
+    # same embedding layer but different number of hidden layers.
+    self.assertNotAllEqual(output_a, output_c)
+
   @parameterized.parameters(True, False)
   def test_serialize_deserialize(self, use_hidden_cls_instance):
     hidden_size = 32
official/nlp/modeling/networks/funnel_transformer.py (new file, 0 → 100644)

# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Funnel Transformer network."""
# pylint: disable=g-classes-have-attributes

from typing import Union, Collection

from absl import logging
import tensorflow as tf

from official.nlp import keras_nlp


def _pool_and_concat(data, unpool_length: int, stride: int,
                     axes: Union[Collection[int], int]):
  """Pools the data along a given axis with stride.

  It also skips first unpool_length elements.

  Args:
    data: Tensor to be pooled.
    unpool_length: Leading elements to be skipped.
    stride: Stride for the given axis.
    axes: Axes to pool the Tensor.

  Returns:
    Pooled and concatenated Tensor.
  """
  # Wraps the axes as a list.
  if isinstance(axes, int):
    axes = [axes]
  for axis in axes:
    # Skips first `unpool_length` tokens.
    unpool_tensor_shape = [slice(None)] * axis + [slice(None, unpool_length)]
    unpool_tensor = data[unpool_tensor_shape]
    # Pools the second half.
    pool_tensor_shape = [slice(None)] * axis + [
        slice(unpool_length, None, stride)
    ]
    pool_tensor = data[pool_tensor_shape]
    data = tf.concat((unpool_tensor, pool_tensor), axis=axis)
  return data


@tf.keras.utils.register_keras_serializable(package='Text')
class FunnelTransformerEncoder(tf.keras.layers.Layer):
  """Funnel Transformer-based encoder network.

  Funnel Transformer Implementation of https://arxiv.org/abs/2006.03236.
  This implementation utilizes the base framework with Bert
  (https://arxiv.org/abs/1810.04805). Its output is compatible with
  `BertEncoder`.

  Args:
    vocab_size: The size of the token vocabulary.
    hidden_size: The size of the transformer hidden layers.
    num_layers: The number of transformer layers.
    num_attention_heads: The number of attention heads for each transformer.
      The hidden size must be divisible by the number of attention heads.
    max_sequence_length: The maximum sequence length that this encoder can
      consume. If None, max_sequence_length uses the value from sequence
      length. This determines the variable shape for positional embeddings.
    type_vocab_size: The number of types that the 'type_ids' input can take.
    inner_dim: The output dimension of the first Dense layer in a two-layer
      feedforward network for each transformer.
    inner_activation: The activation for the first Dense layer in a two-layer
      feedforward network for each transformer.
    output_dropout: Dropout probability for the post-attention and output
      dropout.
    attention_dropout: The dropout rate to use for the attention layers within
      the transformer layers.
    pool_stride: Pooling stride to compress the sequence length.
    unpool_length: Leading n tokens to be skipped from pooling.
    initializer: The initializer to use for all weights in this encoder.
    output_range: The sequence output range, [0, output_range), by slicing the
      target sequence of the last transformer layer. `None` means the entire
      target sequence will attend to the source sequence, which yields the
      full output.
    embedding_width: The width of the word embeddings. If the embedding width
      is not equal to hidden size, embedding parameters will be factorized
      into two matrices in the shape of ['vocab_size', 'embedding_width'] and
      ['embedding_width', 'hidden_size'] ('embedding_width' is usually much
      smaller than 'hidden_size').
    embedding_layer: An optional Layer instance which will be called to
      generate embeddings for the input word IDs.
    norm_first: Whether to normalize inputs to attention and intermediate
      dense layers. If set False, output of attention and intermediate dense
      layers is normalized.
  """

  def __init__(
      self,
      vocab_size,
      hidden_size=768,
      num_layers=12,
      num_attention_heads=12,
      max_sequence_length=512,
      type_vocab_size=16,
      inner_dim=3072,
      inner_activation=lambda x: tf.keras.activations.gelu(
          x, approximate=True),
      output_dropout=0.1,
      attention_dropout=0.1,
      pool_stride=2,
      unpool_length=0,
      initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02),
      output_range=None,
      embedding_width=None,
      embedding_layer=None,
      norm_first=False,
      **kwargs):
    super().__init__(**kwargs)
    activation = tf.keras.activations.get(inner_activation)
    initializer = tf.keras.initializers.get(initializer)

    if embedding_width is None:
      embedding_width = hidden_size

    if embedding_layer is None:
      self._embedding_layer = keras_nlp.layers.OnDeviceEmbedding(
          vocab_size=vocab_size,
          embedding_width=embedding_width,
          initializer=initializer,
          name='word_embeddings')
    else:
      self._embedding_layer = embedding_layer

    self._position_embedding_layer = keras_nlp.layers.PositionEmbedding(
        initializer=initializer,
        max_length=max_sequence_length,
        name='position_embedding')

    self._type_embedding_layer = keras_nlp.layers.OnDeviceEmbedding(
        vocab_size=type_vocab_size,
        embedding_width=embedding_width,
        initializer=initializer,
        use_one_hot=True,
        name='type_embeddings')

    self._embedding_norm_layer = tf.keras.layers.LayerNormalization(
        name='embeddings/layer_norm', axis=-1, epsilon=1e-12, dtype=tf.float32)

    self._embedding_dropout = tf.keras.layers.Dropout(
        rate=output_dropout, name='embedding_dropout')

    # We project the 'embedding' output to 'hidden_size' if it is not already
    # 'hidden_size'.
    self._embedding_projection = None
    if embedding_width != hidden_size:
      self._embedding_projection = tf.keras.layers.experimental.EinsumDense(
          '...x,xy->...y',
          output_shape=hidden_size,
          bias_axes='y',
          kernel_initializer=initializer,
          name='embedding_projection')

    self._transformer_layers = []
    self._attention_mask_layer = keras_nlp.layers.SelfAttentionMask(
        name='self_attention_mask')
    for i in range(num_layers):
      layer = keras_nlp.layers.TransformerEncoderBlock(
          num_attention_heads=num_attention_heads,
          inner_dim=inner_dim,
          inner_activation=inner_activation,
          output_dropout=output_dropout,
          attention_dropout=attention_dropout,
          norm_first=norm_first,
          output_range=output_range if i == num_layers - 1 else None,
          kernel_initializer=initializer,
          name='transformer/layer_%d' % i)
      self._transformer_layers.append(layer)

    self._pooler_layer = tf.keras.layers.Dense(
        units=hidden_size,
        activation='tanh',
        kernel_initializer=initializer,
        name='pooler_transform')
    self._att_input_pool_layer = tf.keras.layers.MaxPooling1D(
        pool_size=pool_stride,
        strides=pool_stride,
        padding='same',
        name='att_input_pool_layer')

    self._pool_stride = pool_stride
    self._unpool_length = unpool_length

    self._config = {
        'vocab_size': vocab_size,
        'hidden_size': hidden_size,
        'num_layers': num_layers,
        'num_attention_heads': num_attention_heads,
        'max_sequence_length': max_sequence_length,
        'type_vocab_size': type_vocab_size,
        'inner_dim': inner_dim,
        'inner_activation': tf.keras.activations.serialize(activation),
        'output_dropout': output_dropout,
        'attention_dropout': attention_dropout,
        'initializer': tf.keras.initializers.serialize(initializer),
        'output_range': output_range,
        'embedding_width': embedding_width,
        'embedding_layer': embedding_layer,
        'norm_first': norm_first,
        'pool_stride': pool_stride,
        'unpool_length': unpool_length,
    }

  def call(self, inputs):
    # inputs are [word_ids, mask, type_ids]
    if isinstance(inputs, (list, tuple)):
      logging.warning('List inputs to %s are discouraged.', self.__class__)
      if len(inputs) == 3:
        word_ids, mask, type_ids = inputs
      else:
        raise ValueError('Unexpected inputs to %s with length at %d.' %
                         (self.__class__, len(inputs)))
    elif isinstance(inputs, dict):
      word_ids = inputs.get('input_word_ids')
      mask = inputs.get('input_mask')
      type_ids = inputs.get('input_type_ids')
    else:
      raise ValueError('Unexpected inputs type to %s.' % self.__class__)

    word_embeddings = self._embedding_layer(word_ids)
    # absolute position embeddings
    position_embeddings = self._position_embedding_layer(word_embeddings)
    type_embeddings = self._type_embedding_layer(type_ids)

    embeddings = tf.keras.layers.add(
        [word_embeddings, position_embeddings, type_embeddings])
    embeddings = self._embedding_norm_layer(embeddings)
    embeddings = self._embedding_dropout(embeddings)

    if self._embedding_projection is not None:
      embeddings = self._embedding_projection(embeddings)

    attention_mask = self._attention_mask_layer(embeddings, mask)

    encoder_outputs = []
    x = embeddings
    # TODO(b/195972228): attention_mask can be co-generated with pooling.
    attention_mask = _pool_and_concat(
        attention_mask,
        unpool_length=self._unpool_length,
        stride=self._pool_stride,
        axes=[1])
    for layer in self._transformer_layers:
      # Pools layer for compressing the query length.
      pooled_inputs = self._att_input_pool_layer(x[:, self._unpool_length:, :])
      query_inputs = tf.concat(
          values=(tf.cast(
              x[:, :self._unpool_length, :], dtype=pooled_inputs.dtype),
                  pooled_inputs),
          axis=1)
      x = layer([query_inputs, x, attention_mask])
      # Pools the corresponding attention_mask.
      attention_mask = _pool_and_concat(
          attention_mask,
          unpool_length=self._unpool_length,
          stride=self._pool_stride,
          axes=[1, 2])
      encoder_outputs.append(x)

    last_encoder_output = encoder_outputs[-1]
    first_token_tensor = last_encoder_output[:, 0, :]
    pooled_output = self._pooler_layer(first_token_tensor)

    return dict(
        sequence_output=encoder_outputs[-1],
        pooled_output=pooled_output,
        encoder_outputs=encoder_outputs)

  def get_embedding_table(self):
    return self._embedding_layer.embeddings

  def get_embedding_layer(self):
    return self._embedding_layer

  def get_config(self):
    return dict(self._config)

  @property
  def transformer_layers(self):
    """List of Transformer layers in the encoder."""
    return self._transformer_layers

  @property
  def pooler_layer(self):
    """The pooler dense layer after the transformer layers."""
    return self._pooler_layer

  @classmethod
  def from_config(cls, config, custom_objects=None):
    if 'embedding_layer' in config and config['embedding_layer'] is not None:
      warn_string = (
          'You are reloading a model that was saved with a '
          'potentially-shared embedding layer object. If you continue to '
          'train this model, the embedding layer will no longer be shared. '
          'To work around this, load the model outside of the Keras API.')
      print('WARNING: ' + warn_string)
      logging.warn(warn_string)
    return cls(**config)
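To make the pooling behavior concrete, here is a tiny worked example of the module-private `_pool_and_concat` above (assuming the module is importable; the private-name import is for illustration only): the first `unpool_length` positions are kept verbatim, and the remainder is strided.

import tensorflow as tf

from official.nlp.modeling.networks.funnel_transformer import _pool_and_concat

data = tf.reshape(tf.range(8), [1, 8])  # [[0 1 2 3 4 5 6 7]]
out = _pool_and_concat(data, unpool_length=2, stride=2, axes=[1])
print(out.numpy())  # [[0 1 2 4 6]]: keep [0, 1], then every 2nd of [2..7]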
official/nlp/modeling/networks/funnel_transformer_test.py (new file, 0 → 100644)

# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Tests for transformer-based bert encoder network."""

from absl.testing import parameterized
import numpy as np
import tensorflow as tf

from official.nlp.modeling.networks import funnel_transformer


class SingleLayerModel(tf.keras.Model):

  def __init__(self, layer):
    super().__init__()
    self.layer = layer

  def call(self, inputs):
    return self.layer(inputs)


class FunnelTransformerEncoderTest(parameterized.TestCase, tf.test.TestCase):

  def tearDown(self):
    super(FunnelTransformerEncoderTest, self).tearDown()
    tf.keras.mixed_precision.set_global_policy("float32")

  @parameterized.named_parameters(("mix", "mixed_float16", tf.float16),
                                  ("float32", "float32", tf.float32))
  def test_network_creation(self, policy, pooled_dtype):
    tf.keras.mixed_precision.set_global_policy(policy)
    hidden_size = 32
    sequence_length = 21
    pool_stride = 2
    num_layers = 3
    # Create a small FunnelTransformerEncoder for testing.
    test_network = funnel_transformer.FunnelTransformerEncoder(
        vocab_size=100,
        hidden_size=hidden_size,
        num_attention_heads=2,
        num_layers=num_layers,
        pool_stride=pool_stride,
        unpool_length=0)
    # Create the inputs (note that the first dimension is implicit).
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]

    self.assertIsInstance(test_network.transformer_layers, list)
    self.assertLen(test_network.transformer_layers, num_layers)
    self.assertIsInstance(test_network.pooler_layer, tf.keras.layers.Dense)

    # Stride=2 compresses sequence length to half the size at each layer.
    # This configuration gives each layer of seq length: 21->11->6->3.
    expected_data_shape = [None, 3, hidden_size]
    expected_pooled_shape = [None, hidden_size]
    self.assertAllEqual(expected_data_shape, data.shape.as_list())
    self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list())

    # The default output dtype is float32.
    # If float_dtype is set to float16, the data output is float32 (from a
    # layer norm) and pool output should be float16.
    self.assertAllEqual(tf.float32, data.dtype)
    self.assertAllEqual(pooled_dtype, pooled.dtype)

  @parameterized.named_parameters(
      ("no_stride_no_unpool", 1, 0),
      ("large_stride_with_unpool", 3, 1),
      ("large_stride_with_large_unpool", 5, 10),
      ("no_stride_with_unpool", 1, 1),
  )
  def test_all_encoder_outputs_network_creation(self, pool_stride,
                                                unpool_length):
    hidden_size = 32
    sequence_length = 21
    num_layers = 3
    # Create a small FunnelTransformerEncoder for testing.
    test_network = funnel_transformer.FunnelTransformerEncoder(
        vocab_size=100,
        hidden_size=hidden_size,
        num_attention_heads=2,
        num_layers=num_layers,
        pool_stride=pool_stride,
        unpool_length=unpool_length)
    # Create the inputs (note that the first dimension is implicit).
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    dict_outputs = test_network([word_ids, mask, type_ids])
    all_encoder_outputs = dict_outputs["encoder_outputs"]
    pooled = dict_outputs["pooled_output"]

    expected_data_shape = [None, sequence_length, hidden_size]
    expected_pooled_shape = [None, hidden_size]
    self.assertLen(all_encoder_outputs, num_layers)
    for data in all_encoder_outputs:
      expected_data_shape[1] = unpool_length + (
          expected_data_shape[1] + pool_stride - 1 -
          unpool_length) // pool_stride
      print("shapes:", expected_data_shape, data.shape.as_list())
      self.assertAllEqual(expected_data_shape, data.shape.as_list())
    self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list())

    # The default output dtype is float32.
    self.assertAllEqual(tf.float32, all_encoder_outputs[-1].dtype)
    self.assertAllEqual(tf.float32, pooled.dtype)

  @parameterized.named_parameters(
      ("all_sequence", None, 3, 0),
      ("output_range", 1, 1, 0),
      ("all_sequence_wit_unpool", None, 4, 1),
      ("output_range_with_unpool", 1, 1, 1),
      ("output_range_with_large_unpool", 1, 1, 2),
  )
  def test_network_invocation(self, output_range, out_seq_len, unpool_length):
    hidden_size = 32
    sequence_length = 21
    vocab_size = 57
    num_types = 7
    pool_stride = 2
    # Create a small FunnelTransformerEncoder for testing.
    test_network = funnel_transformer.FunnelTransformerEncoder(
        vocab_size=vocab_size,
        hidden_size=hidden_size,
        num_attention_heads=2,
        num_layers=3,
        type_vocab_size=num_types,
        output_range=output_range,
        pool_stride=pool_stride,
        unpool_length=unpool_length)
    # Create the inputs (note that the first dimension is implicit).
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]

    # Create a model based off of this network:
    model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled])

    # Invoke the model. We can't validate the output data here (the model is
    # too complex) but this will catch structural runtime errors.
    batch_size = 3
    word_id_data = np.random.randint(
        vocab_size, size=(batch_size, sequence_length))
    mask_data = np.random.randint(2, size=(batch_size, sequence_length))
    type_id_data = np.random.randint(
        num_types, size=(batch_size, sequence_length))
    outputs = model.predict([word_id_data, mask_data, type_id_data])
    self.assertEqual(outputs[0].shape[1], out_seq_len)  # output_range

    # Creates a FunnelTransformerEncoder with max_sequence_length !=
    # sequence_length
    max_sequence_length = 128
    test_network = funnel_transformer.FunnelTransformerEncoder(
        vocab_size=vocab_size,
        hidden_size=hidden_size,
        max_sequence_length=max_sequence_length,
        num_attention_heads=2,
        num_layers=3,
        type_vocab_size=num_types,
        pool_stride=pool_stride)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]
    model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled])
    outputs = model.predict([word_id_data, mask_data, type_id_data])
    self.assertEqual(outputs[0].shape[1], 3)

    # Creates a FunnelTransformerEncoder with embedding_width != hidden_size
    test_network = funnel_transformer.FunnelTransformerEncoder(
        vocab_size=vocab_size,
        hidden_size=hidden_size,
        max_sequence_length=max_sequence_length,
        num_attention_heads=2,
        num_layers=3,
        type_vocab_size=num_types,
        embedding_width=16,
        pool_stride=pool_stride)
    dict_outputs = test_network([word_ids, mask, type_ids])
    data = dict_outputs["sequence_output"]
    pooled = dict_outputs["pooled_output"]
    model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled])
    outputs = model.predict([word_id_data, mask_data, type_id_data])
    self.assertEqual(outputs[0].shape[-1], hidden_size)
    self.assertTrue(hasattr(test_network, "_embedding_projection"))

  def test_serialize_deserialize(self):
    # Create a network object that sets all of its config options.
    kwargs = dict(
        vocab_size=100,
        hidden_size=32,
        num_layers=3,
        num_attention_heads=2,
        max_sequence_length=21,
        type_vocab_size=12,
        inner_dim=1223,
        inner_activation="relu",
        output_dropout=0.05,
        attention_dropout=0.22,
        initializer="glorot_uniform",
        output_range=-1,
        embedding_width=16,
        embedding_layer=None,
        norm_first=False,
        pool_stride=2,
        unpool_length=0)
    network = funnel_transformer.FunnelTransformerEncoder(**kwargs)

    expected_config = dict(kwargs)
    expected_config["inner_activation"] = tf.keras.activations.serialize(
        tf.keras.activations.get(expected_config["inner_activation"]))
    expected_config["initializer"] = tf.keras.initializers.serialize(
        tf.keras.initializers.get(expected_config["initializer"]))
    self.assertEqual(network.get_config(), expected_config)

    # Create another network object from the first object's config.
    new_network = funnel_transformer.FunnelTransformerEncoder.from_config(
        network.get_config())

    # If the serialization was successful, the new config should match the old.
    self.assertAllEqual(network.get_config(), new_network.get_config())

    # Tests model saving/loading.
    model_path = self.get_temp_dir() + "/model"
    network_wrapper = SingleLayerModel(network)
    # One forward-path to ensure input_shape.
    batch_size = 3
    sequence_length = 21
    vocab_size = 100
    num_types = 12
    word_id_data = np.random.randint(
        vocab_size, size=(batch_size, sequence_length))
    mask_data = np.random.randint(2, size=(batch_size, sequence_length))
    type_id_data = np.random.randint(
        num_types, size=(batch_size, sequence_length))
    _ = network_wrapper.predict([word_id_data, mask_data, type_id_data])
    network_wrapper.save(model_path)
    _ = tf.keras.models.load_model(model_path)


if __name__ == "__main__":
  tf.test.main()
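The shape assertions above all reduce to one recurrence: each layer maps a sequence of length L to unpool_length + ceil((L - unpool_length) / stride). A small sanity check of that formula, matching the 21->11->6->3 comment in test_network_creation (the helper name is ours, for illustration):

def funnel_lengths(seq_len, stride, unpool_length, num_layers):
  """Per-layer sequence lengths produced by strided pooling."""
  lengths = []
  for _ in range(num_layers):
    seq_len = unpool_length + (seq_len + stride - 1 - unpool_length) // stride
    lengths.append(seq_len)
  return lengths


print(funnel_lengths(21, stride=2, unpool_length=0, num_layers=3))  # [11, 6, 3]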
official/nlp/projects/teams/teams.py (new file, 0 → 100644)

# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""TEAMS model configurations and instantiation methods."""
import dataclasses

import gin
import tensorflow as tf

from official.modeling import tf_utils
from official.modeling.hyperparams import base_config
from official.nlp.configs import encoders
from official.nlp.modeling import layers
from official.nlp.modeling import networks


@dataclasses.dataclass
class TeamsPretrainerConfig(base_config.Config):
  """Teams pretrainer configuration."""
  # Candidate size for multi-word selection task, including the correct word.
  candidate_size: int = 5
  # Weight for the generator masked language model task.
  generator_loss_weight: float = 1.0
  # Weight for the replaced token detection task.
  discriminator_rtd_loss_weight: float = 5.0
  # Weight for the multi-word selection task.
  discriminator_mws_loss_weight: float = 2.0
  # Whether to share the embedding network between generator and discriminator.
  tie_embeddings: bool = True
  # Number of bottom layers shared between generator and discriminator.
  # Non-positive value implies no sharing.
  num_shared_generator_hidden_layers: int = 3
  # Number of bottom layers shared between different discriminator tasks.
  num_discriminator_task_agnostic_layers: int = 11
  generator: encoders.BertEncoderConfig = encoders.BertEncoderConfig()
  discriminator: encoders.BertEncoderConfig = encoders.BertEncoderConfig()
  # Used for compatibility with continuous finetuning where common BERT config
  # is used.
  encoder: encoders.EncoderConfig = encoders.EncoderConfig()


@gin.configurable
def get_encoder(bert_config,
                embedding_network=None,
                hidden_layers=layers.Transformer):
  """Gets an 'EncoderScaffold' object.

  Args:
    bert_config: A 'modeling.BertConfig'.
    embedding_network: Embedding network instance.
    hidden_layers: List of hidden layer instances.

  Returns:
    An encoder object.
  """
  # embedding_size is required for PackedSequenceEmbedding.
  if bert_config.embedding_size is None:
    bert_config.embedding_size = bert_config.hidden_size
  embedding_cfg = dict(
      vocab_size=bert_config.vocab_size,
      type_vocab_size=bert_config.type_vocab_size,
      hidden_size=bert_config.hidden_size,
      embedding_width=bert_config.embedding_size,
      max_seq_length=bert_config.max_position_embeddings,
      initializer=tf.keras.initializers.TruncatedNormal(
          stddev=bert_config.initializer_range),
      dropout_rate=bert_config.dropout_rate,
  )
  hidden_cfg = dict(
      num_attention_heads=bert_config.num_attention_heads,
      intermediate_size=bert_config.intermediate_size,
      intermediate_activation=tf_utils.get_activation(
          bert_config.hidden_activation),
      dropout_rate=bert_config.dropout_rate,
      attention_dropout_rate=bert_config.attention_dropout_rate,
      kernel_initializer=tf.keras.initializers.TruncatedNormal(
          stddev=bert_config.initializer_range),
  )
  if embedding_network is None:
    embedding_network = networks.PackedSequenceEmbedding(**embedding_cfg)
  kwargs = dict(
      embedding_cfg=embedding_cfg,
      embedding_cls=embedding_network,
      hidden_cls=hidden_layers,
      hidden_cfg=hidden_cfg,
      num_hidden_instances=bert_config.num_layers,
      pooled_output_dim=bert_config.hidden_size,
      pooler_layer_initializer=tf.keras.initializers.TruncatedNormal(
          stddev=bert_config.initializer_range),
      dict_outputs=True)

  # Relies on gin configuration to define the Transformer encoder arguments.
  return networks.encoder_scaffold.EncoderScaffold(**kwargs)
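A brief hedged usage sketch: because `TeamsPretrainerConfig` derives from the dataclass-based `base_config.Config`, fields can be overridden at construction and read back directly (the override values here are illustrative only):

from official.nlp.projects.teams import teams

config = teams.TeamsPretrainerConfig(
    candidate_size=8,
    discriminator_rtd_loss_weight=10.0)
print(config.candidate_size)                      # 8
print(config.num_shared_generator_hidden_layers)  # 3 (the declared default)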
official/nlp/projects/teams/teams_pretrainer.py
0 → 100644
View file @
1f8b5b27
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Trainer network for ELECTRA models."""
# pylint: disable=g-classes-have-attributes
import
tensorflow
as
tf
from
official.modeling
import
tf_utils
from
official.nlp.modeling
import
layers
from
official.nlp.modeling
import
models
class
ReplacedTokenDetectionHead
(
tf
.
keras
.
layers
.
Layer
):
"""Replaced token detection discriminator head.
Arguments:
encoder_cfg: Encoder config, used to create hidden layers and head.
num_task_agnostic_layers: Number of task agnostic layers in the
discriminator.
output: The output style for this network. Can be either 'logits' or
'predictions'.
"""
def
__init__
(
self
,
encoder_cfg
,
num_task_agnostic_layers
,
output
=
'logits'
,
name
=
'rtd'
,
**
kwargs
):
super
(
ReplacedTokenDetectionHead
,
self
).
__init__
(
name
=
name
,
**
kwargs
)
self
.
num_task_agnostic_layers
=
num_task_agnostic_layers
self
.
hidden_size
=
encoder_cfg
[
'embedding_cfg'
][
'hidden_size'
]
self
.
num_hidden_instances
=
encoder_cfg
[
'num_hidden_instances'
]
self
.
hidden_cfg
=
encoder_cfg
[
'hidden_cfg'
]
self
.
activation
=
self
.
hidden_cfg
[
'intermediate_activation'
]
self
.
initializer
=
self
.
hidden_cfg
[
'kernel_initializer'
]
if
output
not
in
(
'predictions'
,
'logits'
):
raise
ValueError
(
(
'Unknown `output` value "%s". `output` can be either "logits" or '
'"predictions"'
)
%
output
)
self
.
_output_type
=
output
def
build
(
self
,
input_shape
):
self
.
hidden_layers
=
[]
for
i
in
range
(
self
.
num_task_agnostic_layers
,
self
.
num_hidden_instances
):
self
.
hidden_layers
.
append
(
layers
.
Transformer
(
num_attention_heads
=
self
.
hidden_cfg
[
'num_attention_heads'
],
intermediate_size
=
self
.
hidden_cfg
[
'intermediate_size'
],
intermediate_activation
=
self
.
activation
,
dropout_rate
=
self
.
hidden_cfg
[
'dropout_rate'
],
attention_dropout_rate
=
self
.
hidden_cfg
[
'attention_dropout_rate'
],
kernel_initializer
=
self
.
initializer
,
name
=
'transformer/layer_%d_rtd'
%
i
))
self
.
dense
=
tf
.
keras
.
layers
.
Dense
(
self
.
hidden_size
,
activation
=
self
.
activation
,
kernel_initializer
=
self
.
initializer
,
name
=
'transform/rtd_dense'
)
self
.
rtd_head
=
tf
.
keras
.
layers
.
Dense
(
units
=
1
,
kernel_initializer
=
self
.
initializer
,
name
=
'transform/rtd_head'
)
def
call
(
self
,
sequence_data
,
input_mask
):
"""Compute inner-products of hidden vectors with sampled element embeddings.
Args:
sequence_data: A [batch_size, seq_length, num_hidden] tensor.
input_mask: A [batch_size, seq_length] binary mask to separate the input
from the padding.
Returns:
A [batch_size, seq_length] tensor.
"""
attention_mask
=
layers
.
SelfAttentionMask
()([
sequence_data
,
input_mask
])
data
=
sequence_data
for
hidden_layer
in
self
.
hidden_layers
:
data
=
hidden_layer
([
sequence_data
,
attention_mask
])
rtd_logits
=
self
.
rtd_head
(
self
.
dense
(
data
))
return
tf
.
squeeze
(
rtd_logits
,
axis
=-
1
)
class
MultiWordSelectionHead
(
tf
.
keras
.
layers
.
Layer
):
"""Multi-word selection discriminator head.
Arguments:
embedding_table: The embedding table.
activation: The activation, if any, for the dense layer.
initializer: The intializer for the dense layer. Defaults to a Glorot
uniform initializer.
output: The output style for this network. Can be either 'logits' or
'predictions'.
"""
def
__init__
(
self
,
embedding_table
,
activation
=
None
,
initializer
=
'glorot_uniform'
,
output
=
'logits'
,
name
=
'mws'
,
**
kwargs
):
super
(
MultiWordSelectionHead
,
self
).
__init__
(
name
=
name
,
**
kwargs
)
self
.
embedding_table
=
embedding_table
self
.
activation
=
activation
self
.
initializer
=
tf
.
keras
.
initializers
.
get
(
initializer
)
if
output
not
in
(
'predictions'
,
'logits'
):
raise
ValueError
(
(
'Unknown `output` value "%s". `output` can be either "logits" or '
'"predictions"'
)
%
output
)
self
.
_output_type
=
output
def
build
(
self
,
input_shape
):
self
.
_vocab_size
,
self
.
embed_size
=
self
.
embedding_table
.
shape
self
.
dense
=
tf
.
keras
.
layers
.
Dense
(
self
.
embed_size
,
activation
=
self
.
activation
,
kernel_initializer
=
self
.
initializer
,
name
=
'transform/mws_dense'
)
self
.
layer_norm
=
tf
.
keras
.
layers
.
LayerNormalization
(
axis
=-
1
,
epsilon
=
1e-12
,
name
=
'transform/mws_layernorm'
)
super
(
MultiWordSelectionHead
,
self
).
build
(
input_shape
)
def
call
(
self
,
sequence_data
,
masked_positions
,
candidate_sets
):
"""Compute inner-products of hidden vectors with sampled element embeddings.
Args:
sequence_data: A [batch_size, seq_length, num_hidden] tensor.
masked_positions: A [batch_size, num_prediction] tensor.
candidate_sets: A [batch_size, num_prediction, k] tensor.
Returns:
A [batch_size, num_prediction, k] tensor.
"""
# Gets shapes for later usage
candidate_set_shape
=
tf_utils
.
get_shape_list
(
candidate_sets
)
num_prediction
=
candidate_set_shape
[
1
]
# Gathers hidden vectors -> (batch_size, num_prediction, 1, embed_size)
masked_lm_input
=
self
.
_gather_indexes
(
sequence_data
,
masked_positions
)
lm_data
=
self
.
dense
(
masked_lm_input
)
lm_data
=
self
.
layer_norm
(
lm_data
)
lm_data
=
tf
.
expand_dims
(
tf
.
reshape
(
lm_data
,
[
-
1
,
num_prediction
,
self
.
embed_size
]),
2
)
# Gathers embeddings -> (batch_size, num_prediction, embed_size, k)
flat_candidate_sets
=
tf
.
reshape
(
candidate_sets
,
[
-
1
])
candidate_embeddings
=
tf
.
gather
(
self
.
embedding_table
,
flat_candidate_sets
)
candidate_embeddings
=
tf
.
reshape
(
candidate_embeddings
,
tf
.
concat
([
tf
.
shape
(
candidate_sets
),
[
self
.
embed_size
]],
axis
=
0
)
)
candidate_embeddings
.
set_shape
(
candidate_sets
.
shape
.
as_list
()
+
[
self
.
embed_size
])
candidate_embeddings
=
tf
.
transpose
(
candidate_embeddings
,
[
0
,
1
,
3
,
2
])
# matrix multiplication + squeeze -> (batch_size, num_prediction, k)
logits
=
tf
.
matmul
(
lm_data
,
candidate_embeddings
)
logits
=
tf
.
squeeze
(
logits
,
2
)
if
self
.
_output_type
==
'logits'
:
return
logits
return
tf
.
nn
.
log_softmax
(
logits
)
def
_gather_indexes
(
self
,
sequence_tensor
,
positions
):
"""Gathers the vectors at the specific positions.
Args:
sequence_tensor: Sequence output of shape
(`batch_size`, `seq_length`, `num_hidden`) where `num_hidden` is
number of hidden units.
positions: Positions ids of tokens in batched sequences.
Returns:
Sequence tensor of shape (batch_size * num_predictions,
num_hidden).
"""
sequence_shape
=
tf_utils
.
get_shape_list
(
sequence_tensor
,
name
=
'sequence_output_tensor'
)
batch_size
,
seq_length
,
width
=
sequence_shape
flat_offsets
=
tf
.
reshape
(
tf
.
range
(
0
,
batch_size
,
dtype
=
tf
.
int32
)
*
seq_length
,
[
-
1
,
1
])
flat_positions
=
tf
.
reshape
(
positions
+
flat_offsets
,
[
-
1
])
flat_sequence_tensor
=
tf
.
reshape
(
sequence_tensor
,
[
batch_size
*
seq_length
,
width
])
output_tensor
=
tf
.
gather
(
flat_sequence_tensor
,
flat_positions
)
return
output_tensor
@tf.keras.utils.register_keras_serializable(package='Text')
class TeamsPretrainer(tf.keras.Model):
  """TEAMS network training model.

  This is an implementation of the network structure described in "Training
  ELECTRA Augmented with Multi-word Selection"
  (https://arxiv.org/abs/2106.00139).

  The TeamsPretrainer allows a user to pass in two transformer encoders, one
  for the generator and the other for the discriminator (multi-word selection).
  The pretrainer then instantiates the masked language model (on the generator
  side) and the classification networks (including both the multi-word
  selection head and the replaced token detection head) that are used to
  create the training objectives.

  *Note* that the model is constructed with the Keras Subclass API, where
  layers are defined inside `__init__` and `call()` implements the
  computation.

  Args:
    generator_network: A transformer encoder for the generator; this network
      should output a sequence output.
    discriminator_mws_network: A transformer encoder for the multi-word
      selection discriminator; this network should output a sequence output.
    num_discriminator_task_agnostic_layers: Number of layers shared between
      the multi-word selection and replaced token detection discriminators.
    vocab_size: Size of the generator output vocabulary.
    candidate_size: Candidate size for the multi-word selection task,
      including the correct word.
    mlm_activation: The activation (if any) to use in the masked LM and
      classification networks. If None, no activation will be used.
    mlm_initializer: The initializer (if any) to use in the masked LM and
      classification networks. Defaults to a Glorot uniform initializer.
    output_type: The output style for this network. Can be either `logits` or
      `predictions`.
  """
  def __init__(self,
               generator_network,
               discriminator_mws_network,
               num_discriminator_task_agnostic_layers,
               vocab_size,
               candidate_size=5,
               mlm_activation=None,
               mlm_initializer='glorot_uniform',
               output_type='logits',
               **kwargs):
    super().__init__()
    self._config = {
        'generator_network': generator_network,
        'discriminator_mws_network': discriminator_mws_network,
        'num_discriminator_task_agnostic_layers':
            num_discriminator_task_agnostic_layers,
        'vocab_size': vocab_size,
        'candidate_size': candidate_size,
        'mlm_activation': mlm_activation,
        'mlm_initializer': mlm_initializer,
        'output_type': output_type,
    }
    for k, v in kwargs.items():
      self._config[k] = v

    self.generator_network = generator_network
    self.discriminator_mws_network = discriminator_mws_network
    self.vocab_size = vocab_size
    self.candidate_size = candidate_size
    self.mlm_activation = mlm_activation
    self.mlm_initializer = mlm_initializer
    self.output_type = output_type
    embedding_table = generator_network.embedding_network.get_embedding_table()
    self.masked_lm = layers.MaskedLM(
        embedding_table=embedding_table,
        activation=mlm_activation,
        initializer=mlm_initializer,
        output=output_type,
        name='generator_masked_lm')
    discriminator_cfg = self.discriminator_mws_network.get_config()
    self.discriminator_rtd_head = ReplacedTokenDetectionHead(
        encoder_cfg=discriminator_cfg,
        num_task_agnostic_layers=num_discriminator_task_agnostic_layers,
        output=output_type,
        name='discriminator_rtd')
    hidden_cfg = discriminator_cfg['hidden_cfg']
    self.discriminator_mws_head = MultiWordSelectionHead(
        embedding_table=embedding_table,
        activation=hidden_cfg['intermediate_activation'],
        initializer=hidden_cfg['kernel_initializer'],
        output=output_type,
        name='discriminator_mws')
    self.num_task_agnostic_layers = num_discriminator_task_agnostic_layers
  def call(self, inputs):
    """TEAMS forward pass.

    Args:
      inputs: A dict of all inputs, same as the standard BERT model.

    Returns:
      outputs: A dict of pretrainer model outputs, including
        (1) lm_outputs: A `[batch_size, num_token_predictions, vocab_size]`
        tensor indicating logits on masked positions.
        (2) disc_rtd_logits: A `[batch_size, sequence_length]` tensor
        indicating logits for the discriminator replaced token detection task.
        (3) disc_rtd_label: A `[batch_size, sequence_length]` tensor indicating
        target labels for the discriminator replaced token detection task.
        (4) disc_mws_logits: A `[batch_size, num_token_predictions,
        candidate_size]` tensor indicating logits for the discriminator
        multi-word selection task.
        (5) disc_mws_label: A `[batch_size, num_token_predictions]` tensor
        indicating target labels for the discriminator multi-word selection
        task.
    """
    input_word_ids = inputs['input_word_ids']
    input_mask = inputs['input_mask']
    input_type_ids = inputs['input_type_ids']
    masked_lm_positions = inputs['masked_lm_positions']

    # Runs generator.
    sequence_output = self.generator_network(
        [input_word_ids, input_mask, input_type_ids])['sequence_output']
    lm_outputs = self.masked_lm(sequence_output, masked_lm_positions)

    # Samples tokens from generator.
    fake_data = self._get_fake_data(inputs, lm_outputs)

    # Runs discriminator.
    disc_input = fake_data['inputs']
    disc_rtd_label = fake_data['is_fake_tokens']
    disc_mws_candidates = fake_data['candidate_set']
    mws_sequence_outputs = self.discriminator_mws_network([
        disc_input['input_word_ids'], disc_input['input_mask'],
        disc_input['input_type_ids']
    ])['encoder_outputs']

    # Applies replaced token detection with input selected based on
    # self.num_discriminator_task_agnostic_layers.
    disc_rtd_logits = self.discriminator_rtd_head(
        mws_sequence_outputs[self.num_task_agnostic_layers - 1], input_mask)

    # Applies multi-word selection.
    disc_mws_logits = self.discriminator_mws_head(
        mws_sequence_outputs[-1], masked_lm_positions, disc_mws_candidates)
    disc_mws_label = tf.zeros_like(masked_lm_positions, dtype=tf.int32)

    outputs = {
        'lm_outputs': lm_outputs,
        'disc_rtd_logits': disc_rtd_logits,
        'disc_rtd_label': disc_rtd_label,
        'disc_mws_logits': disc_mws_logits,
        'disc_mws_label': disc_mws_label,
    }
    return outputs
  def _get_fake_data(self, inputs, mlm_logits):
    """Generates corrupted data for the discriminator.

    Note it is possible for a sampled token to be the same as the correct one.

    Args:
      inputs: A dict of all inputs, same as the input of the `call()` function.
      mlm_logits: The generator's output logits.

    Returns:
      A dict of generated fake data.
    """
    inputs = models.electra_pretrainer.unmask(inputs, duplicate=True)

    # Samples replaced tokens.
    sampled_tokens = tf.stop_gradient(
        models.electra_pretrainer.sample_from_softmax(
            mlm_logits, disallow=None))
    sampled_tokids = tf.argmax(sampled_tokens, -1, output_type=tf.int32)

    # Prepares input and label for the replaced token detection task.
    updated_input_ids, masked = models.electra_pretrainer.scatter_update(
        inputs['input_word_ids'], sampled_tokids,
        inputs['masked_lm_positions'])
    rtd_labels = masked * (1 - tf.cast(
        tf.equal(updated_input_ids, inputs['input_word_ids']), tf.int32))
    updated_inputs = models.electra_pretrainer.get_updated_inputs(
        inputs, duplicate=True, input_word_ids=updated_input_ids)

    # Samples (candidate_size - 1) negatives and concatenates them with the
    # true tokens.
    disallow = tf.one_hot(
        inputs['masked_lm_ids'], depth=self.vocab_size, dtype=tf.float32)
    sampled_candidates = tf.stop_gradient(
        sample_k_from_softmax(
            mlm_logits, k=self.candidate_size - 1, disallow=disallow))
    true_token_id = tf.expand_dims(inputs['masked_lm_ids'], -1)
    candidate_set = tf.concat([true_token_id, sampled_candidates], -1)

    return {
        'inputs': updated_inputs,
        'is_fake_tokens': rtd_labels,
        'sampled_tokens': sampled_tokens,
        'candidate_set': candidate_set
    }
  @property
  def checkpoint_items(self):
    """Returns a dictionary of items to be additionally checkpointed."""
    items = dict(encoder=self.discriminator_mws_network)
    return items

  def get_config(self):
    return self._config

  @classmethod
  def from_config(cls, config, custom_objects=None):
    return cls(**config)
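TeamsPretrainer returns logits and labels but no losses; combining them is left to the training task. Below is a hedged sketch of how the three objectives could be wired together, assuming `output_type='logits'`. The weights `rtd_weight` and `mws_weight` are placeholder values (not from this file; the TEAMS paper tunes them), and padding/weight masking is omitted for brevity:

import tensorflow as tf

def teams_loss_sketch(outputs, masked_lm_ids,
                      rtd_weight=5.0, mws_weight=1.0):
  """Hedged sketch of a combined TEAMS objective; weights are placeholders."""
  # Generator MLM loss over masked positions: logits [B, P, V], ids [B, P].
  mlm_loss = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=masked_lm_ids, logits=outputs['lm_outputs']))
  # Replaced token detection: per-token binary classification over [B, S].
  rtd_loss = tf.reduce_mean(
      tf.nn.sigmoid_cross_entropy_with_logits(
          labels=tf.cast(outputs['disc_rtd_label'], tf.float32),
          logits=outputs['disc_rtd_logits']))
  # Multi-word selection: the true token is always candidate 0, so the
  # labels tensor is all zeros over [B, P] and logits are [B, P, K].
  mws_loss = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=outputs['disc_mws_label'],
          logits=outputs['disc_mws_logits']))
  return mlm_loss + rtd_weight * rtd_loss + mws_weight * mws_loss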
def sample_k_from_softmax(logits, k, disallow=None, use_topk=False):
  """Implements softmax sampling with the Gumbel softmax trick to select k items.

  Args:
    logits: A [batch_size, num_token_predictions, vocab_size] tensor indicating
      the generator output logits for each masked position.
    k: Number of samples.
    disallow: If `None`, we directly sample tokens from the logits. Otherwise,
      this is a tensor of size [batch_size, num_token_predictions, vocab_size]
      indicating the true word id in each masked position.
    use_topk: Whether to use tf.nn.top_k or an iterative approach, where the
      latter is empirically faster.

  Returns:
    sampled_tokens: A [batch_size, num_token_predictions, k] tensor indicating
      the sampled word id in each masked position.
  """
  if use_topk:
    if disallow is not None:
      logits -= 10000.0 * disallow
    uniform_noise = tf.random.uniform(
        tf_utils.get_shape_list(logits), minval=0, maxval=1)
    gumbel_noise = -tf.math.log(-tf.math.log(uniform_noise + 1e-9) + 1e-9)
    _, sampled_tokens = tf.nn.top_k(logits + gumbel_noise, k=k, sorted=False)
  else:
    sampled_tokens_list = []
    vocab_size = tf_utils.get_shape_list(logits)[-1]
    if disallow is not None:
      logits -= 10000.0 * disallow
    uniform_noise = tf.random.uniform(
        tf_utils.get_shape_list(logits), minval=0, maxval=1)
    gumbel_noise = -tf.math.log(-tf.math.log(uniform_noise + 1e-9) + 1e-9)
    logits += gumbel_noise
    for _ in range(k):
      token_ids = tf.argmax(logits, -1, output_type=tf.int32)
      sampled_tokens_list.append(token_ids)
      logits -= 10000.0 * tf.one_hot(
          token_ids, depth=vocab_size, dtype=tf.float32)
    sampled_tokens = tf.stack(sampled_tokens_list, -1)
  return sampled_tokens
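The sampler relies on the Gumbel-max identity: if the `g_i` are i.i.d. Gumbel(0, 1) draws, then `argmax_i(logits_i + g_i)` is distributed as a single sample from `softmax(logits)`, and keeping the top k perturbed entries (or iteratively masking out each argmax, as above) yields k distinct samples without replacement. A quick empirical sanity check with toy logits (not from the source):

import tensorflow as tf

logits = tf.constant([1.0, 2.0, 3.0])
n = 100000
uniform = tf.random.uniform([n, 3], minval=0, maxval=1)
gumbel = -tf.math.log(-tf.math.log(uniform + 1e-9) + 1e-9)
samples = tf.argmax(logits + gumbel, axis=-1)
counts = tf.math.bincount(tf.cast(samples, tf.int32), minlength=3)
empirical = tf.cast(counts, tf.float32) / n
# `empirical` should be close to tf.nn.softmax(logits) ~ [0.09, 0.245, 0.665].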
official/nlp/projects/teams/teams_pretrainer_test.py  0 → 100644
# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for TEAMS pre trainer network."""
import tensorflow as tf

from tensorflow.python.keras import keras_parameterized  # pylint: disable=g-direct-tensorflow-import
from official.modeling import activations
from official.nlp.modeling.networks import encoder_scaffold
from official.nlp.modeling.networks import packed_sequence_embedding
from official.nlp.projects.teams import teams_pretrainer


# This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It
# guarantees forward compatibility of this code for the V2 switchover.
@keras_parameterized.run_all_keras_modes
class TeamsPretrainerTest(keras_parameterized.TestCase):
  # Build a transformer network to use within the TEAMS trainer.
  def _get_network(self, vocab_size):
    sequence_length = 512
    hidden_size = 50
    embedding_cfg = {
        'vocab_size': vocab_size,
        'type_vocab_size': 1,
        'hidden_size': hidden_size,
        'embedding_width': hidden_size,
        'max_seq_length': sequence_length,
        'initializer': tf.keras.initializers.TruncatedNormal(stddev=0.02),
        'dropout_rate': 0.1,
    }
    embedding_inst = packed_sequence_embedding.PackedSequenceEmbedding(
        **embedding_cfg)
    hidden_cfg = {
        'num_attention_heads': 2,
        'intermediate_size': 3072,
        'intermediate_activation': activations.gelu,
        'dropout_rate': 0.1,
        'attention_dropout_rate': 0.1,
        'kernel_initializer':
            tf.keras.initializers.TruncatedNormal(stddev=0.02),
    }
    return encoder_scaffold.EncoderScaffold(
        num_hidden_instances=2,
        pooled_output_dim=hidden_size,
        embedding_cfg=embedding_cfg,
        embedding_cls=embedding_inst,
        hidden_cfg=hidden_cfg,
        dict_outputs=True)
  def test_teams_pretrainer(self):
    """Validate that the Keras object can be created."""
    vocab_size = 100
    test_generator_network = self._get_network(vocab_size)
    test_discriminator_network = self._get_network(vocab_size)

    # Create a TEAMS trainer with the created network.
    candidate_size = 3
    teams_trainer_model = teams_pretrainer.TeamsPretrainer(
        generator_network=test_generator_network,
        discriminator_mws_network=test_discriminator_network,
        num_discriminator_task_agnostic_layers=1,
        vocab_size=vocab_size,
        candidate_size=candidate_size)

    # Create a set of 2-dimensional inputs (the first dimension is implicit).
    num_token_predictions = 2
    sequence_length = 128
    word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32)
    lm_positions = tf.keras.Input(
        shape=(num_token_predictions,), dtype=tf.int32)
    lm_ids = tf.keras.Input(shape=(num_token_predictions,), dtype=tf.int32)
    inputs = {
        'input_word_ids': word_ids,
        'input_mask': mask,
        'input_type_ids': type_ids,
        'masked_lm_positions': lm_positions,
        'masked_lm_ids': lm_ids
    }

    # Invoke the trainer model on the inputs. This causes the layer to be
    # built.
    outputs = teams_trainer_model(inputs)
    lm_outs = outputs['lm_outputs']
    disc_rtd_logits = outputs['disc_rtd_logits']
    disc_rtd_label = outputs['disc_rtd_label']
    disc_mws_logits = outputs['disc_mws_logits']
    disc_mws_label = outputs['disc_mws_label']

    # Validate that the outputs are of the expected shape.
    expected_lm_shape = [None, num_token_predictions, vocab_size]
    expected_disc_rtd_logits_shape = [None, sequence_length]
    expected_disc_rtd_label_shape = [None, sequence_length]
    expected_disc_mws_logits_shape = [
        None, num_token_predictions, candidate_size
    ]
    expected_disc_mws_label_shape = [None, num_token_predictions]
    self.assertAllEqual(expected_lm_shape, lm_outs.shape.as_list())
    self.assertAllEqual(expected_disc_rtd_logits_shape,
                        disc_rtd_logits.shape.as_list())
    self.assertAllEqual(expected_disc_rtd_label_shape,
                        disc_rtd_label.shape.as_list())
    self.assertAllEqual(expected_disc_mws_logits_shape,
                        disc_mws_logits.shape.as_list())
    self.assertAllEqual(expected_disc_mws_label_shape,
                        disc_mws_label.shape.as_list())
  def test_teams_trainer_tensor_call(self):
    """Validate that the Keras object can be invoked."""
    vocab_size = 100
    test_generator_network = self._get_network(vocab_size)
    test_discriminator_network = self._get_network(vocab_size)

    # Create a TEAMS trainer with the created network.
    teams_trainer_model = teams_pretrainer.TeamsPretrainer(
        generator_network=test_generator_network,
        discriminator_mws_network=test_discriminator_network,
        num_discriminator_task_agnostic_layers=2,
        vocab_size=vocab_size,
        candidate_size=2)

    # Create a set of 2-dimensional data tensors to feed into the model.
    word_ids = tf.constant([[1, 1, 1], [2, 2, 2]], dtype=tf.int32)
    mask = tf.constant([[1, 1, 1], [1, 0, 0]], dtype=tf.int32)
    type_ids = tf.constant([[1, 1, 1], [2, 2, 2]], dtype=tf.int32)
    lm_positions = tf.constant([[0, 1], [0, 2]], dtype=tf.int32)
    lm_ids = tf.constant([[10, 20], [20, 30]], dtype=tf.int32)
    inputs = {
        'input_word_ids': word_ids,
        'input_mask': mask,
        'input_type_ids': type_ids,
        'masked_lm_positions': lm_positions,
        'masked_lm_ids': lm_ids
    }

    # Invoke the trainer model on the tensors. In Eager mode, this does the
    # actual calculation. (We can't validate the outputs, since the network is
    # too complex: this simply ensures we're not hitting runtime errors.)
    _ = teams_trainer_model(inputs)
  def test_serialize_deserialize(self):
    """Validate that the TEAMS trainer can be serialized and deserialized."""
    vocab_size = 100
    test_generator_network = self._get_network(vocab_size)
    test_discriminator_network = self._get_network(vocab_size)

    # Create a TEAMS trainer with the created network. (Note that all the args
    # are different, so we can catch any serialization mismatches.)
    teams_trainer_model = teams_pretrainer.TeamsPretrainer(
        generator_network=test_generator_network,
        discriminator_mws_network=test_discriminator_network,
        num_discriminator_task_agnostic_layers=2,
        vocab_size=vocab_size,
        candidate_size=2)

    # Create another TEAMS trainer via serialization and deserialization.
    config = teams_trainer_model.get_config()
    new_teams_trainer_model = teams_pretrainer.TeamsPretrainer.from_config(
        config)

    # Validate that the config can be forced to JSON.
    _ = new_teams_trainer_model.to_json()

    # If the serialization was successful, the new config should match the old.
    self.assertAllEqual(teams_trainer_model.get_config(),
                        new_teams_trainer_model.get_config())


if __name__ == '__main__':
  tf.test.main()
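Assuming the Model Garden package is on `PYTHONPATH`, the test file above should be runnable standalone via its `tf.test.main()` entry point, e.g. `python -m official.nlp.projects.teams.teams_pretrainer_test`.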
official/nlp/projects/triviaqa/inputs.py
...
@@ -48,15 +48,15 @@ def _flatten_dims(tensor: tf.Tensor,
   rank = tensor.shape.rank
   if rank is None:
     raise ValueError('Static rank of `tensor` must be known.')
-  if first_dim < 0:
+  if first_dim < 0:  # pytype: disable=unsupported-operands
     first_dim += rank
-  if first_dim < 0 or first_dim >= rank:
+  if first_dim < 0 or first_dim >= rank:  # pytype: disable=unsupported-operands
     raise ValueError('`first_dim` out of bounds for `tensor` rank.')
-  if last_dim < 0:
+  if last_dim < 0:  # pytype: disable=unsupported-operands
     last_dim += rank
-  if last_dim < 0 or last_dim >= rank:
+  if last_dim < 0 or last_dim >= rank:  # pytype: disable=unsupported-operands
     raise ValueError('`last_dim` out of bounds for `tensor` rank.')
-  if first_dim > last_dim:
+  if first_dim > last_dim:  # pytype: disable=unsupported-operands
     raise ValueError('`first_dim` must not be larger than `last_dim`.')
   # Try to calculate static flattened dim size if all input sizes to flatten
...
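For context, a function with these checks merges a contiguous range of dimensions into one. A minimal hedged sketch of the semantics implied by the bounds checks above (an illustration, not the file's actual implementation):

import tensorflow as tf

def flatten_dims_sketch(tensor: tf.Tensor, first_dim: int = 0,
                        last_dim: int = -1) -> tf.Tensor:
  """Collapses dimensions first_dim..last_dim of `tensor` into a single one."""
  rank = tensor.shape.rank
  if rank is None:
    raise ValueError('Static rank of `tensor` must be known.')
  first_dim %= rank  # normalize negative indices, mirroring the checks above
  last_dim %= rank
  shape = tf.shape(tensor)
  new_shape = tf.concat([shape[:first_dim], [-1], shape[last_dim + 1:]],
                        axis=0)
  return tf.reshape(tensor, new_shape)

x = tf.zeros([2, 3, 4, 5])
y = flatten_dims_sketch(x, first_dim=1, last_dim=2)  # y.shape == (2, 12, 5)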
official/nlp/serving/serving_modules.py
...
@@ -80,12 +80,10 @@ class SentencePrediction(export_base.ExportModule):
         lower_case=params.lower_case,
         preprocessing_hub_module_url=params.preprocessing_hub_module_url)

-  @tf.function
-  def serve(self,
-            input_word_ids,
-            input_mask=None,
-            input_type_ids=None,
-            use_prob=False) -> Dict[str, tf.Tensor]:
+  def _serve_tokenized_input(self,
+                             input_word_ids,
+                             input_mask=None,
+                             input_type_ids=None) -> tf.Tensor:
     if input_type_ids is None:
       # Requires CLS token is the first token of inputs.
       input_type_ids = tf.zeros_like(input_word_ids)
...
@@ -98,10 +96,26 @@ class SentencePrediction(export_base.ExportModule):
         input_word_ids=input_word_ids,
         input_mask=input_mask,
         input_type_ids=input_type_ids)
-    if not use_prob:
-      return dict(outputs=self.inference_step(inputs))
-    else:
-      return dict(outputs=tf.nn.softmax(self.inference_step(inputs)))
+    return self.inference_step(inputs)
+
+  @tf.function
+  def serve(self,
+            input_word_ids,
+            input_mask=None,
+            input_type_ids=None) -> Dict[str, tf.Tensor]:
+    return dict(
+        outputs=self._serve_tokenized_input(input_word_ids, input_mask,
+                                            input_type_ids))
+
+  @tf.function
+  def serve_probability(self,
+                        input_word_ids,
+                        input_mask=None,
+                        input_type_ids=None) -> Dict[str, tf.Tensor]:
+    return dict(
+        outputs=tf.nn.softmax(
+            self._serve_tokenized_input(input_word_ids, input_mask,
+                                        input_type_ids)))

   @tf.function
   def serve_examples(self, inputs) -> Dict[str, tf.Tensor]:
...
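This change replaces a `use_prob` boolean on a single `@tf.function` with two dedicated entry points over a shared helper, which keeps each exported signature free of Python-level branching. A hedged sketch of how such entry points are typically exposed as SavedModel signatures (the `model` object, path, and shapes here are illustrative, not from this file):

import tensorflow as tf

# Illustrative only: `model` stands in for an export module with `serve`
# and `serve_probability` methods like the ones added above.
spec = tf.TensorSpec([None, 128], tf.int32)
signatures = {
    'serving_default':
        model.serve.get_concrete_function(spec, spec, spec),
    'serve_probability':
        model.serve_probability.get_concrete_function(spec, spec, spec),
}
tf.saved_model.save(model, '/tmp/sentence_prediction', signatures=signatures)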
official/pip_package/setup.py
...
@@ -81,6 +81,7 @@ setup(
         'official.pip_package*',
         'official.benchmark*',
         'official.colab*',
+        'official.recommendation.ranking.data.preprocessing*',
     ]),
     exclude_package_data={
         '': ['*_test.py',],
...
official/recommendation/ranking/README.md
...
@@ -68,6 +68,9 @@ Note that the dataset is large (~1TB).

 ### Preprocess the data

+Follow the instructions in [Data Preprocessing](data/preprocessing) to
+preprocess the Criteo Terabyte dataset.
+
 Data preprocessing steps are summarized below.

 Integer feature processing steps, sequentially:
...
@@ -93,9 +96,9 @@ Training and eval datasets are expected to be saved in many tab-separated values
 (TSV) files in the following format: numerical features, categorical features
 and label.

-On each row of the TSV file first `num_dense_features` inputs are numerical
-features, then `vocab_sizes` categorical features and the last one is the label
-(either 0 or 1). Each i-th categorical feature is expected to be an integer in
+On each row of the TSV file, the first one is the label (either 0 or 1), the
+next `num_dense_features` inputs are numerical features, then `vocab_sizes`
+categorical features. Each i-th categorical feature is expected to be an
+integer in
 the range of `[0, vocab_sizes[i])`.
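To make the new row layout concrete: with hypothetical values `num_dense_features = 3` and `vocab_sizes = [10, 100]`, one tab-separated row would look like (values illustrative only):

    1	0.05	1.2	0.0	7	42

that is, the 0/1 label first, then three numerical features, then one categorical feature per vocabulary, each within its `[0, vocab_sizes[i])` range.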
 ## Train and Evaluate
...