Commit a35e09d2, authored by Vinh Nguyen, committed by GitHub

Merge branch 'master' into amp_resnet50

parents d5722dcd 1f5a5e9d
# TensorFlow Official Models
The TensorFlow official models are a collection of example models that use TensorFlow's high-level APIs. They are intended to be well-maintained, tested, and kept up to date with the latest TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read.
The TensorFlow official models are a collection of example models that use
TensorFlow's high-level APIs. They are intended to be well-maintained, tested,
and kept up to date with the latest TensorFlow API. They should also be
reasonably optimized for fast performance while still being easy to read.
These models are used as end-to-end tests, ensuring that the models run with the same speed and performance with each new TensorFlow build.
These models are used as end-to-end tests, ensuring that the models run with the
same speed and performance with each new TensorFlow build.
## TensorFlow releases
The master branch of the models is **in development**, and it targets the [nightly binaries](https://github.com/tensorflow/tensorflow#installation) built from the [master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master). We aim to keep them backwards compatible with the latest release when possible (currently TensorFlow 1.5), but we cannot always guarantee compatibility.
**Stable versions** of the official models targeting releases of TensorFlow are available as tagged branches or [downloadable releases](https://github.com/tensorflow/models/releases). Model repository version numbers match the target TensorFlow release, such that [branch r1.4.0](https://github.com/tensorflow/models/tree/r1.4.0) and [release v1.4.0](https://github.com/tensorflow/models/releases/tag/v1.4.0) are compatible with [TensorFlow v1.4.0](https://github.com/tensorflow/tensorflow/releases/tag/v1.4.0).
If you are on a version of TensorFlow earlier than 1.4, please [update your installation](https://www.tensorflow.org/install/).
The master branch of the models is **in development**, and it targets the
[nightly binaries](https://github.com/tensorflow/tensorflow#installation) built
from the
[master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master).
We aim to keep them backwards compatible with the latest release when possible
(currently TensorFlow 1.5), but we cannot always guarantee compatibility.
**Stable versions** of the official models targeting releases of TensorFlow are
available as tagged branches or
[downloadable releases](https://github.com/tensorflow/models/releases). Model
repository version numbers match the target TensorFlow release, such that
[branch r1.4.0](https://github.com/tensorflow/models/tree/r1.4.0) and
[release v1.4.0](https://github.com/tensorflow/models/releases/tag/v1.4.0) are
compatible with
[TensorFlow v1.4.0](https://github.com/tensorflow/tensorflow/releases/tag/v1.4.0).
If you are on a version of TensorFlow earlier than 1.4, please
[update your installation](https://www.tensorflow.org/install/).
## Requirements
Please follow the steps below before running models in this repo:
1. TensorFlow [nightly binaries](https://github.com/tensorflow/tensorflow#installation)
2. Add the top-level ***/models*** folder to the Python path with the command:
```
export PYTHONPATH="$PYTHONPATH:/path/to/models"
```
Please follow the steps below before running models in this repo:
Using Colab:
```
import os
os.environ['PYTHONPATH'] += ":/path/to/models"
```
1. TensorFlow
[nightly binaries](https://github.com/tensorflow/tensorflow#installation)
3. Install dependencies:
```
pip3 install --user -r official/requirements.txt
```
or
```
pip install --user -r official/requirements.txt
```
2. Add the top-level ***/models*** folder to the Python path with the command:
`export PYTHONPATH="$PYTHONPATH:/path/to/models"`
Using Colab: `import os; os.environ['PYTHONPATH'] += ":/path/to/models"`
To make Official Models easier to use, we are planning to create a pip installable Official Models package. This is being tracked in [#917](https://github.com/tensorflow/models/issues/917).
3. Install dependencies: `pip3 install --user -r official/requirements.txt` or
`pip install --user -r official/requirements.txt`
To make Official Models easier to use, we are planning to create a pip
installable Official Models package. This is being tracked in
[#917](https://github.com/tensorflow/models/issues/917).
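The Colab one-liner above raises `KeyError` when `PYTHONPATH` is not already set, and changing `PYTHONPATH` inside a running interpreter only affects child processes, not imports in the current session. A minimal sketch of a more defensive setup follows; the helper name `add_models_to_path` is hypothetical and is not part of this repository.

```python
import os
import sys


def add_models_to_path(models_dir):
    """Makes `models_dir` importable here and in child processes."""
    # sys.path controls imports in the interpreter that is already running.
    if models_dir not in sys.path:
        sys.path.append(models_dir)
    # PYTHONPATH is only read at interpreter startup, so it matters for
    # subprocesses. Use .get() because the variable may be unset, in which
    # case `+=` on os.environ would raise KeyError.
    current = os.environ.get("PYTHONPATH", "")
    entries = [p for p in current.split(os.pathsep) if p]
    if models_dir not in entries:
        entries.append(models_dir)
    os.environ["PYTHONPATH"] = os.pathsep.join(entries)


add_models_to_path("/path/to/models")  # placeholder path from the README
```

Calling the helper twice is harmless, since it skips entries that are already present.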
## Available models
**NOTE:** Please make sure to follow the steps in the [Requirements](#requirements) section.
**NOTE:** Please make sure to follow the steps in the
[Requirements](#requirements) section.
* [bert](bert): A powerful pre-trained language representation model: BERT, which
stands for Bidirectional Encoder Representations from Transformers.
* [mnist](mnist): A basic model to classify digits from the MNIST dataset.
* [resnet](resnet): A deep residual network that can be used to classify both CIFAR-10 and ImageNet's dataset of 1000 classes.
* [transformer](transformer): A transformer model to translate the WMT English to German dataset.
* [wide_deep](wide_deep): A model that combines a wide model and deep network to classify census income data.
* More models to come!
* [bert](bert): A powerful pre-trained language representation model: BERT,
which stands for Bidirectional Encoder Representations from Transformers.
* [mnist](mnist): A basic model to classify digits from the MNIST dataset.
* [resnet](vision/image_classification): A deep residual network that can be
used to classify both CIFAR-10 and ImageNet's dataset of 1000 classes.
* [transformer](transformer): A transformer model to translate the WMT English
to German dataset.
* [ncf](recommendation): Neural Collaborative Filtering model for
recommendation tasks.
Models that will not be updated to TensorFlow 2.x stay inside the `r1` directory:
* [boosted_trees](r1/boosted_trees): A Gradient Boosted Trees model to classify
higgs boson process from HIGGS Data Set.
* [boosted_trees](r1/boosted_trees): A Gradient Boosted Trees model to
classify higgs boson process from HIGGS Data Set.
* [wide_deep](r1/wide_deep): A model that combines a wide model and deep
network to classify census income data.
## More models to come!
We are in the process of revamping the official model garden with TensorFlow 2.0
and Keras. In the near future, we will bring:
* State-of-the-art language understanding models: XLNet, GPT2, and more
members of the Transformer family.
* State-of-the-art image classification models: EfficientNet, MnasNet and
variants.
* A set of excellent object detection models.
If you would like to make any fixes or improvements to the models, please [submit a pull request](https://github.com/tensorflow/models/compare).
If you would like to make any fixes or improvements to the models, please
[submit a pull request](https://github.com/tensorflow/models/compare).
## New Models
The team is actively working to add new models to the repository. Every model should follow the guidelines below, to uphold
our objectives of readable, usable, and maintainable code.
The team is actively working to add new models to the repository. Every model
should follow the guidelines below, to uphold our objectives of readable,
usable, and maintainable code.
**General guidelines**
* Code should be well documented and tested.
* Runnable from a blank environment with relative ease.
* Trainable on: single GPU/CPU (baseline), multiple GPUs, TPU
* Compatible with Python 2 and 3 (using [six](https://pythonhosted.org/six/) when necessary)
* Conform to [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)
**General guidelines**
* Code should be well documented and tested.
* Runnable from a blank environment with relative ease.
* Trainable on: single GPU/CPU (baseline), multiple GPUs, TPU
* Compatible with Python 2 and 3 (using [six](https://pythonhosted.org/six/)
when necessary)
* Conform to
[Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)
**Implementation guidelines**
These guidelines exist so the model implementations are consistent for better readability and maintainability.
These guidelines exist so the model implementations are consistent for better
readability and maintainability.
* Use [common utility functions](utils)
* Export SavedModel at the end of training.
* Consistent flags and flag-parsing library ([read more here](utils/flags/guidelines.md))
* Produce benchmarks and logs ([read more here](utils/logs/guidelines.md))
* Use [common utility functions](utils)
* Export SavedModel at the end of training.
* Consistent flags and flag-parsing library
([read more here](utils/flags/guidelines.md))
* Produce benchmarks and logs ([read more here](utils/logs/guidelines.md))
......@@ -1043,7 +1043,7 @@ class Resnet50MultiWorkerKerasBenchmark(Resnet50KerasBenchmarkBase):
class Resnet50MultiWorkerKerasBenchmarkSynth(Resnet50MultiWorkerKerasBenchmark):
"""Resnet50 multi-worker synthetic benchmark tests."""
"""Resnet50 multi-worker synthetic data benchmark tests."""
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
def_flags = {}
......@@ -1057,5 +1057,20 @@ class Resnet50MultiWorkerKerasBenchmarkSynth(Resnet50MultiWorkerKerasBenchmark):
output_dir=output_dir, default_flags=def_flags)
class Resnet50MultiWorkerKerasBenchmarkReal(Resnet50MultiWorkerKerasBenchmark):
"""Resnet50 multi-worker real data benchmark tests."""
def __init__(self, output_dir=None, root_data_dir=None, **kwargs):
def_flags = {}
def_flags['skip_eval'] = True
def_flags['report_accuracy_metrics'] = False
def_flags['data_dir'] = os.path.join(root_data_dir, 'imagenet')
def_flags['train_steps'] = 110
def_flags['log_steps'] = 10
super(Resnet50MultiWorkerKerasBenchmarkReal, self).__init__(
output_dir=output_dir, default_flags=def_flags)
if __name__ == '__main__':
tf.test.main()
......@@ -43,6 +43,7 @@ SQUAD_FULL_INPUT_META_DATA_PATH = 'gs://tf-perfzero-data/bert/squad/squad_full_m
MODEL_CONFIG_FILE_PATH = 'gs://cloud-tpu-checkpoints/bert/tf_20/uncased_L-24_H-1024_A-16/bert_config'
# pylint: enable=line-too-long
TMP_DIR = os.getenv('TMPDIR')
FLAGS = flags.FLAGS
......@@ -116,7 +117,7 @@ class BertSquadBenchmarkReal(BertSquadBenchmarkBase):
`benchmark_(number of gpus)_gpu` format.
"""
def __init__(self, output_dir=None, **kwargs):
def __init__(self, output_dir=TMP_DIR, **kwargs):
super(BertSquadBenchmarkReal, self).__init__(output_dir=output_dir)
def _setup(self):
......
......@@ -27,15 +27,19 @@ def define_common_bert_flags():
flags.DEFINE_string('model_dir', None, (
'The directory where the model weights and training/evaluation summaries '
'are stored. If not specified, save to /tmp/bert20/.'))
flags.DEFINE_string(
'model_export_path', None,
'Path to the directory where the trained model will be '
'exported.')
flags.DEFINE_string('tpu', '', 'TPU address to connect to.')
flags.DEFINE_string(
'init_checkpoint', None,
'Initial checkpoint (usually from a pre-trained BERT model).')
flags.DEFINE_enum(
'strategy_type', 'mirror', ['tpu', 'mirror'],
'strategy_type', 'mirror', ['tpu', 'mirror', 'multi_worker_mirror'],
'Distribution Strategy type to use for training. `tpu` uses '
'TPUStrategy for running on TPUs, `mirror` uses GPUs with '
'single host.')
'TPUStrategy for running on TPUs, `mirror` uses GPUs with single host, '
'`multi_worker_mirror` uses CPUs or GPUs with multiple hosts.')
flags.DEFINE_integer('num_train_epochs', 3,
'Total number of training epochs to perform.')
flags.DEFINE_integer(
......
......@@ -165,6 +165,7 @@ class BertModel(tf.keras.layers.Layer):
max_position_embeddings=self.config.max_position_embeddings,
dropout_prob=self.config.hidden_dropout_prob,
initializer_range=self.config.initializer_range,
dtype=tf.float32,
name="embedding_postprocessor")
self.encoder = Transformer(
num_hidden_layers=self.config.num_hidden_layers,
......@@ -316,8 +317,9 @@ class EmbeddingPostprocessor(tf.keras.layers.Layer):
dtype=self.dtype)
self.output_layer_norm = tf.keras.layers.LayerNormalization(
name="layer_norm", axis=-1, epsilon=1e-12)
self.output_dropout = tf.keras.layers.Dropout(rate=self.dropout_prob)
name="layer_norm", axis=-1, epsilon=1e-12, dtype=tf.float32)
self.output_dropout = tf.keras.layers.Dropout(rate=self.dropout_prob,
dtype=tf.float32)
super(EmbeddingPostprocessor, self).build(input_shapes)
def __call__(self, word_embeddings, token_type_ids=None, **kwargs):
......@@ -714,11 +716,15 @@ class TransformerBlock(tf.keras.layers.Layer):
rate=self.hidden_dropout_prob)
self.attention_layer_norm = (
tf.keras.layers.LayerNormalization(
name="self_attention_layer_norm", axis=-1, epsilon=1e-12))
name="self_attention_layer_norm", axis=-1, epsilon=1e-12,
# We do layer norm in float32 for numeric stability.
dtype=tf.float32))
self.intermediate_dense = Dense2DProjection(
output_size=self.intermediate_size,
kernel_initializer=get_initializer(self.initializer_range),
activation=self.intermediate_activation,
# Uses float32 so that gelu activation is done in float32.
dtype=tf.float32,
name="intermediate")
self.output_dense = Dense2DProjection(
output_size=self.hidden_size,
......@@ -726,7 +732,7 @@ class TransformerBlock(tf.keras.layers.Layer):
name="output")
self.output_dropout = tf.keras.layers.Dropout(rate=self.hidden_dropout_prob)
self.output_layer_norm = tf.keras.layers.LayerNormalization(
name="output_layer_norm", axis=-1, epsilon=1e-12)
name="output_layer_norm", axis=-1, epsilon=1e-12, dtype=tf.float32)
super(TransformerBlock, self).build(unused_input_shapes)
def common_layers(self):
......@@ -753,6 +759,10 @@ class TransformerBlock(tf.keras.layers.Layer):
attention_output = self.attention_dropout(attention_output)
# Use float32 in keras layer norm and the gelu activation in the
# intermediate dense layer for numeric stability
# TODO(reedwm): These casts are probably unnecessary, as we passed
# dtype=tf.float32 to the layer norm constructor, so it will cast its inputs
# to float32 automatically. These manual casts additionally do the "+"
# operator in float32, but "+" is numerically stable in float16.
if self.float_type == tf.float16:
input_tensor = tf.cast(input_tensor, tf.float32)
attention_output = tf.cast(attention_output, tf.float32)
......
......@@ -105,12 +105,14 @@ class AdamWeightDecay(tf.keras.optimizers.Adam):
epsilon=1e-7,
amsgrad=False,
weight_decay_rate=0.0,
include_in_weight_decay=None,
exclude_from_weight_decay=None,
name='AdamWeightDecay',
**kwargs):
super(AdamWeightDecay, self).__init__(
learning_rate, beta_1, beta_2, epsilon, amsgrad, name, **kwargs)
self.weight_decay_rate = weight_decay_rate
self._include_in_weight_decay = include_in_weight_decay
self._exclude_from_weight_decay = exclude_from_weight_decay
@classmethod
......@@ -178,6 +180,12 @@ class AdamWeightDecay(tf.keras.optimizers.Adam):
"""Whether to use L2 weight decay for `param_name`."""
if self.weight_decay_rate == 0:
return False
if self._include_in_weight_decay:
for r in self._include_in_weight_decay:
if re.search(r, param_name) is not None:
return True
if self._exclude_from_weight_decay:
for r in self._exclude_from_weight_decay:
if re.search(r, param_name) is not None:
......
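The hunk above checks the new `include_in_weight_decay` patterns before the existing `exclude_from_weight_decay` patterns, so an include match takes precedence. A standalone sketch of that ordering follows. The two final `return` statements are assumptions, because the diff elides the rest of the method, and the parameter names in the usage example are hypothetical.

```python
import re


def do_use_weight_decay(param_name, weight_decay_rate,
                        include_in_weight_decay=None,
                        exclude_from_weight_decay=None):
    """Returns True if `param_name` should receive weight decay."""
    if weight_decay_rate == 0:
        return False
    # Include patterns are checked first, so they win over exclude patterns.
    for pattern in include_in_weight_decay or []:
        if re.search(pattern, param_name) is not None:
            return True
    for pattern in exclude_from_weight_decay or []:
        if re.search(pattern, param_name) is not None:
            # Assumption: the elided branch returns False here.
            return False
    # Assumption: parameters matching neither list are decayed.
    return True


# Hypothetical parameter names, for illustration only.
include = ["special/bias"]
exclude = ["layer_norm", "bias"]
for name in ["special/bias", "dense/bias", "dense/kernel"]:
    print(name, do_use_weight_decay(name, 0.01, include, exclude))
```

Because the patterns go through `re.search`, they match anywhere in the parameter name; anchoring with `^` or `$` is needed for stricter matches.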
......@@ -48,10 +48,6 @@ flags.DEFINE_string('train_data_path', None,
'Path to training data for BERT classifier.')
flags.DEFINE_string('eval_data_path', None,
'Path to evaluation data for BERT classifier.')
flags.DEFINE_string(
'model_export_path', None,
'Path to the directory where the trained model will be '
'exported.')
# Model training specific flags.
flags.DEFINE_string(
'input_meta_data_path', None,
......
......@@ -31,6 +31,7 @@ import tensorflow as tf
from official.bert import bert_models
from official.bert import common_flags
from official.bert import input_pipeline
from official.bert import model_saving_utils
from official.bert import model_training_utils
from official.bert import modeling
from official.bert import optimization
......@@ -39,8 +40,13 @@ from official.bert import tokenization
from official.utils.misc import keras_utils
from official.utils.misc import tpu_lib
flags.DEFINE_bool('do_train', False, 'Whether to run training.')
flags.DEFINE_bool('do_predict', False, 'Whether to run eval on the dev set.')
flags.DEFINE_enum(
'mode', 'train', ['train', 'predict', 'export_only'],
'One of {"train", "predict", "export_only"}. `train`: '
'trains the model and evaluates in the meantime. '
'`predict`: predict answers from the squad json file. '
'`export_only`: will take the latest checkpoint inside '
'model_dir and export a `SavedModel`.')
flags.DEFINE_string('train_data_path', '',
'Training data path with train tfrecords.')
flags.DEFINE_string(
......@@ -139,6 +145,8 @@ def predict_squad_customized(strategy, input_meta_data, bert_config,
strategy.experimental_distribute_dataset(predict_dataset))
with strategy.scope():
# Prediction always uses float32, even if training uses mixed precision.
tf.keras.mixed_precision.experimental.set_policy('float32')
squad_model, _ = bert_models.squad_model(
bert_config, input_meta_data['max_seq_length'], float_type=tf.float32)
......@@ -187,7 +195,7 @@ def train_squad(strategy,
use_float16 = common_flags.use_float16()
if use_float16:
policy = tf.keras.mixed_precision.experimental.Policy('infer_float32_vars')
policy = tf.keras.mixed_precision.experimental.Policy('mixed_float16')
tf.keras.mixed_precision.experimental.set_policy(policy)
bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)
......@@ -212,6 +220,9 @@ def train_squad(strategy,
squad_model.optimizer = optimization.create_optimizer(
FLAGS.learning_rate, steps_per_epoch * epochs, warmup_steps)
if use_float16:
# Wraps optimizer with a LossScaleOptimizer. This is done automatically
# in compile() with the "mixed_float16" policy, but since we do not call
# compile(), we must wrap the optimizer manually.
squad_model.optimizer = (
tf.keras.mixed_precision.experimental.LossScaleOptimizer(
squad_model.optimizer, loss_scale=common_flags.get_loss_scale()))
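The reason for wrapping the optimizer in a `LossScaleOptimizer` is that small gradients underflow to zero in float16. The sketch below shows only that numeric effect, using the standard library's half-precision format code instead of TensorFlow. The scale `2 ** 15` is an illustrative assumption, not the value returned by `common_flags.get_loss_scale()`.

```python
import struct


def to_float16(value):
    """Rounds a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


LOSS_SCALE = 2.0 ** 15  # illustrative constant, an assumption for this sketch

tiny_grad = 1e-8
unscaled = to_float16(tiny_grad)              # too small for float16
scaled = to_float16(tiny_grad * LOSS_SCALE)   # representable after scaling
recovered = scaled / LOSS_SCALE               # unscale in higher precision

print(unscaled, recovered)
```

Scaling the loss scales every gradient by the same factor through the chain rule, which is why the wrapped optimizer can divide the factor back out before applying the update.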
......@@ -306,6 +317,26 @@ def predict_squad(strategy, input_meta_data):
verbose=FLAGS.verbose_logging)
def export_squad(model_export_path, input_meta_data):
"""Exports a trained model as a `SavedModel` for inference.
Args:
model_export_path: a string specifying the path to the SavedModel directory.
input_meta_data: dictionary containing meta data about input and model.
Raises:
ValueError: If the export path is not specified (empty string or None).
"""
if not model_export_path:
raise ValueError('Export path is not specified: %s' % model_export_path)
bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)
squad_model, _ = bert_models.squad_model(
bert_config, input_meta_data['max_seq_length'], float_type=tf.float32)
model_saving_utils.export_bert_model(
model_export_path, model=squad_model, checkpoint_dir=FLAGS.model_dir)
def main(_):
# Users should always run this script under TF 2.x
assert tf.version.VERSION.startswith('2.')
......@@ -313,9 +344,15 @@ def main(_):
with tf.io.gfile.GFile(FLAGS.input_meta_data_path, 'rb') as reader:
input_meta_data = json.loads(reader.read().decode('utf-8'))
if FLAGS.mode == 'export_only':
export_squad(FLAGS.model_export_path, input_meta_data)
return
strategy = None
if FLAGS.strategy_type == 'mirror':
strategy = tf.distribute.MirroredStrategy()
elif FLAGS.strategy_type == 'multi_worker_mirror':
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
elif FLAGS.strategy_type == 'tpu':
# Initialize TPU System.
cluster_resolver = tpu_lib.tpu_initialize(FLAGS.tpu)
......@@ -323,9 +360,9 @@ def main(_):
else:
raise ValueError('The distribution strategy type is not supported: %s' %
FLAGS.strategy_type)
if FLAGS.do_train:
if FLAGS.mode == 'train':
train_squad(strategy, input_meta_data)
if FLAGS.do_predict:
if FLAGS.mode == 'predict':
predict_squad(strategy, input_meta_data)
......
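Replacing the `do_train` and `do_predict` booleans with a single `mode` enum means exactly one step runs per invocation, and `export_only` returns before any distribution strategy is created. A rough sketch of that dispatch follows, using `argparse` from the standard library as a stand-in for the absl flags used in the script. The `run` function and the strings it returns are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # `choices` plays the role of flags.DEFINE_enum: any value outside the
    # list is rejected at parse time.
    parser.add_argument("--mode", default="train",
                        choices=["train", "predict", "export_only"])
    return parser


def run(argv):
    """Returns the name of the step that would run for `argv`."""
    args = build_parser().parse_args(argv)
    if args.mode == "export_only":
        return "export_squad"  # early return, no strategy is built
    if args.mode == "train":
        return "train_squad"
    return "predict_squad"


print(run([]))
print(run(["--mode", "predict"]))
print(run(["--mode=export_only"]))
```

One consequence visible in the diff is that training and prediction can no longer be requested together in a single run, which the two booleans allowed.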
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Convert checkpoints created by Estimator (tf1) to be Keras compatible.
Keras manages variable names internally, which results in subtly different names
for variables between the Estimator and Keras version.
The script should be run with TF 1.x.
Usage:
python checkpoint_convert.py \
--checkpoint_from_path="/path/to/checkpoint" \
--checkpoint_to_path="/path/to/new_checkpoint"
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
import tensorflow as tf
flags = tf.flags
FLAGS = flags.FLAGS
## Required parameters
flags.DEFINE_string("checkpoint_from_path", None,
"Source BERT checkpoint path.")
flags.DEFINE_string("checkpoint_to_path", None,
"Destination BERT checkpoint path.")
flags.DEFINE_string(
"exclude_patterns", None,
"Comma-delimited string of a list of patterns to exclude"
" variables from source checkpoint.")
# Mapping between old <=> new names. The source pattern in original variable
# name will be replaced by destination pattern.
BERT_NAME_REPLACEMENTS = [
("bert", "bert_model"),
("embeddings/word_embeddings", "word_embeddings/embeddings"),
("embeddings/token_type_embeddings",
"embedding_postprocessor/type_embeddings"),
("embeddings/position_embeddings",
"embedding_postprocessor/position_embeddings"),
("embeddings/LayerNorm", "embedding_postprocessor/layer_norm"),
("attention/self", "self_attention"),
("attention/output/dense", "self_attention_output"),
("attention/output/LayerNorm", "self_attention_layer_norm"),
("intermediate/dense", "intermediate"),
("output/dense", "output"),
("output/LayerNorm", "output_layer_norm"),
("pooler/dense", "pooler_transform"),
]
def _bert_name_replacement(var_name):
  for src_pattern, tgt_pattern in BERT_NAME_REPLACEMENTS:
    if src_pattern in var_name:
      old_var_name = var_name
      var_name = var_name.replace(src_pattern, tgt_pattern)
      tf.logging.info("Converted: %s --> %s", old_var_name, var_name)
  return var_name


def _has_exclude_patterns(name, exclude_patterns):
  """Checks if a string contains substrings that match patterns to exclude."""
  for p in exclude_patterns:
    if p in name:
      return True
  return False


def convert_names(checkpoint_from_path,
                  checkpoint_to_path,
                  exclude_patterns=None):
  """Migrates the names of variables within a checkpoint.

  Args:
    checkpoint_from_path: Path to source checkpoint to be read in.
    checkpoint_to_path: Path to checkpoint to be written out.
    exclude_patterns: A list of string patterns to exclude variables from
      checkpoint conversion.

  Returns:
    A dictionary that maps the new variable names to the Variable objects.
    A dictionary that maps the old variable names to the new variable names.
  """
  with tf.Graph().as_default():
    tf.logging.info("Reading checkpoint_from_path %s", checkpoint_from_path)
    reader = tf.train.NewCheckpointReader(checkpoint_from_path)
    name_shape_map = reader.get_variable_to_shape_map()
    new_variable_map = {}
    conversion_map = {}
    for var_name in name_shape_map:
      if exclude_patterns and _has_exclude_patterns(var_name, exclude_patterns):
        continue
      new_var_name = _bert_name_replacement(var_name)
      tensor = reader.get_tensor(var_name)
      var = tf.Variable(tensor, name=var_name)
      new_variable_map[new_var_name] = var
      if new_var_name != var_name:
        conversion_map[var_name] = new_var_name

    saver = tf.train.Saver(new_variable_map)

    with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      tf.logging.info("Writing checkpoint_to_path %s", checkpoint_to_path)
      saver.save(sess, checkpoint_to_path)

  tf.logging.info("Summary:")
  tf.logging.info("  Converted %d variable name(s).", len(new_variable_map))
  tf.logging.info("  Converted: %s", str(conversion_map))


def main(_):
  exclude_patterns = None
  if FLAGS.exclude_patterns:
    exclude_patterns = FLAGS.exclude_patterns.split(",")
  convert_names(FLAGS.checkpoint_from_path, FLAGS.checkpoint_to_path,
                exclude_patterns)


if __name__ == "__main__":
  flags.mark_flag_as_required("checkpoint_from_path")
  flags.mark_flag_as_required("checkpoint_to_path")
  app.run(main)
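The renaming in `checkpoint_convert.py` is plain ordered substring replacement, so the order of the table matters: `attention/output/dense` has to be handled before the shorter `output/dense`. The sketch below replays that logic without TensorFlow on a subset of the table. The variable names passed in are assumed examples in the TF1 BERT naming style, not names read from a real checkpoint.

```python
# Subset of BERT_NAME_REPLACEMENTS from the script, kept in the same order.
REPLACEMENTS = [
    ("bert", "bert_model"),
    ("attention/self", "self_attention"),
    ("attention/output/dense", "self_attention_output"),
    ("attention/output/LayerNorm", "self_attention_layer_norm"),
    ("output/dense", "output"),
    ("output/LayerNorm", "output_layer_norm"),
]


def rename(var_name):
    """Applies every matching replacement in order, as the script does."""
    for src, dst in REPLACEMENTS:
        if src in var_name:
            var_name = var_name.replace(src, dst)
    return var_name


# Assumed example names, for illustration only.
print(rename("bert/encoder/layer_0/attention/output/dense/kernel"))
print(rename("bert/encoder/layer_0/output/dense/kernel"))
```

Note that the mapping is not idempotent: because `bert_model` still contains `bert`, feeding an already converted name through `rename` again yields a `bert_model_model` prefix, so the conversion should be applied to a checkpoint only once.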
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A converter for BERT name-based checkpoint to object-based checkpoint.
The conversion will yield an object-based checkpoint for TF2 BERT models,
when BertConfig.backward_compatible is true.
The variable/tensor shapes match the TF1 BERT model, but backward compatibility
introduces unnecessary reshape computation.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from absl import app
from absl import flags
import tensorflow as tf
from official.bert import modeling
FLAGS = flags.FLAGS
flags.DEFINE_string("bert_config_file", None,
"Bert configuration file to define core bert layers.")
flags.DEFINE_string(
"init_checkpoint", None,
"Initial checkpoint (usually from a pre-trained BERT model).")
flags.DEFINE_string("converted_checkpoint", None,
"Path to object-based V2 checkpoint.")
flags.DEFINE_bool(
"export_bert_as_layer", False,
"Whether to use a layer rather than a model inside the checkpoint.")
def create_bert_model(bert_config):
  """Creates a BERT keras core model from BERT configuration.

  Args:
    bert_config: A `BertConfig` to create the core model.

  Returns:
    A keras model.
  """
  max_seq_length = bert_config.max_position_embeddings

  # Adds input layers just as placeholders.
  input_word_ids = tf.keras.layers.Input(
      shape=(max_seq_length,), dtype=tf.int32, name="input_word_ids")
  input_mask = tf.keras.layers.Input(
      shape=(max_seq_length,), dtype=tf.int32, name="input_mask")
  input_type_ids = tf.keras.layers.Input(
      shape=(max_seq_length,), dtype=tf.int32, name="input_type_ids")
  core_model = modeling.get_bert_model(
      input_word_ids,
      input_mask,
      input_type_ids,
      config=bert_config,
      name="bert_model",
      float_type=tf.float32)
  return core_model


def convert_checkpoint():
  """Converts a name-based matched TF V1 checkpoint to TF V2 checkpoint."""
  bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)

  # Sets backward_compatible to true to convert TF1 BERT checkpoints.
  bert_config.backward_compatible = True
  core_model = create_bert_model(bert_config)

  # Uses streaming-restore in eager mode to read V1 name-based checkpoints.
  core_model.load_weights(FLAGS.init_checkpoint)
  if FLAGS.export_bert_as_layer:
    bert_layer = core_model.get_layer("bert_model")
    checkpoint = tf.train.Checkpoint(bert_layer=bert_layer)
  else:
    checkpoint = tf.train.Checkpoint(model=core_model)

  checkpoint.save(FLAGS.converted_checkpoint)


def main(_):
  tf.enable_eager_execution()
  convert_checkpoint()


if __name__ == "__main__":
  app.run(main)
......@@ -12,7 +12,13 @@ APIs.
## Setup
To begin, you'll simply need the latest version of TensorFlow installed.
First make sure you've [added the models folder to your Python path](/official/#running-the-models); otherwise you may encounter an error like `ImportError: No module named official.mnist`.
First make sure you've [added the models folder to your Python path]:
```shell
export PYTHONPATH="$PYTHONPATH:/path/to/models"
```
Otherwise you may encounter an error like `ImportError: No module named official.mnist`.
Then to train the model, run the following:
......
......@@ -89,7 +89,9 @@ def create_model(data_format):
def define_mnist_flags():
flags_core.define_base()
flags_core.define_performance(num_parallel_calls=False)
flags_core.define_performance(inter_op=True, intra_op=True,
num_parallel_calls=False,
all_reduce_alg=True)
flags_core.define_image()
flags.adopt_module_key_flags(flags_core)
flags_core.set_defaults(data_dir='/tmp/mnist_data',
......
......@@ -33,7 +33,7 @@ import tensorflow as tf
from official.r1.resnet import imagenet_preprocessing
from official.r1.resnet import resnet_model
from official.utils.export import export
from official.r1.utils import export
from official.utils.flags import core as flags_core
from official.utils.logs import hooks_helper
from official.utils.logs import logger
......@@ -725,6 +725,12 @@ def define_resnet_flags(resnet_size_choices=None, dynamic_loss_scale=False,
"""Add flags and validators for ResNet."""
flags_core.define_base()
flags_core.define_performance(num_parallel_calls=False,
inter_op=True,
intra_op=True,
synthetic_data=True,
dtype=True,
all_reduce_alg=True,
num_packs=True,
tf_gpu_thread_mode=True,
datasets_num_private_threads=True,
dynamic_loss_scale=dynamic_loss_scale,
......
......@@ -20,6 +20,7 @@ from __future__ import print_function
import atexit
import multiprocessing
import multiprocessing.dummy
import os
import tempfile
import uuid
......@@ -78,8 +79,8 @@ def iter_shard_dataframe(df, rows_per_core=1000):
It yields a list of dataframes with length equal to the number of CPU cores,
with each dataframe having rows_per_core rows. (Except for the last batch
which may have fewer rows in the dataframes.) Passing vectorized inputs to
a multiprocessing pool is much more efficient than iterating through a
dataframe in serial and passing a list of inputs to the pool.
a pool is more efficient than iterating through a dataframe in serial and
passing a list of inputs to the pool.
Args:
df: Pandas dataframe to be sharded.
......@@ -134,7 +135,7 @@ def _serialize_shards(df_shards, columns, pool, writer):
Args:
df_shards: A list of pandas dataframes. (Should be of similar size)
columns: The dataframe columns to be serialized.
pool: A multiprocessing pool to serialize in parallel.
pool: A pool to serialize in parallel.
writer: A TFRecordWriter to write the serialized shards.
"""
# Pandas does not store columns of arrays as nd arrays. stack remedies this.
......@@ -190,7 +191,7 @@ def write_to_buffer(dataframe, buffer_path, columns, expected_size=None):
.format(buffer_path))
count = 0
pool = multiprocessing.Pool(multiprocessing.cpu_count())
pool = multiprocessing.dummy.Pool(multiprocessing.cpu_count())
try:
with tf.io.TFRecordWriter(buffer_path) as writer:
for df_shards in iter_shard_dataframe(df=dataframe,
......
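The switch to `multiprocessing.dummy.Pool` keeps the same `Pool` interface but runs the workers as threads inside the current process, so arguments are not pickled and sent to child processes. The commit does not state its motivation, so the sketch below only demonstrates the behavior. `describe_shard` and the list-based shards are hypothetical stand-ins for the dataframe serialization.

```python
import multiprocessing
import multiprocessing.dummy
import os


def describe_shard(shard):
    """Stand-in for per-shard serialization: returns (row count, worker pid)."""
    return len(shard), os.getpid()


# Hypothetical shards; the real code passes pandas dataframes.
shards = [list(range(n)) for n in (3, 5, 2)]

# Same constructor call as in the diff, but backed by threads.
pool = multiprocessing.dummy.Pool(multiprocessing.cpu_count())
try:
    results = pool.map(describe_shard, shards)
finally:
    pool.close()
    pool.join()

print(results)
```

Every result carries the parent's process id, which confirms that no worker processes were spawned. The trade-off is that CPU-bound pure-Python work in the threads is serialized by the interpreter lock.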
......@@ -27,7 +27,7 @@ import pandas as pd
import tensorflow as tf
# pylint: enable=wrong-import-order
from official.utils.data import file_io
from official.r1.utils.data import file_io
from official.utils.misc import keras_utils
......