Update interleave hyperparameters

PiperOrigin-RevId: 265780130

Commit 85956b16, authored Aug 27, 2019 by Jing Li; committed by A. Unique TensorFlower on Aug 27, 2019

Showing with 5 additions and 1 deletion

official/bert/input_pipeline.py +5 -1

--- a/official/bert/input_pipeline.py
+++ b/official/bert/input_pipeline.py
@@ -94,8 +94,12 @@ def create_pretrain_dataset(file_paths,
  dataset = dataset.shuffle(len(file_paths))
  # In parallel, create tf record dataset for each train files.
+  # cycle_length = 8 means that up to 8 files will be read and deserialized in
+  # parallel. You may want to increase this number if you have a large number of
+  # CPU cores.
  dataset = dataset.interleave(
-      tf.data.TFRecordDataset, cycle_length=tf.data.experimental.AUTOTUNE)
+      tf.data.TFRecordDataset, cycle_length=8,
+      num_parallel_calls=tf.data.experimental.AUTOTUNE)
  decode_fn = lambda record: decode_record(record, name_to_features)
  dataset = dataset.map(
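
The comment added in this change says that cycle_length bounds how many files are read at once. A rough way to see what that means for element order is the pure-Python sketch below. It is a simplified model, not TensorFlow's implementation: the names interleave, open_fn and FAKE_FILES are hypothetical, block_length is assumed to be 1 by default, and num_parallel_calls is not modeled at all because it affects only how concurrently the open sources are read, not how many are open.

```python
from itertools import islice

# Hypothetical stand-in for TFRecord files: file name -> records it holds.
FAKE_FILES = {"a": [1, 2, 3], "b": [10, 20], "c": [100]}


def interleave(sources, open_fn, cycle_length, block_length=1):
    """Simplified model of interleaving: keep at most `cycle_length`
    sources open and take `block_length` items from each in turn."""
    pending = iter(sources)
    # Open the first `cycle_length` sources.
    active = [open_fn(s) for s in islice(pending, cycle_length)]
    while active:
        still_open = []
        for it in active:
            block = list(islice(it, block_length))
            yield from block
            if len(block) == block_length:
                # The source may still have items; keep it in the cycle.
                still_open.append(it)
            else:
                # The source ran dry; open the next pending one, if any.
                for s in islice(pending, 1):
                    still_open.append(open_fn(s))
        active = still_open


def open_fn(name):
    # Assumption: "opening" a file just iterates its in-memory records.
    return iter(FAKE_FILES[name])


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, list(interleave(["a", "b", "c"], open_fn, cycle_length=n)))
```

With cycle_length=1 the files are consumed one after another; larger values mix records from several files, which is why the value is worth raising only when there are enough CPU cores to keep that many readers busy.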