Hi!
I compared Keras fit times for a plain dataset and one wrapped with experimental_distribute_dataset from your great notebook, using the latest TF version. It turned out that the distributed dataset adds no speedup. Are you sure your distributed input pipeline is well optimized for TPU? Why don't you use other optimizations like these:
```python
import tensorflow as tf

def input_fn(batch_size):
  """> 2000 images/sec"""
  files = tf.data.Dataset.list_files(FLAGS.data_dir)

  def tfrecord_dataset(filename):
    buffer_size = 8 * 1024 * 1024  # 8 MiB per file
    return tf.data.TFRecordDataset(filename, buffer_size=buffer_size)

  # Read many TFRecord files in parallel; sloppy=True allows out-of-order
  # elements for higher throughput.
  dataset = files.apply(tf.contrib.data.parallel_interleave(
      tfrecord_dataset, cycle_length=32, sloppy=True))
  dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(10000, NUM_EPOCHS))
  dataset = dataset.apply(tf.contrib.data.map_and_batch(
      parser_fn, batch_size, num_parallel_calls=4))
  return dataset.prefetch(4)
```
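
For reference, here is a rough sketch of the same pipeline in the TF 2.x tf.data API, since tf.contrib was removed there. The cycle length, shuffle buffer, and batching parameters are simply carried over from the snippet above, and parser_fn, FLAGS.data_dir and NUM_EPOCHS are assumed to exist as before; this is not a tuned configuration:

```python
import tensorflow as tf

def input_fn(batch_size):
  """TF 2.x sketch of the same pipeline without tf.contrib (assumed parameters)."""
  files = tf.data.Dataset.list_files(FLAGS.data_dir)

  def tfrecord_dataset(filename):
    return tf.data.TFRecordDataset(filename, buffer_size=8 * 1024 * 1024)

  # parallel_interleave -> interleave(..., num_parallel_calls, deterministic=False)
  dataset = files.interleave(
      tfrecord_dataset, cycle_length=32,
      num_parallel_calls=tf.data.AUTOTUNE, deterministic=False)
  # shuffle_and_repeat -> shuffle().repeat()
  dataset = dataset.shuffle(10000).repeat(NUM_EPOCHS)
  # map_and_batch -> map().batch(); drop_remainder keeps the batch shape static,
  # which TPUs require.
  dataset = dataset.map(parser_fn, num_parallel_calls=tf.data.AUTOTUNE)
  dataset = dataset.batch(batch_size, drop_remainder=True)
  return dataset.prefetch(tf.data.AUTOTUNE)
```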
```python
if FLAGS.use_tpu:
  # When using TPU, wrap the optimizer with CrossShardOptimizer, which
  # handles synchronization details between different TPU cores.
  optimizer = tpu_optimizer.CrossShardOptimizer(optimizer)
```
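
On the TF 2.x side, the cross-core synchronization that CrossShardOptimizer handled is taken over by TPUStrategy, and Keras model.fit distributes a plain tf.data.Dataset across the replicas on its own (strategy.experimental_distribute_dataset is mainly needed for custom training loops). A minimal sketch, assuming a hypothetical build_model() factory and the input_fn above:

```python
import tensorflow as tf

# Connect to the TPU and create the distribution strategy (TF 2.x).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Build and compile the model under the strategy scope; no CrossShardOptimizer
# is needed, the strategy synchronizes gradients across cores.
with strategy.scope():
  model = build_model()  # hypothetical model factory
  model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# fit() shards the dataset across TPU cores automatically;
# 1024 is an assumed global batch size, not a measured setting.
model.fit(input_fn(batch_size=1024))
```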