
TPU-optimized pipeline #55

@qo4on


Hi!
I compared the Keras fit time for a plain dataset and for experimental_distribute_dataset from your great notebook, using the latest TF version. It turned out that the distributed dataset adds no speedup. Are you sure your distributed input pipeline is well optimized for TPU? Why not use other optimizations like these:

def input_fn(batch_size):
    """> 2000 images/sec"""
    # FLAGS, NUM_EPOCHS, and parser_fn are assumed to be defined elsewhere.
    files = tf.data.Dataset.list_files(FLAGS.data_dir)

    def tfrecord_dataset(filename):
        buffer_size = 8 * 1024 * 1024   # 8 MiB per file
        return tf.data.TFRecordDataset(filename, buffer_size=buffer_size)

    # Read 32 files concurrently; sloppy=True trades determinism for throughput.
    dataset = files.apply(tf.contrib.data.parallel_interleave(
        tfrecord_dataset, cycle_length=32, sloppy=True))
    # Fused shuffle-and-repeat with a 10k-example buffer.
    dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(10000, NUM_EPOCHS))
    # Fused parse-and-batch with 4 parallel parser calls.
    dataset = dataset.apply(tf.contrib.data.map_and_batch(
        parser_fn, batch_size, num_parallel_calls=4))
    return dataset.prefetch(4)
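
For reference, here is a rough sketch of the same pipeline on TF 2.x, since tf.contrib.data was removed there and the fused transformations map onto core tf.data methods. parser_fn, the file pattern, and num_epochs are assumed to be defined as above, and tf.data.AUTOTUNE needs a recent 2.x release:

import tensorflow as tf

def input_fn_v2(batch_size, file_pattern, num_epochs):
    """TF 2.x equivalent of the pipeline above (a sketch, not a benchmark)."""
    files = tf.data.Dataset.list_files(file_pattern)

    # parallel_interleave -> interleave; deterministic=False replaces sloppy=True.
    dataset = files.interleave(
        lambda f: tf.data.TFRecordDataset(f, buffer_size=8 * 1024 * 1024),
        cycle_length=32,
        num_parallel_calls=tf.data.AUTOTUNE,
        deterministic=False)

    # shuffle_and_repeat -> shuffle().repeat(); the runtime fuses these itself.
    dataset = dataset.shuffle(10000).repeat(num_epochs)

    # map_and_batch -> map().batch(); drop_remainder gives the TPU static shapes.
    dataset = dataset.map(parser_fn, num_parallel_calls=tf.data.AUTOTUNE)  # parser_fn assumed defined
    dataset = dataset.batch(batch_size, drop_remainder=True)
    return dataset.prefetch(tf.data.AUTOTUNE)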

if FLAGS.use_tpu:
    # When using TPU, wrap the optimizer with CrossShardOptimizer which
    # handles synchronization details between different TPU cores.
    optimizer = tpu_optimizer.CrossShardOptimizer(optimizer)
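
And on TF 2.x, CrossShardOptimizer is gone entirely; as far as I understand, TPUStrategy handles cross-replica gradient aggregation, and Keras fit distributes a plain Dataset across replicas automatically, so experimental_distribute_dataset is only needed for custom training loops. A sketch, where build_model is a hypothetical helper and the GCS path is a placeholder:

import tensorflow as tf

# Connect to the TPU and build a distribution strategy
# (this is tf.distribute.experimental.TPUStrategy before TF 2.3).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = build_model()  # hypothetical: returns an uncompiled Keras model
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# fit() shards the dataset across TPU replicas by itself; no
# CrossShardOptimizer and no manual experimental_distribute_dataset needed.
model.fit(input_fn_v2(batch_size=1024,
                      file_pattern="gs://my-bucket/train-*",  # placeholder path
                      num_epochs=1))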
