When TensorFlow 1.4 was released, there were very few fully working examples of the Datasets API (tf.data) for sequence data. Rather than write a full tutorial, I have put together two scripts with explanatory notes.

The GitHub repository contains:

placeholder_vs_iterators.py — Three data input approaches:

  • Traditional placeholder method
  • Iterators
  • Feedable iterators

generator_vs_tfrecord.py — Three methods for iterating through sequence data during training:

  • Generator function with preprocessing (zero-padding, batching)
  • Generator function that yields data pre-processed ahead of time
  • TFRecord files storing SequenceExample protocol buffers (the approach that relies most heavily on the Datasets API)
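
To make the first approach more concrete, here is a minimal sketch in plain Python of a generator that zero-pads and batches variable-length sequences. It is not taken from the repository: the function name padded_batches and the toy data are hypothetical, and it assumes the padding and batching happen inside the generator itself, whereas the actual script may leave those steps to the Datasets API (for example via Dataset.padded_batch). It also assumes 0 is reserved as the padding value and never appears as a real token.

```python
def padded_batches(sequences, batch_size):
    """Yield (batch, lengths) pairs from a list of variable-length sequences.

    Each batch is zero-padded to the length of its own longest sequence.
    The true lengths are returned alongside so that padding can be masked
    out later. Hypothetical helper for illustration only.
    """
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        lengths = [len(seq) for seq in chunk]
        width = max(lengths)
        # Right-pad every sequence in the chunk with zeros up to `width`.
        batch = [list(seq) + [0] * (width - len(seq)) for seq in chunk]
        yield batch, lengths


if __name__ == "__main__":
    toy_data = [[1, 2, 3], [4], [5, 6]]  # made-up sequences
    for batch, lengths in padded_batches(toy_data, batch_size=2):
        print(batch, lengths)
```

A generator like this could then be handed to tf.data.Dataset.from_generator, which is roughly where the generator-based methods and the TFRecord method part ways.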

References: TF Datasets documentation, Google Developers blog post on the API.