When TensorFlow 1.4 was released, there were very few fully working examples of the Datasets API (tf.data) for sequence data. Rather than a full tutorial, here are two scripts with explanatory notes.
The GitHub repository contains:
placeholder_vs_iterators.py — Three data input approaches:
- Traditional placeholder method
- Iterators
- Feedable iterators
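The three input styles can be compared side by side in a minimal sketch. It is not the code from placeholder_vs_iterators.py. It assumes a hypothetical 6-by-3 toy array and the `tensorflow.compat.v1` shim so it also runs on current releases; on TF 1.4 itself you would use `import tensorflow as tf` and drop the `disable_v2_behavior()` call. Each approach sums every 2-row batch so the results can be checked against each other.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF1-style graph mode on a TF 2.x install

# Hypothetical toy data: 6 samples with 3 features each.
data = np.arange(18, dtype=np.float32).reshape(6, 3)

# 1) Placeholder: Python slices each batch and copies it in via feed_dict.
x_ph = tf.placeholder(tf.float32, shape=[None, 3])
ph_sum = tf.reduce_sum(x_ph)

# 2) Initializable iterator: the data is fed once at initialization;
#    the Dataset pipeline does the batching.
data_ph = tf.placeholder(tf.float32, shape=[None, 3])
init_it = (tf.data.Dataset.from_tensor_slices(data_ph)
           .batch(2)
           .make_initializable_iterator())
it_sum = tf.reduce_sum(init_it.get_next())

# 3) Feedable iterator: a string handle selects which iterator to read from.
train_ds = tf.data.Dataset.from_tensor_slices(data).batch(2).repeat()
val_ds = tf.data.Dataset.from_tensor_slices(data[4:]).batch(2)
handle = tf.placeholder(tf.string, shape=[])
feed_it = tf.data.Iterator.from_string_handle(
    handle, train_ds.output_types, train_ds.output_shapes)
feed_sum = tf.reduce_sum(feed_it.get_next())
train_handle_op = train_ds.make_one_shot_iterator().string_handle()
val_handle_op = val_ds.make_one_shot_iterator().string_handle()

with tf.Session() as sess:
    placeholder_results = [
        sess.run(ph_sum, feed_dict={x_ph: data[i:i + 2]})
        for i in range(0, len(data), 2)]

    sess.run(init_it.initializer, feed_dict={data_ph: data})
    iterator_results = []
    while True:
        try:
            iterator_results.append(sess.run(it_sum))
        except tf.errors.OutOfRangeError:
            break  # one full pass over the data is finished

    # Evaluate each handle once, then switch datasets just by feeding a handle.
    train_h, val_h = sess.run([train_handle_op, val_handle_op])
    feedable_results = [
        sess.run(feed_sum, feed_dict={handle: train_h}),
        sess.run(feed_sum, feed_dict={handle: val_h})]

print(placeholder_results, iterator_results, feedable_results)
```

The placeholder version copies every batch from Python into the graph through `feed_dict`, whereas the iterator versions keep batching inside the graph. The feedable iterator additionally lets one `get_next()` op read from either the training or the validation pipeline without rebuilding or re-initializing anything.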
generator_vs_tfrecord.py — Three methods for iterating through sequence data during training:
- Generator function with preprocessing (zero-padding, batching)
- Data preprocessed ahead of time, then fed through a generator
- TFRecord files using SequenceExample Protocol Buffers (the most Datasets API-dependent approach)
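The first method does zero-padding and batching in Python before anything reaches TensorFlow. A minimal pure-Python sketch of such a generator follows. The function name `padded_batches`, the pad value 0, and padding to the longest sequence in each batch (rather than a global maximum) are assumptions, not the script's actual code.

```python
def padded_batches(sequences, labels, batch_size, pad_value=0):
    """Yield (inputs, lengths, labels) batches from variable-length sequences.

    Each batch is padded with pad_value up to the length of its longest
    sequence, so every row in a batch has the same length.
    """
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        max_len = max(len(seq) for seq in batch)
        inputs = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in batch]
        # True lengths, useful for masking or an RNN's sequence_length argument.
        lengths = [len(seq) for seq in batch]
        yield inputs, lengths, list(labels[start:start + batch_size])


# Hypothetical toy data: four token sequences of different lengths.
sequences = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
labels = [0, 1, 0, 1]

for inputs, lengths, batch_labels in padded_batches(sequences, labels, batch_size=2):
    print(inputs, lengths, batch_labels)
# [[1, 2, 3], [4, 5, 0]] [3, 2] [0, 1]
# [[6, 0, 0, 0], [7, 8, 9, 10]] [1, 4] [0, 1]
```

To feed TensorFlow, a generator like this can be wrapped in a zero-argument callable and passed to `tf.data.Dataset.from_generator` with matching `output_types` and `output_shapes` (for example `([None, None], [None], [None])`). Padding per batch keeps short batches small, at the cost of a time dimension that varies from batch to batch.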
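For the TFRecord method, each sequence can be stored as a `tf.train.SequenceExample`. Per-sequence values go in `context`, and per-time-step values go in `feature_lists`. The sketch below writes a few toy sequences to a temporary file and reads them back with `tf.data.TFRecordDataset`, zero-padding inside the pipeline with `padded_batch`. The feature names `tokens`, `length`, and `label`, the toy data, and the `tensorflow.compat.v1` import (used so the example runs on current releases) are all assumptions rather than the script's own choices.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: TF1-style graph mode on a TF 2.x install


def make_sequence_example(tokens, label):
    """Build a SequenceExample: per-sequence context plus a per-step feature list."""
    ex = tf.train.SequenceExample()
    ex.context.feature["length"].int64_list.value.append(len(tokens))
    ex.context.feature["label"].int64_list.value.append(label)
    steps = ex.feature_lists.feature_list["tokens"]
    for token in tokens:
        steps.feature.add().int64_list.value.append(token)
    return ex


def parse(serialized):
    """Decode one serialized SequenceExample into (tokens, length, label)."""
    context, sequence = tf.parse_single_sequence_example(
        serialized,
        context_features={
            "length": tf.FixedLenFeature([], tf.int64),
            "label": tf.FixedLenFeature([], tf.int64),
        },
        sequence_features={"tokens": tf.FixedLenSequenceFeature([], tf.int64)},
    )
    return sequence["tokens"], context["length"], context["label"]


# Hypothetical toy data and output path.
sequences = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
labels = [0, 1, 0, 1]
path = os.path.join(tempfile.mkdtemp(), "sequences.tfrecord")

with tf.python_io.TFRecordWriter(path) as writer:
    for seq, label in zip(sequences, labels):
        writer.write(make_sequence_example(seq, label).SerializeToString())

# Read back and zero-pad each batch to its longest sequence.
dataset = (tf.data.TFRecordDataset(path)
           .map(parse)
           .padded_batch(2, padded_shapes=([None], [], [])))
next_batch = dataset.make_one_shot_iterator().get_next()

batches = []
with tf.Session() as sess:
    while True:
        try:
            tokens, lengths, batch_labels = sess.run(next_batch)
        except tf.errors.OutOfRangeError:
            break  # all records have been read
        batches.append((tokens.tolist(), lengths.tolist(), batch_labels.tolist()))

print(batches)
```

Because the sequences are stored unpadded, the files stay compact and the padding length is decided per batch at read time.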
References: the TensorFlow Datasets API (tf.data) documentation; the Google Developers blog post introducing the API.