Sliding window of a batch in TensorFlow using the Dataset API
Is there a way to modify the composition of my batches of images? At the moment, when I create e.g. a batch of size 4, my batches look like this:
Batch1: [Img0 Img1 Img2 Img3]
Batch2: [Img4 Img5 Img6 Img7]
I need to modify the composition of my batches so that each batch shifts forward by only one image. Then it should look like this:
Batch1: [Img0 Img1 Img2 Img3]
Batch2: [Img1 Img2 Img3 Img4]
Batch3: [Img2 Img3 Img4 Img5]
Batch4: [Img3 Img4 Img5 Img6]
Batch5: [Img4 Img5 Img6 Img7]
In my code I'm using TensorFlow's Dataset API, as follows:
def tfrecords_train_input(input_dir, examples, epochs, nsensors, past, future,
                          features, batch_size, threads, shuffle, record_type):
    filenames = sorted([os.path.join(input_dir, f) for f in os.listdir(input_dir)])
    num_records = 0
    for fn in filenames:
        for _ in tf.python_io.tf_record_iterator(fn):
            num_records += 1
    print("Number of files to use:", len(filenames),
          "/ Total records to use:", num_records)
    dataset = tf.data.TFRecordDataset(filenames)
    # Parse records
    read_proto = partial(record_type().read_proto, nsensors=nsensors, past=past,
                         future=future, features=features)
    # Parallelize the parsing across `threads` calls
    dataset = dataset.map(map_func=read_proto, num_parallel_calls=threads)
    # Cache parsed data
    dataset = dataset.cache()
    # Repeat for the requested number of epochs
    dataset = dataset.repeat(epochs)
    # Batch data
    dataset = dataset.batch(batch_size)
    # Prefetch batches for efficient pipelining
    dataset = dataset.prefetch(2)
    iterator = dataset.make_one_shot_iterator()
    return iterator
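For reference, a hedged sketch of how a sliding-window step could replace the plain dataset.batch(batch_size) call in a pipeline like this (assuming TF >= 1.12's Dataset.window, and that each parsed element is a single tensor; tuple-valued elements need the zip-based variants shown in the answers below):

# Hypothetical replacement for `dataset = dataset.batch(batch_size)`:
# windows of batch_size elements that advance by one element per step
dataset = dataset.window(size=batch_size, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda w: w.batch(batch_size))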
This can be achieved using the sliding window batch operation for tf.data.Dataset:
Example:
from tensorflow.contrib.data.python.ops import sliding

imgs = tf.constant(['img0', 'img1', 'img2', 'img3',
                    'img4', 'img5', 'img6', 'img7'])
labels = tf.constant([0, 0, 0, 1, 1, 1, 0, 0])

# Create a TensorFlow Dataset object
data = tf.data.Dataset.from_tensor_slices((imgs, labels))

# Sliding window batch
window = 4
stride = 1
data = data.apply(sliding.sliding_window_batch(window, stride))

# Create a TensorFlow Iterator object
iterator = tf.data.Iterator.from_structure(data.output_types, data.output_shapes)
next_element = iterator.get_next()

# Create the initialization op
init_op = iterator.make_initializer(data)

with tf.Session() as sess:
    # Initialize the iterator on the data
    sess.run(init_op)
    while True:
        try:
            elem = sess.run(next_element)
            print(elem)
        except tf.errors.OutOfRangeError:
            print("End of dataset.")
            break
Output:
(array([b'img0', b'img1', b'img2', b'img3'], dtype=object), array([0, 0, 0, 1], dtype=int32))
(array([b'img1', b'img2', b'img3', b'img4'], dtype=object), array([0, 0, 1, 1], dtype=int32))
(array([b'img2', b'img3', b'img4', b'img5'], dtype=object), array([0, 1, 1, 1], dtype=int32))
(array([b'img3', b'img4', b'img5', b'img6'], dtype=object), array([1, 1, 1, 0], dtype=int32))
(array([b'img4', b'img5', b'img6', b'img7'], dtype=object), array([1, 1, 0, 0], dtype=int32))
Note: Dataset.window creates a dataset of sliding windows over the input dataset. Its parameters: size (the number of elements of the input dataset to combine into a window), shift (the forward shift of the sliding window in each iteration; defaults to size), stride (the stride of the input elements within the sliding window), and drop_remainder (whether to drop the final windows that are smaller than size).
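For example, a sketch of the same imgs/labels example rewritten with Dataset.window for TF 2.x, where tensorflow.contrib no longer exists (note how window() on a tuple dataset yields a tuple of per-component window datasets):

import tensorflow as tf

imgs = tf.constant(['img0', 'img1', 'img2', 'img3',
                    'img4', 'img5', 'img6', 'img7'])
labels = tf.constant([0, 0, 0, 1, 1, 1, 0, 0])

data = tf.data.Dataset.from_tensor_slices((imgs, labels))
window = 4
data = data.window(size=window, shift=1, drop_remainder=True)
# Batch each component's window dataset and zip them back into aligned pairs
data = data.flat_map(
    lambda i, l: tf.data.Dataset.zip((i.batch(window), l.batch(window))))

for img_batch, label_batch in data:  # eager iteration, no Session needed
    print(img_batch.numpy(), label_batch.numpy())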
Answering both the original post and @cabbage_soup's comment on vijay's answer:
To achieve an efficient sliding window, the following code can be used.
data = data.window(size=batch_size, stride=1, shift=1, drop_remainder=True)
data = data.interleave(
    lambda *window: tf.data.Dataset.zip(tuple([w.batch(batch_size) for w in window])),
    cycle_length=10,
    block_length=10,
    num_parallel_calls=4)
Interleave is used instead of flat_map as it allows processing to be done in parallel during this window transformation.
Refer to the documentation to choose values for cycle_length, block_length and num_parallel_calls that are appropriate for your hardware and data.
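A minimal runnable sketch of this window + interleave pattern on a toy tuple dataset (the data and the cycle_length/block_length/num_parallel_calls values are illustrative, not tuned):

import tensorflow as tf

batch_size = 4
# Toy (feature, label) pairs standing in for real data
data = tf.data.Dataset.from_tensor_slices((tf.range(8), tf.range(8) * 10))
data = data.window(size=batch_size, stride=1, shift=1, drop_remainder=True)
# Each element is a tuple of per-component window datasets; batch each
# component and zip them back into (feature_batch, label_batch) pairs
data = data.interleave(
    lambda *w: tf.data.Dataset.zip(tuple(d.batch(batch_size) for d in w)),
    cycle_length=10, block_length=10, num_parallel_calls=4)

for features, labels in data:
    print(features.numpy(), labels.numpy())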
Note: the Dataset.prefetch(m) transformation prefetches m elements of its direct input. If its direct input is dataset.batch(n), each element of that dataset is a batch of n elements, so it will prefetch m batches.
With TensorFlow >= 2.1, you can use the window(), flat_map() and batch() functions to get the desired result.
Example:
## Sample data list
x_train = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90]

## Constants
batch_size = 10
shift_window_size = 1

## Create tensor slices
train_d = tf.data.Dataset.from_tensor_slices(x_train)

## Create a dataset of datasets with a specific window and shift size
train_d = train_d.window(size=batch_size, shift=shift_window_size, drop_remainder=True)

## Define a function that turns each window (itself a dataset) into a tensor
def create_sequence_ds(chunk):
    return chunk.batch(batch_size, drop_remainder=True)

## Flatten the dataset of datasets using the function defined above
train_d = train_d.flat_map(create_sequence_ds)

## Check the contents
for item in train_d:
    print(item)
Output:
tf.Tensor([ 1 2 3 4 5 6 7 8 9 10], shape=(10,), dtype=int32)
tf.Tensor([ 2 3 4 5 6 7 8 9 10 20], shape=(10,), dtype=int32)
tf.Tensor([ 3 4 5 6 7 8 9 10 20 30], shape=(10,), dtype=int32)
tf.Tensor([ 4 5 6 7 8 9 10 20 30 40], shape=(10,), dtype=int32)
tf.Tensor([ 5 6 7 8 9 10 20 30 40 50], shape=(10,), dtype=int32)
tf.Tensor([ 6 7 8 9 10 20 30 40 50 60], shape=(10,), dtype=int32)
tf.Tensor([ 7 8 9 10 20 30 40 50 60 70], shape=(10,), dtype=int32)
tf.Tensor([ 8 9 10 20 30 40 50 60 70 80], shape=(10,), dtype=int32)
tf.Tensor([ 9 10 20 30 40 50 60 70 80 90], shape=(10,), dtype=int32)
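If you also want an outer batch dimension (several windows per training step), a further batch() can be appended after flat_map (a sketch; the outer batch size of 3 is illustrative):

## Group three windows at a time -> tensors of shape (3, 10)
train_b = train_d.batch(3, drop_remainder=True)
for item in train_b:
    print(item.shape)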
More details can be found here: TF Data Guide
See also: "rolling window batch operation for tf.data.Dataset" (TensorFlow Issue #15044) and "window + flat_map fails on dataset of tuples" (TensorFlow Issue #27119) on GitHub.
Note: make_one_shot_iterator() and make_initializable_iterator() are deprecated and removed in TensorFlow 2.0; instead, you can iterate a dataset directly, or call take() with the number of batches you want.
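For instance, a minimal TF 2.x sketch (dataset here stands for the batched pipeline from the question):

# TF 2.x: no explicit iterator needed; take(3) yields the first 3 batches
for batch in dataset.take(3):
    print(batch)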
Comments
- In tensorflow >= 1.12, you should use
  data.window(size=window, shift=1, stride=stride).flat_map(lambda x: x.batch(window))
  in place of the deprecated data.apply(sliding.sliding_window_batch(window, stride)).
- Is there a way to make the updated function call work on multiple inputs?
- @TomaszSętkowski That doesn't seem to work. The lambda complains about receiving a tuple whose size matches that of a single entry. This was using a CSV dataset. (I haven't tried a minimal example.)
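For what it's worth, a hedged sketch of how the window call can be adapted for tuple datasets such as those produced by tf.data.experimental.CsvDataset (see Issue #27119 above; the flat_map receives one window dataset per tuple component):

# Assumes `data` yields (col0, col1, ...) tuples, e.g. from CsvDataset
window = 4
data = data.window(size=window, shift=1, drop_remainder=True)
# Batch each component's window and zip them back into aligned tuples
data = data.flat_map(
    lambda *parts: tf.data.Dataset.zip(
        tuple(p.batch(window) for p in parts)))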