## What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?

I have a dataset represented as a NumPy matrix of shape `(num_features, num_examples)` and I wish to convert it to the TensorFlow type `tf.data.Dataset`.

I am struggling to understand the difference between these two methods: `Dataset.from_tensors` and `Dataset.from_tensor_slices`. Which is the right one, and why?

The TensorFlow documentation (link) says that both methods accept a nested structure of tensors, although when using `from_tensor_slices` the tensors should have the same size in the 0th dimension.

`from_tensors` combines the input and returns a dataset with a single element:

    t = tf.constant([[1, 2], [3, 4]])
    ds = tf.data.Dataset.from_tensors(t)  # [[1, 2], [3, 4]]

`from_tensor_slices` creates a dataset with a separate element for each row of the input tensor:

    t = tf.constant([[1, 2], [3, 4]])
    ds = tf.data.Dataset.from_tensor_slices(t)  # [1, 2], [3, 4]
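For the matrix described in the question, examples lie along the second axis, so a transpose is probably needed before slicing. The following is only a sketch under assumptions: it uses TensorFlow 2.x (eager execution on by default), and the array `data` with its sizes is made up for illustration.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 2.x

# Hypothetical data laid out as in the question: (num_features, num_examples).
num_features, num_examples = 3, 5
data = np.arange(num_features * num_examples, dtype=np.float32).reshape(
    num_features, num_examples)

# from_tensor_slices slices along axis 0, so move the examples to axis 0 first.
ds = tf.data.Dataset.from_tensor_slices(data.T)

num_elements = sum(1 for _ in ds)          # one element per example
first_shape = tuple(next(iter(ds)).shape)  # each element is one feature vector

print(num_elements)   # 5
print(first_shape)    # (3,)
```

Passing `data.T` to `from_tensors` instead would give a dataset with a single element holding the whole `(num_examples, num_features)` matrix.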

1) The main difference between the two is that nested elements in `from_tensor_slices` must have the same size in the 0th dimension:

    # exception: ValueError: Dimensions 10 and 9 are not compatible
    dataset1 = tf.data.Dataset.from_tensor_slices(
        (tf.random_uniform([10, 4]), tf.random_uniform([9])))
    # OK, first dimension is same
    dataset2 = tf.data.Dataset.from_tensors(
        (tf.random_uniform([10, 4]), tf.random_uniform([10])))

2) The second difference, explained here, shows up when the input to a `tf.data.Dataset` is a list. For example:

    dataset1 = tf.data.Dataset.from_tensor_slices(
        [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])
    dataset2 = tf.data.Dataset.from_tensors(
        [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])
    print(dataset1)  # shapes: (2, 3)
    print(dataset2)  # shapes: (2, 2, 3)

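In the above, the list is converted to a single tensor of shape `(2, 2, 3)`. `from_tensors` keeps it as one 3D element, while `from_tensor_slices` slices it along the first dimension into elements of shape `(2, 3)`. This stacking can be handy if you have different sources for different image channels and want to concatenate them into one RGB image tensor.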
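To see how a list differs from a tuple here, the sketch below compares the two under TensorFlow 2.x (an assumption; the answer's code targets 1.x). The helper `describe` and the tensors `a` and `b` are illustrative names, not part of the original answer.

```python
import tensorflow as tf  # assumes TensorFlow 2.x

a = tf.zeros([2, 3])
b = tf.ones([2, 3])

def describe(ds):
    """Return (number of elements, element_spec) for a small dataset."""
    return len(list(ds)), ds.element_spec

# A Python list is converted to one tensor of shape (2, 2, 3) ...
list_whole = tf.data.Dataset.from_tensors([a, b])
list_sliced = tf.data.Dataset.from_tensor_slices([a, b])

# ... while a tuple is kept as a structure with two components.
tuple_whole = tf.data.Dataset.from_tensors((a, b))
tuple_sliced = tf.data.Dataset.from_tensor_slices((a, b))

for name, ds in [("list, from_tensors", list_whole),
                 ("list, from_tensor_slices", list_sliced),
                 ("tuple, from_tensors", tuple_whole),
                 ("tuple, from_tensor_slices", tuple_sliced)]:
    count, spec = describe(ds)
    print(name, "->", count, "element(s):", spec)
```

With the list input, `from_tensor_slices` yields `a` and then `b` as separate elements; with the tuple input it yields pairs of rows, one row from each component.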

3) As mentioned in the previous answer, `from_tensors` treats the whole input as one big element, while `from_tensor_slices` yields one element per row:

    import tensorflow as tf
    tf.enable_eager_execution()
    dataset1 = tf.data.Dataset.from_tensor_slices(
        (tf.random_uniform([4, 2]), tf.random_uniform([4])))
    dataset2 = tf.data.Dataset.from_tensors(
        (tf.random_uniform([4, 2]), tf.random_uniform([4])))
    for i, item in enumerate(dataset1):
        print('element: ' + str(i + 1), item[0], item[1])
    print(30*'-')
    for i, item in enumerate(dataset2):
        print('element: ' + str(i + 1), item[0], item[1])

Output:

    element: 1 tf.Tensor(... shapes: ((2,), ()))
    element: 2 tf.Tensor(... shapes: ((2,), ()))
    element: 3 tf.Tensor(... shapes: ((2,), ()))
    element: 4 tf.Tensor(... shapes: ((2,), ()))
    -------------------------
    element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))

Try this:

    import tensorflow as tf  # 1.13.1
    tf.enable_eager_execution()
    t1 = tf.constant([[11, 22], [33, 44], [55, 66]])
    print("\n========= from_tensors ===========")
    ds = tf.data.Dataset.from_tensors(t1)
    print(ds.output_types, end=' : ')
    print(ds.output_shapes)
    for e in ds:
        print(e)
    print("\n========= from_tensor_slices ===========")
    ds = tf.data.Dataset.from_tensor_slices(t1)
    print(ds.output_types, end=' : ')
    print(ds.output_shapes)
    for e in ds:
        print(e)

Output:

    ========= from_tensors ===========
    <dtype: 'int32'> : (3, 2)
    tf.Tensor(
    [[11 22]
     [33 44]
     [55 66]], shape=(3, 2), dtype=int32)

    ========= from_tensor_slices ===========
    <dtype: 'int32'> : (2,)
    tf.Tensor([11 22], shape=(2,), dtype=int32)
    tf.Tensor([33 44], shape=(2,), dtype=int32)
    tf.Tensor([55 66], shape=(2,), dtype=int32)

The output is pretty much self-explanatory: `from_tensor_slices()` slices what would be the output of `from_tensors()` along its first dimension. You can also try with:

    t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                      [[110, 220], [330, 440], [550, 660]]])
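Under TensorFlow 2.x the `output_types` and `output_shapes` attributes used above are not available on the dataset object, and `tf.enable_eager_execution()` is no longer needed. A rough equivalent for the 3D tensor, assuming TensorFlow 2.x and using `element_spec` to inspect the element type and shape, might look like this (the names `whole` and `sliced` are illustrative):

```python
import tensorflow as tf  # assumes TensorFlow 2.x (eager by default)

t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])  # shape (2, 3, 2)

whole = tf.data.Dataset.from_tensors(t1)
sliced = tf.data.Dataset.from_tensor_slices(t1)

# element_spec describes the dtype and shape of a single dataset element.
print(whole.element_spec)   # spec with shape (2, 3, 2), dtype int32
print(sliced.element_spec)  # spec with shape (3, 2), dtype int32

# from_tensors yields t1 once; from_tensor_slices yields t1[0], then t1[1].
whole_shapes = [tuple(e.shape) for e in whole]
sliced_shapes = [tuple(e.shape) for e in sliced]
print(whole_shapes)    # [(2, 3, 2)]
print(sliced_shapes)   # [(3, 2), (3, 2)]
```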

##### Comments

- @MathewScarpino: can you elaborate more on when to use which?
- With TF 2 I get: `AttributeError: 'TensorDataset' object has no attribute 'output_types'`