## What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?

I have a dataset represented as a NumPy matrix of shape `(num_features, num_examples)`, and I wish to convert it to the TensorFlow type `tf.data.Dataset`.

I am struggling to understand the difference between these two methods: `Dataset.from_tensors` and `Dataset.from_tensor_slices`. Which one is right, and why?

The TensorFlow documentation (link) says that both methods accept a nested structure of tensors, although when using `from_tensor_slices` the tensors should have the same size in the 0th dimension.

`from_tensors` combines the input and returns a dataset with a single element:

```python
t = tf.constant([[1, 2], [3, 4]])
ds = tf.data.Dataset.from_tensors(t)   # one element: [[1, 2], [3, 4]]
```

`from_tensor_slices` creates a dataset with a separate element for each row of the input tensor:

```python
t = tf.constant([[1, 2], [3, 4]])
ds = tf.data.Dataset.from_tensor_slices(t)   # two elements: [1, 2] and [3, 4]
```
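The contrast can be sketched with a plain NumPy analogy (no `tf.data` involved; `list(t)` stands in for iterating the dataset). This also shows why the `(num_features, num_examples)` matrix from the question should be transposed before slicing:

```python
import numpy as np

t = np.array([[1, 2], [3, 4]])

# from_tensors: the whole tensor becomes a single dataset element
as_single = [t]        # 1 element of shape (2, 2)
# from_tensor_slices: one element per slice along axis 0
as_rows = list(t)      # 2 elements of shape (2,)

print(len(as_single))  # 1
print(len(as_rows))    # 2

# For the question's (num_features, num_examples) matrix, transpose
# first so that each example becomes one dataset element:
matrix = np.arange(15).reshape(3, 5)   # (num_features, num_examples)
examples = list(matrix.T)              # 5 elements, each of shape (3,)
```

With the real API the last line would read `tf.data.Dataset.from_tensor_slices(matrix.T)`.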

1) The main difference between the two is that nested elements passed to `from_tensor_slices` must have the same size in the 0th dimension:

```python
# exception: ValueError: Dimensions 10 and 9 are not compatible
dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([10, 4]), tf.random_uniform([9])))

# OK, first dimension is the same
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([10, 4]), tf.random_uniform([10])))
```
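The 0th-dimension rule exists because `from_tensor_slices` pairs up slice `i` of every input, much like Python's `zip`. A rough NumPy sketch of that mental model (not the actual implementation):

```python
import numpy as np

features = np.random.rand(10, 4)
labels = np.random.rand(10)

# from_tensor_slices((features, labels)) yields (features[i], labels[i])
# pairs, so both inputs need the same size along axis 0:
pairs = list(zip(features, labels))

print(len(pairs))         # 10
print(pairs[0][0].shape)  # (4,)

# With shapes (10, 4) and (9,), zip would silently drop an element;
# tf.data raises ValueError instead of truncating.
```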

2) The second difference, explained here, is when the input to a `tf.data.Dataset` is a list. For example:

```python
dataset1 = tf.data.Dataset.from_tensor_slices(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])
dataset2 = tf.data.Dataset.from_tensors(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])

print(dataset1)  # element shapes: (2, 3)
print(dataset2)  # element shapes: (2, 2, 3)
```

In the above, `from_tensors` stacks the list into a single 3D tensor, while `from_tensor_slices` yields each input tensor as a separate element. This can be handy if you have different sources for different image channels and want to combine them into one RGB image tensor.
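To make the channel remark concrete, here is a NumPy sketch of what stacking channels would look like: passing `[r, g, b]` to `from_tensors` stacks the list into one `(3, H, W)` element, which can then be transposed into an `(H, W, 3)` RGB image. The channel values below are made up for illustration:

```python
import numpy as np

h, w = 4, 4
r = np.full((h, w), 200)
g = np.full((h, w), 100)
b = np.full((h, w), 50)

# from_tensors([r, g, b]) stacks the list into a single (3, h, w) element:
stacked = np.stack([r, g, b])
print(stacked.shape)   # (3, 4, 4)

# transpose channels-first to channels-last to get an RGB image:
rgb = np.transpose(stacked, (1, 2, 0))
print(rgb.shape)       # (4, 4, 3)
```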

3) As mentioned in the previous answer, `from_tensors` converts the input into one big tensor:

```python
import tensorflow as tf  # 1.x
tf.enable_eager_execution()

dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))

for i, item in enumerate(dataset1):
    print('element: ' + str(i + 1), item[0], item[1])
print(30 * '-')
for i, item in enumerate(dataset2):
    print('element: ' + str(i + 1), item[0], item[1])
```

output:

```
element: 1 tf.Tensor(... shapes: ((2,), ()))
element: 2 tf.Tensor(... shapes: ((2,), ()))
element: 3 tf.Tensor(... shapes: ((2,), ()))
element: 4 tf.Tensor(... shapes: ((2,), ()))
------------------------------
element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))
```

Try this:

```python
import tensorflow as tf  # 1.13.1
tf.enable_eager_execution()

t1 = tf.constant([[11, 22], [33, 44], [55, 66]])

print("\n========= from_tensors ===========")
ds = tf.data.Dataset.from_tensors(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print(e)

print("\n========= from_tensor_slices ===========")
ds = tf.data.Dataset.from_tensor_slices(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print(e)
```

output:

```
========= from_tensors ===========
<dtype: 'int32'> : (3, 2)
tf.Tensor(
[[11 22]
 [33 44]
 [55 66]], shape=(3, 2), dtype=int32)

========= from_tensor_slices ===========
<dtype: 'int32'> : (2,)
tf.Tensor([11 22], shape=(2,), dtype=int32)
tf.Tensor([33 44], shape=(2,), dtype=int32)
tf.Tensor([55 66], shape=(2,), dtype=int32)
```

The output is pretty much self-explanatory, but as you can see, `from_tensor_slices()` slices (what would be the output of) `from_tensors()` along its first dimension. You can also try it with:

```python
t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])
```
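For that 3D input, `from_tensor_slices` would again split along the first dimension, yielding two `(3, 2)` elements, while `from_tensors` would keep the whole `(2, 3, 2)` tensor as a single element. A NumPy sketch of the expected slicing:

```python
import numpy as np

t1 = np.array([[[11, 22], [33, 44], [55, 66]],
               [[110, 220], [330, 440], [550, 660]]])

# from_tensors(t1):       one element of shape (2, 3, 2)
# from_tensor_slices(t1): slices along axis 0
slices = list(t1)

print(len(slices))      # 2
print(slices[0].shape)  # (3, 2)
```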

##### Comments

- @MathewScarpino: can you elaborate more on when to use which?
- With TF 2 I get: `AttributeError: 'TensorDataset' object has no attribute 'output_types'`