What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?

I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to TensorFlow type tf.Dataset.

I am struggling trying to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. What is the right one and why?

TensorFlow documentation (link) says that both method accept a nested structure of tensor although when using from_tensor_slices the tensor should have same size in the 0-th dimension.

from_tensors combines the input and returns a dataset with a single element:

t = tf.constant([[1, 2], [3, 4]])
ds = tf.data.Dataset.from_tensors(t)   # [[1, 2], [3, 4]]

from_tensor_slices creates a dataset with a separate element for each row of the input tensor:

t = tf.constant([[1, 2], [3, 4]])
ds = tf.data.Dataset.from_tensor_slices(t)   # [1, 2], [3, 4]

Better documentation for Dataset.from_tensors/from_tensor_slices , While following Google's ML crash course, I found it very difficult to understand the difference between Dataset.from_tensors/from_tensor_slices  Syntax : tf.data.Dataset.from_tensor_slices(list) Return : Return the objects of sliced elements. Example #1 : In this example we can see that by using tf.data.Dataset.from_tensor_slices() method, we are able to get the slices of list or array.

1) Main difference between the two is that nested elements in from_tensor_slices must have the same dimension in 0th rank:

# exception: ValueError: Dimensions 10 and 9 are not compatible
dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([10, 4]), tf.random_uniform([9])))
# OK, first dimension is same
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([10, 4]), tf.random_uniform([10])))

2) The second difference, explained here, is when the input to a tf.Dataset is a list. For example:

dataset1 = tf.data.Dataset.from_tensor_slices(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])

dataset2 = tf.data.Dataset.from_tensors(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])

print(dataset1) # shapes: (2, 3)
print(dataset2) # shapes: (2, 2, 3)

In the above, from_tensors creates a 3D tensor while from_tensor_slices merge the input tensor. This can be handy if you have different sources of different image channels and want to concatenate them into a one RGB image tensor.

3) A mentioned in the previous answer, from_tensors convert the input tensor into one big tensor:

import tensorflow as tf

tf.enable_eager_execution()

dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))

dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))

for i, item in enumerate(dataset1):
    print('element: ' + str(i + 1), item[0], item[1])

print(30*'-')

for i, item in enumerate(dataset2):
    print('element: ' + str(i + 1), item[0], item[1])

output:

element: 1 tf.Tensor(... shapes: ((2,), ()))
element: 2 tf.Tensor(... shapes: ((2,), ()))
element: 3 tf.Tensor(... shapes: ((2,), ()))
element: 4 tf.Tensor(... shapes: ((2,), ()))
-------------------------
element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))

tf.data.Dataset, format is a simple format for storing a sequence of binary records. Protocol buffers are a cross-platform, cross-language library for efficient serialization of structured data. Protocol messages are defined by . proto files, these are often the easiest way to understand a message type. Dismiss Join GitHub today. GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.

Try this :

import tensorflow as tf  # 1.13.1
tf.enable_eager_execution()

t1 = tf.constant([[11, 22], [33, 44], [55, 66]])

print("\n=========     from_tensors     ===========")
ds = tf.data.Dataset.from_tensors(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print (e)

print("\n=========   from_tensor_slices    ===========")
ds = tf.data.Dataset.from_tensor_slices(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print (e)

output :

=========      from_tensors    ===========
<dtype: 'int32'> : (3, 2)
tf.Tensor(
[[11 22]
 [33 44]
 [55 66]], shape=(3, 2), dtype=int32)

=========   from_tensor_slices      ===========
<dtype: 'int32'> : (2,)
tf.Tensor([11 22], shape=(2,), dtype=int32)
tf.Tensor([33 44], shape=(2,), dtype=int32)
tf.Tensor([55 66], shape=(2,), dtype=int32)

The output is pretty much self-explanatory but as you can see, from_tensor_slices() slices the output of (what would be the output of) from_tensors() on its first dimension. You can also try with :

t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])

TensorFlow Datasets, We can, of course, initialise our dataset with some tensor # using a tensor dataset = tf.data.Dataset.from_tensor_slices(tf.random_uniform([100,  Also using tf.data.Dataset.from_tensors is no option. Even though this seems to handle tuples properly, one only gets a single element instead of n elements. This is in alignment with the documentation but does not fulfill the same functionality as tf.data.Dataset.from_tensor_slices .

TFRecord and tf.Example, Dataset.from_tensors() to combine the input, otherwise use tf.data.Dataset.​from_tensor_slices() if you want a separate row for each input tensor. The difference between the first two APIs is shown as follows: #combine the input into one  Create a source dataset using one of the factory functions like Dataset.from_tensors, Dataset.from_tensor_slices, or using objects that read from files like TextLineDataset or TFRecordDataset. See the TensorFlow Dataset guide for more information.

tf.data: Build TensorFlow input pipelines, Functions TABLE 10-1 That Create Datasets Member Description range(*args) Dataset.range(2, 8, 2) # [2, 4, 6] The from_tensors and from_tensor_slices are  For example, to construct a Dataset from data in memory, you can use tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices(). Alternatively, if your input data is stored in a file in the recommended TFRecord format, you can use tf.data.TFRecordDataset().

How to use Dataset in TensorFlow, Use the new and improved features of TensorFlow to enhance machine learning and deep learning Ajay Baranwal, Alizishaan Khatri, Dataset.from_tensors((​features, labels)) from_tensor_slices(. Dataset using all the different file formats​. Load data using tf.data.Dataset. Use tf.data.Dataset.from_tensor_slices to read the values from a pandas dataframe.. One of the advantages of using tf.data.Dataset is it allows you to write simple, highly efficient data pipelines.

TensorFlow 2 Pocket Primer, Having efficient data pipelines is of paramount importance for any machine from_tensors: It also accepts single or multiple numpy arrays or tensors. Dataset.from_tensor_slices(data).batch(10)# creates the iterator to Feedable iterator: Can be used to switch between Iterators for different Datasets. train_dataset = tf.data.Dataset.from_tensor_slices((x,y)) test_dataset = tf.data.Dataset.from_tensor_slices((x,y)) One for training and one for testing. Then, we can create our iterator, in this case we use the initializable iterator, but you can also use a one shot iterator

Comments
  • @MathewScarpino: can you elaborate more on when to use when?
  • with tf 2 i get: AttributeError: 'TensorDataset' object has no attribute 'output_types'