Keras flowFromDirectory get file names as they are being generated

Is it possible to get the file names that were loaded using flow_from_directory? I have:

datagen = ImageDataGenerator(
    rotation_range=3,
#     featurewise_std_normalization=True,
    fill_mode='nearest',
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)

train_generator = datagen.flow_from_directory(
        path+'/train',
        target_size=(224, 224),
        batch_size=batch_size,)

I have a custom generator for my multi-output model, like this:

a = np.arange(8).reshape(2, 4)
# print(a)

print(train_generator.filenames)

def generate():
    while 1:
        x,y = train_generator.next()
        yield [x] ,[a,y]

Note that at the moment I am generating placeholder values for a, but for real training I wish to load a JSON file that contains the bounding box coordinates for my images. For that, I need the file names of the images returned by the train_generator.next() method. Once I have them, I can load the file, parse the JSON and pass the result instead of a. The ordering of the x variable and the list of file names must also be the same.

Yes, it is possible, at least with version 2.0.4 (I don't know about earlier versions).

The iterator returned by ImageDataGenerator().flow_from_directory(...) has a filenames attribute, which is a list of all the files in directory order (this is the order in which the generator yields them only when shuffle=False), and a batch_index attribute. So you can do it like this:

datagen = ImageDataGenerator()
gen = datagen.flow_from_directory(...)

On every iteration of the generator you can get the corresponding filenames like this:

for i in gen:
    idx = (gen.batch_index - 1) * gen.batch_size
    print(gen.filenames[idx : idx + gen.batch_size])

This will give you the filenames of the images in the current batch.
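The index arithmetic can be checked without Keras. The following is a minimal sketch, assuming shuffle=False and assuming the iterator resets batch_index to 0 after the final batch of an epoch (as reported in the comments); the helper name last_batch_slice is hypothetical and not part of Keras.

```python
import math

def last_batch_slice(batch_index, batch_size, samples):
    """Return (start, stop) into filenames for the batch that was just yielded."""
    n_batches = math.ceil(samples / batch_size)
    # batch_index already points at the NEXT batch; after the final batch it
    # has wrapped to 0, so step back one batch modulo the number of batches.
    just_yielded = (batch_index - 1) % n_batches
    start = just_yielded * batch_size
    # The final batch may be shorter than batch_size.
    return start, min(start + batch_size, samples)

first = last_batch_slice(batch_index=1, batch_size=4, samples=10)
final = last_batch_slice(batch_index=0, batch_size=4, samples=10)
```

Inside the loop, this would be used as start, stop = last_batch_slice(gen.batch_index, gen.batch_size, gen.samples) followed by gen.filenames[start:stop].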

You can make a pretty minimal subclass that returns the image, file_path tuple by inheriting the DirectoryIterator:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator, DirectoryIterator

class ImageWithNames(DirectoryIterator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.filenames_np = np.array(self.filepaths)
        self.class_mode = None # so that we only get the images back

    def _get_batches_of_transformed_samples(self, index_array):
        return (super()._get_batches_of_transformed_samples(index_array),
                self.filenames_np[index_array])

In __init__, I added an attribute that is the NumPy version of self.filepaths, so that we can index into that array with index_array to get the paths on each batch generation.

The only other change to the base class is to return a tuple that is the image batch super()._get_batches_of_transformed_samples(index_array) and the file paths self.filenames_np[index_array].

With that, you can make your generator like so:

imagegen = ImageDataGenerator()
datagen = ImageWithNames('/data/path', imagegen, target_size=(224,224))

And then check with

next(datagen)
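Once each batch arrives as an (images, paths) tuple, the bounding boxes from the question can be looked up by path. The following is a rough sketch under stated assumptions: the annotation layout (a JSON object mapping each file path to a box) and the helper name attach_boxes are hypothetical, and fake_batches stands in for the real iterator so the snippet runs without Keras.

```python
import json

def attach_boxes(batches, annotations):
    """Yield (images, boxes) where boxes[i] belongs to images[i]."""
    for images, paths in batches:
        # Look boxes up by path so their order follows the image batch exactly.
        boxes = [annotations[path] for path in paths]
        yield images, boxes

# Hypothetical annotations; in practice these would be read from a JSON file.
annotations = json.loads(
    '{"/data/path/cats/1.jpg": [10, 20, 50, 60],'
    ' "/data/path/dogs/7.jpg": [0, 5, 30, 40]}'
)
# Stand-in for the (images, paths) tuples produced by the iterator.
fake_batches = iter([(["image_b", "image_a"],
                      ["/data/path/dogs/7.jpg", "/data/path/cats/1.jpg"])])
images, boxes = next(attach_boxes(fake_batches, annotations))
```

Because the lookup is keyed on the path that came with each image, this stays aligned even when shuffling is enabled.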

Here is an example that works with shuffle=True as well and also handles the last batch properly. To make one pass over the data:

datagen = ImageDataGenerator().flow_from_directory(...)    
batches_per_epoch = datagen.samples // datagen.batch_size + (datagen.samples % datagen.batch_size > 0)
for i in range(batches_per_epoch):
    batch = next(datagen)
    current_index = ((datagen.batch_index-1) * datagen.batch_size)
    if current_index < 0:
        if datagen.samples % datagen.batch_size > 0:
            current_index = max(0,datagen.samples - datagen.samples % datagen.batch_size)
        else:
            current_index = max(0,datagen.samples - datagen.batch_size)
    index_array = datagen.index_array[current_index:current_index + datagen.batch_size].tolist()
    img_paths = [datagen.filepaths[idx] for idx in index_array]
    #batch[0] - x, batch[1] - y, img_paths - absolute path
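The key idea is that paths are looked up through the shuffled index array rather than by position in the file list. A small Keras-free sketch of that idea, assuming one freshly shuffled index array per epoch; the function name and the use of random.Random are illustrative assumptions, not how Keras shuffles internally.

```python
import random

def iter_batches_with_paths(filepaths, batch_size, seed=0):
    """Yield (indices, paths) for each batch of one shuffled epoch."""
    index_array = list(range(len(filepaths)))
    random.Random(seed).shuffle(index_array)
    for start in range(0, len(index_array), batch_size):
        # The slice may be short for the final batch.
        batch_indices = index_array[start:start + batch_size]
        # Map every sample index back to its path, preserving batch order.
        yield batch_indices, [filepaths[i] for i in batch_indices]

files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
batches = list(iter_batches_with_paths(files, batch_size=2))
```

Each path appears exactly once per epoch, and the last batch holds the remaining sample.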

At least with version 2.2.4, you can do it like this:

datagen = ImageDataGenerator()
gen = datagen.flow_from_directory(...)
for file in gen.filenames:
    print(file)

or get the full file paths:

for filepath in gen.filepaths:
    print(filepath)
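As a rough illustration of how the two attributes relate, assuming filenames holds paths relative to the directory argument (class subfolder first) and filepaths holds the same entries joined onto that directory. The helper name describe is hypothetical, and posixpath is used only to keep the separators predictable in this sketch.

```python
import posixpath

def describe(directory, filename):
    """Split a relative filename into its class folder and its full path."""
    # The class label is the subfolder the image sits in.
    class_name = posixpath.dirname(filename)
    # The full path is the directory argument joined with the relative name.
    full_path = posixpath.join(directory, filename)
    return class_name, full_path

class_name, full_path = describe("data/train", "cats/001.jpg")
```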

The code below might help. It overrides flow_from_directory so that each batch is returned together with its file paths:

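from keras.preprocessing.image import ImageDataGenerator

class AugmentingDataGenerator(ImageDataGenerator):
    def flow_from_directory(self, directory, *args, **kwargs):
        # class_mode=None makes the underlying iterator yield image batches only
        generator = super().flow_from_directory(directory, *args, class_mode=None, **kwargs)
        while True:
            # One pass over the data; the slices match the batches only when shuffle=False
            for start in range(0, generator.samples, generator.batch_size):
                # Get augmented image samples
                images = next(generator)
                image_paths = generator.filepaths[start:start + generator.batch_size]
                yield images, image_paths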

# Create training generator
train_datagen = AugmentingDataGenerator(  
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    rescale=1./255,
    horizontal_flip=True
)
train_generator = train_datagen.flow_from_directory(
    TRAIN_DIRECTORY_PATH, 
    target_size=(256, 256),
    shuffle = False,
    batch_size=BATCH_SIZE
)

# Create testing generator
test_datagen = AugmentingDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    TEST_DIRECTORY_PATH,  
    target_size=(256, 256),
    shuffle = False, # inorder to get imagepath of the same image
    batch_size=BATCH_SIZE 
)

To check the images and file paths returned:

image,file_path = next(test_generator)
# print(file_path)
# plt.imshow(image)

Just another comment: using Keras 2.0.6 in a Kaggle competition, I see the same issue with the prediction order. If I use the code below to generate predictions, I get correct predictions the first time I call it, and apparently shuffled predictions on the second call:

Comments
  • Using only default Keras, it's not possible. But you can modify the Keras code in order to do that.
  • Have you read my answer?
  • It has to be noted that this does not work if shuffle is True (the default). You will always get the filenames in the order they are first processed, not necessarily in the order they are returned from the generator.
  • @AlexGuth what should one do when using shuffle=True?
  • The generator call on the last batch resets batch_index to 0, so batch_index - 1 becomes -1 and idx is negative, which filters out the last batch completely.
  • Excellent answer. A couple small suggestions: the example classname doesn't match, should be "ImageWithNames". The example might also include subset="validation", shuffle=False in case it's not clear to people those should go here. Lastly, for those using keras from tensorflow the import would be from tensorflow.keras.preprocessing.... And maybe for the check data_batch, filenames = next(datagen), in case it's not super obvious.
  • This one is the right (or more pythonic) way of doing, IMO. Thanks!