Callback function ModelCheckpoint causes error in Keras

I get an error when I use the ModelCheckpoint callback.

I read in a GitHub issue that the solution would be to use model.get_weights, but I am already implicitly storing only the weights, since I only save the best ones.

Keras only seems to save weights using h5, which makes me wonder whether there is any other way to store them using the Keras API. If so, how? If not, how do I store them?

I made an example to reproduce the problem:


import glob, os
import sys
from os import listdir
from os.path import isfile, join
import numpy as np
import warnings
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from keras.utils import np_utils
from keras import metrics
import keras
from keras import backend as K
from keras.models import Sequential
from keras.optimizers import SGD, Adam
from keras.layers.core import Dense, Activation, Lambda, Reshape,Flatten
from keras.layers import Conv1D,Conv2D,MaxPooling2D, MaxPooling1D, Reshape
#from keras.utils.visualize_util import plot
from keras.models import Model
from keras.layers import Input, Dense
from keras.layers.merge import Concatenate, Add
import h5py
import random
import tensorflow as tf
import math
from keras.callbacks import CSVLogger
from keras.callbacks import ModelCheckpoint

if len(sys.argv) < 6:
    # sys.argv[1] through sys.argv[5] are read below, so five arguments are required
    print "Missing Arguments!"
    print "python <workspace> <total_frames> <fbank-dim> <window-height> <batch_size>"
    print "Example:"
    print "python deltas 15 40 5 100"
    sys.exit(1)

total_frames = int(sys.argv[2])
total_frames_with_deltas = total_frames*3
dim = int(sys.argv[3])
window_height = int(sys.argv[4])
inserted_batch_size = int(sys.argv[5])
stride = 1
splits = ((dim - window_height)+1)/stride

#input_train_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_train_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(inserted_batch_size)+"_fws_input"
#output_train_data ="/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_train_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(inserted_batch_size)+"_fws_output"
#input_test_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_test_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(1)+"_fws_input"
#output_test_data = "/media/carl/E2302E68302E443F/"+str(sys.argv[1])+"/fbank/org_test_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_winheig_"+str(window_height)+"_batch_"+str(1)+"_fws_output"

#train_files =[f for f in listdir(input_train_data) if isfile(join(input_train_data, f))]
#test_files =[f for f in listdir(input_test_data) if isfile(join(input_test_data, f))]

#print len(train_files)
print "hallo"
def train_generator():
    while True:
#        input = random.choice(train_files)
#        h5f = h5py.File(input_train_data+'/'+input, 'r')
#        train_input = h5f['train_input'][:]
#        train_output = h5f['train_output'][:]
#        h5f.close()
        train_input = np.random.randint(100,size=((inserted_batch_size,splits*total_frames_with_deltas,window_height,3)))
        train_list_list = []
        train_input = train_input.reshape((inserted_batch_size,splits*total_frames_with_deltas,window_height,3))
        train_input_list = np.split(train_input,splits*total_frames_with_deltas,axis=1)
        for i in range(len(train_input_list)):
            train_input_list[i] = train_input_list[i].reshape(inserted_batch_size,window_height,3)

        #for i in range(len(train_input_list)):
        #    train_input_list[i] = train_input_list[i].reshape(inserted_batch_size,33,window_height,1,3)

        train_output = np.random.randint(5, size = (1,total_frames,5))
        middle = int(math.ceil(total_frames/2))

        train_output = train_output[:,middle:middle+1,:].reshape((inserted_batch_size,1,5))
        #print train_output.shape
        #print len(train_input_list)
        #print train_input_list[0].shape
        yield (train_input_list, train_output)
print "hallo"
def test_generator():
    while True:
#        input = random.choice(test_files)
#        h5f = h5py.File(input_test_data+'/'+input, 'r')
#        test_input = h5f['test_input'][:]
#        test_output = h5f['test_output'][:]
#        h5f.close()
        test_input = np.random.randint(100,size=((inserted_batch_size,splits*total_frames_with_deltas,window_height,3)))
        test_input = test_input.reshape((inserted_batch_size,splits*total_frames_with_deltas,window_height,3))
        test_input_list = np.split(test_input,splits*total_frames_with_deltas,axis=1)
        #test_input_list = np.split(test_input,45,axis=3)

        for i in range(len(test_input_list)):
            test_input_list[i] = test_input_list[i].reshape(inserted_batch_size,window_height,3)

        #for i in range(len(test_input_list)):
        #    test_input_list[i] = test_input_list[i].reshape(inserted_batch_size,33,window_height,1,3)

        test_output = np.random.randint(5, size = (1,total_frames,5))

        middle = int(math.ceil(total_frames/2))

        test_output = test_output[:,middle:middle+1,:].reshape((inserted_batch_size,1,5))

        yield (test_input_list, test_output)
print "hallo"

def fws():
    #print "Inside"
    #   Params:
    #   batch ,  lr, decay , momentum, epochs
    #Input shape: (batch_size,40,45,3)
    #output shape: (1,15,50)
    # number of unit in conv_feature_map = splitd
    model_output = []
    list_of_input = [Input(shape=(8,3)) for i in range(splits*total_frames_with_deltas)]
    output = []

    skip = total_frames_with_deltas
    for steps in range(total_frames_with_deltas):
        conv = Conv1D(filters = 100, kernel_size = 8)
        column = 0
        for  _ in range(splits):
            #print "column " + str(column) + "steps: " + str(steps)
            column = column + 1

    #print len(output)
    #print splits*total_frames_with_deltas

    conv = []
    for section in range(splits):
        column = 0
        skip = splits
        temp = []
        for _ in range(total_frames_with_deltas):
            column = column + 1
        #print len(conv)

    output_conc = Concatenate()(conv)
    #print output_conc.get_shape
    output_conv = Reshape((splits, -1))(output_conc)
    #print output_conv.get_shape

    pooled = MaxPooling1D(pool_size = 6, strides = 2)(output_conv)
    reshape = Reshape((1,-1))(pooled)

    dense1 = Dense(units = 1024, activation = 'relu',    name = "dense_1")(reshape)
    #dense2 = Dense(units = 1024, activation = 'relu',    name = "dense_2")(dense1)
    dense3 = Dense(units = 1024, activation = 'relu',    name = "dense_3")(dense1)
    final = Dense(units = 5, activation = 'relu',    name = "final")(dense3)

    model = Model(inputs = list_of_input , outputs = final)
    sgd = SGD(lr=0.1, decay=1e-1, momentum=0.9, nesterov=True)
    model.compile(loss="categorical_crossentropy", optimizer=sgd , metrics = ['accuracy'])
    print "compiled"

    model_yaml = model.to_yaml()
    with open("model.yaml", "w") as yaml_file:
        yaml_file.write(model_yaml)

    print "Model saved!"

    log= CSVLogger('/home/carl/kaldi-trunk/dnn/experimental/yesno_cnn_50_training_total_frames_'+str(total_frames)+"_dim_"+str(dim)+"_window_height_"+str(window_height)+".csv")
    # filepath was missing from the posted code; this pattern is inferred from the checkpoint file name printed in the traceback below
    filepath = "yesno_cnn_50_training_total_frames_"+str(total_frames)+"_dim_"+str(dim)+"_window_height_"+str(window_height)+"weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
    checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_weights_only=True, mode='max')

    print "log"
    #plot_model(model, to_file='model.png')
    print "Fit"
    hist_current = model.fit_generator(train_generator(),
                        epochs = 10000,
                        verbose = 1,
                        validation_data = test_generator(),
                        pickle_safe = True,
                        workers = 4,
                        callbacks = [log,checkpoint])


Execute the script with: python yesno 50 40 8 1

which gives me this full traceback:

carl@ca-ThinkPad-T420s:~/Dropbox$ python yesno 50 40 8 1
Using TensorFlow backend.
Couldn't import dot_parser, loading of dot files will not be possible.
Model saved!
/usr/local/lib/python2.7/dist-packages/keras/backend/ UserWarning: Expected no kwargs, you passed 1
kwargs passed to function are ignored with Tensorflow backend
Epoch 1/10000
2017-05-26 13:01:45.851125: W tensorflow/core/platform/] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-26 13:01:45.851345: W tensorflow/core/platform/] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-26 13:01:45.851392: W tensorflow/core/platform/] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
443/444 [============================>.] - ETA: 4s - loss: 100.1266 - acc: 0.3138Epoch 00000: saving model to yesno_cnn_50_training_total_frames_50_dim_40_window_height_8weights-improvement-00-0.48.hdf5
Traceback (most recent call last):
  File "", line 205, in <module>

  File "", line 203, in fws

  File "/usr/local/lib/python2.7/dist-packages/keras/legacy/", line 88, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 1933, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/usr/local/lib/python2.7/dist-packages/keras/", line 77, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/usr/local/lib/python2.7/dist-packages/keras/", line 411, in on_epoch_end
    self.model.save_weights(filepath, overwrite=True)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 2503, in save_weights
    save_weights_to_hdf5_group(f, self.layers)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 2746, in save_weights_to_hdf5_group
    f.attrs['layer_names'] = [layer.name.encode('utf8') for layer in layers]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/", line 93, in __setitem__
    self.create(name, data=value, dtype=base.guess_dtype(value))
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/", line 183, in create
    attr = h5a.create(self._id, self._e(tempname), htype, space)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "h5py/h5a.pyx", line 47, in h5py.h5a.create (/tmp/pip-4rPeHA-build/h5py/h5a.c:1904)
RuntimeError: Unable to create attribute (Object header message is too large)

If you look at the amount of data Keras is trying to save under the layer_names attribute (inside the output HDF5 file being created), you will find that it takes more than 64kB.

np.asarray([layer.name.encode('utf8') for layer in model.layers]).nbytes
>> 77100

I quote from

Is there an object header limit and how does that affect HDF5 ?

There is a limit (in HDF5-1.8) of the object header, which is 64 KB. The datatype for a dataset is stored in the object header, so there is therefore a limit on the size of the datatype that you can have. (See HDFFV-1089)

The code above was (almost entirely) copied from the traceback:

File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 2746, in save_weights_to_hdf5_group
f.attrs['layer_names'] = [layer.name.encode('utf8') for layer in layers]

I used numpy's asarray method to get the figure quickly; h5py should arrive at a similar figure (I guess). Check h5py itself if you want the exact number.

Anyway, either you will need to implement your own methods for saving/loading of the weights (or use existing workarounds), or you need to give a really short name to ALL the layers inside your model :), something like this:

list_of_input = [Input(shape=(8,3), name=('i%x' % i)) for i in range(splits*total_frames_with_deltas)]
conv = Conv1D(filters = 100, kernel_size = 8, name='cv%x' % steps) 
conv.append(Add(name='add%x' % section)(temp))
output_conc = Concatenate(name='ct')(conv)
output_conv = Reshape((splits, -1), name='rs1')(output_conc)
pooled = MaxPooling1D(pool_size = 6, strides = 2, name='pl')(output_conv)
reshape = Reshape((1,-1), name='rs2')(pooled) 
dense1 = Dense(units = 1024, activation = 'relu', name = "d1")(reshape) 
dense2 = Dense(units = 1024, activation = 'relu', name = "d2")(dense1)
dense3 = Dense(units = 1024, activation = 'relu', name = "d3")(dense1) 
final = Dense(units = 5, activation = 'relu', name = "fl")(dense3)

You mustn't forget to name all the layers because the (numpy) string array into which the layer names are converted is using the size of the longest string for each individual string in it when it is saved!
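The size arithmetic can be checked without Keras. The sketch below rebuilds the list of layer names for the model in the question (33 * 150 inputs, 150 convolutions, 33 additions and 7 remaining layers) and measures the fixed-width byte-string array numpy creates from it. The auto-generated names such as input_1 or max_pooling1d_1 are an assumption about how Keras names unnamed layers, and attr_nbytes and HDF5_HEADER_LIMIT are hypothetical helper names, not part of any library.

```python
import numpy as np

HDF5_HEADER_LIMIT = 64 * 1024  # object header limit quoted above, in bytes


def attr_nbytes(names):
    """Bytes used by the fixed-width byte-string array built from the names."""
    encoded = [name.encode('utf8') for name in names]
    # Every element is padded to the length of the longest name.
    return np.asarray(encoded).nbytes


# Layer counts for "50 40 8 1": splits = 33, total_frames_with_deltas = 150
n_inputs, n_convs, n_adds = 33 * 150, 150, 33

# Assumed auto-generated names (not taken from the original post)
default_names = (['input_%d' % i for i in range(1, n_inputs + 1)]
                 + ['conv1d_%d' % i for i in range(1, n_convs + 1)]
                 + ['add_%d' % i for i in range(1, n_adds + 1)]
                 + ['concatenate_1', 'reshape_1', 'max_pooling1d_1',
                    'reshape_2', 'dense_1', 'dense_3', 'final'])

# Short names following the renaming scheme proposed above
short_names = (['i%x' % i for i in range(n_inputs)]
               + ['cv%x' % i for i in range(n_convs)]
               + ['add%x' % i for i in range(n_adds)]
               + ['ct', 'rs1', 'pl', 'rs2', 'd1', 'd3', 'fl'])

default_size = attr_nbytes(default_names)  # 5140 names * 15 bytes
short_size = attr_nbytes(short_names)      # 5140 names * 5 bytes
```

Under these assumptions a single 15-character name is enough to push all 5140 entries to 15 bytes each, which is why every layer has to be renamed and not only the inputs.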

After renaming the layers as proposed above (which takes almost 26kB) the model is saved successfully. Hope this elaborate answer helps someone.

Update: I have just made a PR to Keras which should fix the issue without implementing any custom loading/saving methods, see 7508

A simple solution, albeit possibly not the most elegant, could be to run a while loop with epochs = 1.

  1. Get the weights at the end of every epoch together with the accuracy and the loss
  2. Save the weights to file 1 with model.get_weights
  3. If accuracy is greater than at the previous epoch (i.e. loop), store the weights to a different file (file 2)
  4. Run the loop again loading the weights from file 1
  5. Break the loops setting a manual early stopping so that it breaks if the loss does not improve for a certain number of loops
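The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: run_one_epoch is a hypothetical callable that trains the model for one epoch (for example by calling fit_generator with epochs = 1) and returns a (loss, accuracy) pair, the weights are written with pickle, and "better accuracy" is compared against the best value seen so far instead of only the previous loop. The model only needs get_weights() and set_weights().

```python
import os
import pickle


def save_weights(model, path):
    """Pickle the list returned by model.get_weights()."""
    with open(path, 'wb') as f:
        pickle.dump(model.get_weights(), f)


def load_weights(model, path):
    """Restore weights written by save_weights into an existing model."""
    with open(path, 'rb') as f:
        model.set_weights(pickle.load(f))


def train_with_manual_checkpoints(model, run_one_epoch, last_path, best_path,
                                  max_loops=100, patience=5):
    best_acc = float('-inf')
    best_loss = float('inf')
    stale_loops = 0
    loops_run = 0
    for _ in range(max_loops):
        if os.path.exists(last_path):
            load_weights(model, last_path)       # step 4: continue from file 1
        loss, acc = run_one_epoch(model)         # step 1: one epoch, get metrics
        loops_run += 1
        save_weights(model, last_path)           # step 2: always write file 1
        if acc > best_acc:                       # step 3: keep the best in file 2
            best_acc = acc
            save_weights(model, best_path)
        if loss < best_loss:                     # step 5: manual early stopping
            best_loss = loss
            stale_loops = 0
        else:
            stale_loops += 1
            if stale_loops >= patience:
                break
    return best_acc, loops_run
```

As with the other weight-only approaches on this page, the optimizer state is not stored, so a resumed run starts with a fresh optimizer.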

You can use get_weights() together with numpy.save().

It's not the best solution, because it will save several files, but it actually works.

The problem is that you won't have the "optimizer" saved with the current states. But you can perhaps work around that by using smaller learning rates after loading.

Custom callback using LambdaCallback:

import numpy as np

storedLoss = float('inf') #best loss seen so far

def myCallback(epoch,logs):
    global storedLoss
    #do your comparisons here using the "logs" var.

    if (logs['loss'] < storedLoss):

        storedLoss = logs['loss']
        for i in range(len(model.layers)):

            WandB = model.layers[i].get_weights()

            if len (WandB) > 0: #necessary because some layers have no weights

                #np.save appends ".npy" to the file name
                np.save("W" + "-" + str(i), WandB[0], allow_pickle=False)
                np.save("B" + "-" + str(i), WandB[1], allow_pickle=False)

    #remember that get and set weights use a list: [weights,biases]
    #it may happen (not sure) that there is no bias, and thus you may have to check it (len(WandB)==1).

The logs var brings a dictionary with named metrics, such as "loss", and "accuracy", if you used it.

You can store the losses within the callback in a global var, and compare if each loss is better or worse than the last.

When fitting, use the lambda callback:

from keras.callbacks import LambdaCallback
model.fit(..., callbacks=[LambdaCallback(on_epoch_end=myCallback)])

In the example above, I used the LambdaCallback, which has more possibilities than just on_epoch_end.

For loading, do a similar loop:

#you have to create the model first and then set the layers
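def loadModel(model):
    for i in range(len(model.layers)):
        WandBForCheck = model.layers[i].get_weights()

        if len (WandBForCheck) > 0: #necessary because some layers have no weights
            #file names match the ones written by the callback (np.save adds ".npy")
            W = np.load("W-" + str(i) + ".npy")
            B = np.load("B-" + str(i) + ".npy")
            model.layers[i].set_weights([W, B])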
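A variant of the same idea that does not assume exactly one weight array and one bias array per layer is sketched below. It writes one .npz file per layer with np.savez, so a layer without bias, or with more than two arrays, is handled the same way. The function names and the layer-N.npz file naming are made up for this illustration; the model only needs a layers list whose items offer get_weights() and set_weights().

```python
import os

import numpy as np


def save_layer_weights(model, directory):
    """Write one .npz file per layer that actually has weights."""
    for i, layer in enumerate(model.layers):
        arrays = layer.get_weights()
        if arrays:  # layers such as pooling or reshape have no weights
            np.savez(os.path.join(directory, 'layer-%d.npz' % i), *arrays)


def load_layer_weights(model, directory):
    """Load the files written by save_layer_weights into a model that has
    already been built with the same architecture."""
    for i, layer in enumerate(model.layers):
        path = os.path.join(directory, 'layer-%d.npz' % i)
        if os.path.exists(path):
            with np.load(path) as data:
                # np.savez stores positional arrays as arr_0, arr_1, ...
                arrays = [data['arr_%d' % k] for k in range(len(data.files))]
            layer.set_weights(arrays)
```

This still produces one file per layer and still does not store the optimizer state.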

See follow-up at and

I saw the YAML and the root cause is probably that you have so many Inputs. A few Inputs with many dimensions is preferred to many Inputs, especially if you can use scanning and batch operations to do everything efficiently.
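To illustrate the point about Inputs with numpy only: the generator in the question splits one array into splits * total_frames_with_deltas separate arrays, but every one of those pieces is still reachable as a slice of the single array. The sketch below uses the sizes from the run in the question (33 * 150 slices, window height 8, 3 channels) and a batch size of 2 chosen arbitrarily; it shows only the data layout, not how the model itself would have to be rewritten.

```python
import numpy as np

batch, n_slices, window_height, channels = 2, 33 * 150, 8, 3

# One array holding everything, as created at the top of the generator
single = np.random.randint(100, size=(batch, n_slices, window_height, channels))

# What the generator then does: one small array per Input layer
many = [part.reshape(batch, window_height, channels)
        for part in np.split(single, n_slices, axis=1)]

# The same data addressed through the single array: piece i is single[:, i]
same = all(np.array_equal(many[i], single[:, i]) for i in range(n_slices))
```

With a single input of shape (n_slices, window_height, channels), the model would have one Input layer instead of thousands, which also keeps the list of layer names short.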

Now, ignoring that entirely, here is how you can save and load your model if it has too much stuff to save as JSON efficiently:

You can pass save_weights_only=True. That won't save optimizer weights, so isn't a great solution.

Just put together a PR for saving model weights and optimizer weights but not configuration. When you want to load, first instantiate and compile the model as you did when you were going to train it, then use load_all_weights to load the model and optimizer weights into that model. I'll try to merge it soon so you can use it from the master branch.

You could use it something like this:

from keras.callbacks import LambdaCallback
from keras_contrib.utils.save_load_utils import save_all_weights, load_all_weights
# do some stuff to create and compile model
# use `save_all_weights` as a callback to checkpoint your model and optimizer weights
model.fit(..., callbacks=[LambdaCallback(on_epoch_end=lambda epoch, logs: save_all_weights(model, "checkpoint-{:05d}.h5".format(epoch))])
# use `load_all_weights` to load model and optimizer weights into an existing model
# if not compiled (no `model.optimizer`), this will just load model weights
load_all_weights(model, 'checkpoint-1337.h5')

So I don't endorse the model, but if you want to get it to save and load anyways this should probably work for you.

As a side note, if you want to save weights in a different format, something like this would work.

import pickle
with open("save.p", "wb") as f:
    pickle.dump([K.get_value(w) for w in model.weights], f)


Your model architecture must be too large to be saved.

Use get_weights and set_weights to save and load the model, respectively. Do not use the ModelCheckpoint callback; once the training ends, just save the weights with pickle.

Have a look at this link: Unable to save DataFrame to HDF5 ("object header message is too large")
