Hot questions for using neural networks in model fitting


I'm trying to automatically determine when a Keras autoencoder converges. For example, look at this link under "Let's build the simplest autoencoder possible." The number of epochs is hardcoded at 50 (when the loss value converges). However, how would you code this using Keras if you didn't know the number was 50? Would you just keep calling fit()?


This question is actually very broad and hard. There are many techniques for setting the number of epochs:

  • Early stopping: in this case you set the number of epochs to a very high number and stop training when the improvement over the following epochs is no longer satisfactory. In Keras there is a dedicated callback called EarlyStopping which does the job for you.
  • Model checkpointing: here you again set a very high number of epochs and simply save only the best model with respect to a chosen metric. Once again there is a dedicated callback (ModelCheckpoint) for this scenario.
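In Keras both callbacks are passed to fit() together with a large epochs value, e.g. `callbacks=[keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True), keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)]`, where the file name is only a placeholder. The rule they apply can be sketched in plain Python. `early_stopping_epoch` below is a hypothetical helper, not part of Keras, and it assumes that a lower monitored value is better:

```python
from typing import List, Tuple


def early_stopping_epoch(val_losses: List[float], patience: int) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-based.

    stop_epoch: last epoch that runs under a patience rule in the spirit of
    EarlyStopping(monitor='val_loss', patience=patience).
    best_epoch: the epoch a 'save only the best' checkpoint would keep.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # patience never ran out


history = [0.90, 0.70, 0.55, 0.50, 0.51, 0.52, 0.49, 0.50, 0.53, 0.54, 0.55]
print(early_stopping_epoch(history, patience=3))  # (9, 6)
```

With patience=3, the dip at epoch 6 resets the counter, so training stops at epoch 9 and the checkpoint holds the weights from epoch 6. You never have to know "50" in advance; you only choose how long to wait for an improvement.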

Of course, there are other approaches, e.g. using reinforcement learning to find the stopping time, or more complex setups where the number of epochs is chosen as part of a Bayesian hyperparameter search, but those are much harder methods which often bring no improvement.

One thing is certain: restarting the fit method might lead to unexpected behaviour, as many internal states of the model are reset, which could cause instability. For this scenario I strongly advise you to use train_on_batch, which does not reset model states and makes many advanced training scenarios possible.
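The kind of loop that train_on_batch allows can be sketched without Keras. In the NumPy example below, `LinearModel` and `train_until_converged` are hypothetical names; `LinearModel.train_on_batch` only imitates the contract of Keras's `model.train_on_batch(x, y)` (one gradient update on one batch, returning the batch loss) for a simple least-squares model. Because you own the loop, the convergence rule is yours: here training stops once the mean epoch loss improves by less than `tol`, which is one way to avoid hardcoding the number of epochs.

```python
import numpy as np


class LinearModel:
    """Hypothetical least-squares model whose train_on_batch imitates the
    contract of Keras's model.train_on_batch: one gradient update on one
    batch, returning that batch's loss (computed before the update)."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def train_on_batch(self, x: np.ndarray, y: np.ndarray) -> float:
        err = x @ self.w + self.b - y
        loss = float(np.mean(err ** 2))
        # Gradient step on the mean squared error
        self.w -= self.lr * 2.0 * (x.T @ err) / len(y)
        self.b -= self.lr * 2.0 * float(np.mean(err))
        return loss


def train_until_converged(model, x, y, batch_size=32, tol=1e-4,
                          max_epochs=1000, seed=0):
    """Loop over shuffled mini-batches until the mean epoch loss improves
    by less than `tol`; return the number of epochs actually run."""
    rng = np.random.default_rng(seed)
    prev_loss = np.inf
    for epoch in range(1, max_epochs + 1):
        order = rng.permutation(len(x))
        batch_losses = []
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            batch_losses.append(model.train_on_batch(x[idx], y[idx]))
        epoch_loss = float(np.mean(batch_losses))
        if prev_loss - epoch_loss < tol:  # converged (or got worse)
            return epoch
        prev_loss = epoch_loss
    return max_epochs


rng = np.random.default_rng(42)
x = rng.normal(size=(200, 2))
y = x @ np.array([2.0, -3.0]) + 1.0
model = LinearModel(n_features=2)
print(train_until_converged(model, x, y), model.w, model.b)
```

The `max_epochs` cap is a safety net so that a loss which never settles cannot loop forever.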


I was wondering whether fit_generator() in Keras has any advantage with respect to memory usage over the usual fit() method with the same batch_size as the generator yields. I've seen some examples similar to this:

from keras.datasets import mnist

def generator():
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    # some data prep
    while True:
        for i in range(1875):  # 1875 * 32 = 60000 -> number of training samples
            yield X_train[i*32:(i+1)*32], y_train[i*32:(i+1)*32]

If I pass this into the fit_generator() method, or just pass all the data directly into the fit() method with a batch_size of 32, would it make any difference regarding (GPU) memory whatsoever?


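Yes, although not so much for GPU memory: with the same batch_size, the model sees batches of the same size either way, so GPU memory use per step should be about the same. A generator can save host (CPU) memory, since fit() needs the whole dataset as in-memory arrays while a generator only has to produce one batch at a time (the generator in the question still loads all of MNIST up front, so it gains nothing there). The real difference comes in when you need augmented data for better model accuracy.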

For efficiency, it allows real-time data augmentation of images on the CPU. This means the GPU can be used for model training and weight updates, while the CPU takes over the load of augmenting images and providing the training batches.
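A minimal NumPy sketch of such a generator, assuming images shaped (N, H, W) or (N, H, W, C); `augmented_batches` is a hypothetical name. It plays the role of Keras's `ImageDataGenerator(horizontal_flip=True).flow(x, y, batch_size=32)`, which offers many more transforms. In Keras 2 you would pass it to `fit_generator(gen, steps_per_epoch=..., epochs=...)` with steps_per_epoch set to the number of batches per epoch; in TensorFlow 2.x `model.fit` accepts generators directly.

```python
import numpy as np


def augmented_batches(x, y, batch_size=32, seed=None):
    """Yield (x_batch, y_batch) forever, flipping each image left-right with
    probability 0.5. Images are assumed to be shaped (N, H, W) or (N, H, W, C).
    Augmentation runs on the CPU, one batch at a time."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:  # Keras-style generators are expected to loop indefinitely
        order = rng.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb = x[idx]  # fancy indexing returns a copy, so x is never modified
            flip = rng.random(len(idx)) < 0.5
            xb[flip] = xb[flip, :, ::-1]  # reverse the width axis
            yield xb, y[idx]


images = np.random.default_rng(0).random((100, 28, 28))
labels = np.arange(100)
gen = augmented_batches(images, labels, batch_size=32, seed=1)
xb, yb = next(gen)
print(xb.shape, yb.shape)  # (32, 28, 28) (32,)
```

Only the current augmented batch exists at any time, so the augmentation itself adds little memory on top of the source arrays.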


I have trained and tested a feed-forward neural network using Keras in Python with a dataset. But each time, in order to recognize a new test set with external data (external because the data are not included in the dataset), I have to retrain the feed-forward neural network to compute the test set. For instance, each time I have to do:

(data, output_data)
print "Prediction : ", prediction

This gives the correct output:

  Prediction: [1 2 3 4 5 1 2 3 1 2 3]
  Acc: 100%

Now I would like to test a new test set, namely "new_test2.csv", without retraining, just using what the network has learned. I am also thinking about some sort of real-time recognition.

How should I do that?

Thanks in advance


With a well-trained model you can make predictions on any new data. You don't have to retrain anything, because (hopefully) your model generalizes what it has learned to unseen data and will achieve comparable accuracy.

Just load the data from "new_test2.csv" and feed it to your model's predict() method.


Obviously, you need data of the same type and classes. In addition, you need to apply any transformations to the new data in exactly the same way you transformed the data you trained your model on.
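To make "apply the same transformations" concrete: fit the scaling parameters on the training data once, keep them (with scikit-learn you would typically persist the fitted scaler, e.g. with joblib.dump, next to the saved model), and only call transform on new data. The class below is a hypothetical minimal stand-in for a min-max scaler, and the arrays are made-up example values:

```python
import numpy as np


class MinMaxScaler01:
    """Hypothetical minimal min-max scaler: learn per-column min and range on
    the training data once, then apply exactly the same mapping to new data."""

    def fit(self, x):
        self.min_ = x.min(axis=0)
        rng = x.max(axis=0) - self.min_
        self.range_ = np.where(rng == 0, 1.0, rng)  # avoid division by zero
        return self

    def transform(self, x):
        return (x - self.min_) / self.range_


x_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_new = np.array([[2.5, 25.0], [12.0, 5.0]])  # e.g. rows read from new_test2.csv

scaler = MinMaxScaler01().fit(x_train)  # fit on the training data only
x_new_scaled = scaler.transform(x_new)  # reuse it; never refit on new data
print(x_new_scaled)
# predictions = model.predict(x_new_scaled)  # the already trained Keras model
```

New values may land outside [0, 1] (here 1.2 and -0.25); that is expected and much safer than refitting the scaler, which would silently change what each input value means to the network.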

If you want real-time predictions, you could set up an API with Flask.
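The answer suggests Flask; to keep the example dependency-free, the sketch below uses a plain WSGI application from the Python standard library instead (Flask apps are WSGI apps too, so a Flask route would wrap the same logic). The key design choice carries over: load the model once at startup and reuse it for every request. `StubModel` is a hypothetical stand-in for `keras.models.load_model('my_model.h5')`, and the `/predict` route and JSON shape are assumptions:

```python
import json


class StubModel:
    """Hypothetical stand-in for a loaded Keras model (anything with predict)."""

    def predict(self, rows):
        return [[float(sum(row))] for row in rows]


# Load ONCE at startup, not per request. With Keras this would be
# MODEL = load_model('my_model.h5')
MODEL = StubModel()


def app(environ, start_response):
    """WSGI app: POST /predict with a JSON body {"rows": [[...], ...]}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "use POST /predict"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    rows = payload.get("rows", [])
    # With a real Keras model: MODEL.predict(np.array(rows)).tolist()
    body = json.dumps({"predictions": MODEL.predict(rows)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To serve it locally you could run `wsgiref.simple_server.make_server('127.0.0.1', 8000, app).serve_forever()` and POST `{"rows": [[...]]}` to /predict. Remember to apply the same preprocessing to each incoming row that was used during training.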

Regarding terminology and the correct training procedure:

You train on a training set (e.g. 70% of all the data you have).

You validate your training with a validation set (e.g. 15% of your data). You use the accuracy and loss values on this validation set to tune your hyperparameters.

You then evaluate your model's final performance by predicting data from your test set (again 15% of your data). This must be data that your network has never seen before and that you have not used to optimize training parameters.
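A NumPy-only sketch of the 70/15/15 split described here (`train_val_test_split` is a hypothetical helper; scikit-learn users often call train_test_split twice instead). The validation part is what you would pass to fit() as validation_data, and the test part is used only once at the end, e.g. with model.evaluate():

```python
import numpy as np


def train_val_test_split(x, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into train / validation / test parts
    (70% / 15% / 15% with the defaults). The parts never overlap."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(len(x) * test_frac))
    n_val = int(round(len(x) * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return ((x[train_idx], y[train_idx]),
            (x[val_idx], y[val_idx]),
            (x[test_idx], y[test_idx]))


x = np.arange(100).reshape(100, 1)
y = np.arange(100)
(x_tr, y_tr), (x_va, y_va), (x_te, y_te) = train_val_test_split(x, y)
print(len(y_tr), len(y_va), len(y_te))  # 70 15 15
```

Shuffling with a fixed seed keeps the split reproducible, so the test set stays untouched across repeated tuning runs.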

After that you can predict on production data.

If you want to save your trained model, use this (taken from the Keras documentation):

from keras.models import load_model

model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'
del model  # deletes the existing model

# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')


I've built a simplistic multi-layer NN using Keras with precipitation data in Australia. The code takes 4 input columns: ['MinTemp', 'MaxTemp', 'Rainfall', 'WindGustSpeed'] and trains against the RainTomorrow output.

I've partitioned the data into training/test buckets and transformed all values into 0 <= n <= 1. When I try to run it, my loss values stay steady at around -13.2, but my accuracy is always 0.0. Here is an example of the logged training output:

Epoch 37/200
113754/113754 [==============================] - 0s 2us/step - loss: -13.1274 - acc: 0.0000e+00 - val_loss: -16.1168 - val_acc: 0.0000e+00
Epoch 38/200
113754/113754 [==============================] - 0s 2us/step - loss: -13.1457 - acc: 0.0000e+00 - val_loss: -16.1168 - val_acc: 0.0000e+00
Epoch 39/200
113754/113754 [==============================] - 0s 2us/step - loss: -13.1315 - acc: 0.0000e+00 - val_loss: -16.1168 - val_acc: 0.0000e+00
Epoch 40/200
113754/113754 [==============================] - 0s 2us/step - loss: -13.1797 - acc: 0.0000e+00 - val_loss: -16.1168 - val_acc: 0.0000e+00
Epoch 41/200
113754/113754 [==============================] - 0s 2us/step - loss: -13.1844 - acc: 0.0000e+00 - val_loss: -16.1169 - val_acc: 0.0000e+00
Epoch 42/200
113754/113754 [==============================] - 0s 2us/step - loss: -13.2205 - acc: 0.0000e+00 - val_loss: -16.1169 - val_acc: 0.0000e+00
Epoch 43/200

How can I amend the following script so that my accuracy grows and my prediction output returns a value between 0 and 1 (0: no rain, 1: rain)?

import keras
import sklearn.model_selection
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler

labelencoder = LabelEncoder()

# read data, replace NaN with 0.0
csv_data = pd.read_csv('weatherAUS.csv', header=0)
csv_data = csv_data.replace(np.nan, 0.0, regex=True)

# Input/output columns scaled to 0<=n<=1
x = csv_data.loc[:, ['MinTemp', 'MaxTemp', 'Rainfall', 'WindGustSpeed']]
y = labelencoder.fit_transform(csv_data['RainTomorrow'])
scaler_x = MinMaxScaler(feature_range =(-1, 1))
x = scaler_x.fit_transform(x)
scaler_y = MinMaxScaler(feature_range =(-1, 1))
y = scaler_y.fit_transform([y])[0]

# Partitioned data for training/testing
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.2)

# model
model = keras.models.Sequential() 
model.add( keras.layers.normalization.BatchNormalization(input_shape=tuple([x_train.shape[1]])))
model.add(keras.layers.core.Dense(4, activation='relu'))
model.add(keras.layers.core.Dense(4, activation='relu'))
model.add(keras.layers.core.Dense(4, activation='relu'))
model.add(keras.layers.core.Dense(1,   activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=["accuracy"])

callback_early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='auto')
model.fit(x_train, y_train, batch_size=1024, epochs=200, validation_data=(x_test, y_test), verbose=1, callbacks=[callback_early_stopping])

y_pred = model.predict(x_test)  # x_test is already a NumPy array (no .values)


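As you can see, the sigmoid activation function that you use in your network's output (the last layer) has a range of 0 to 1.

Note, however, that your labels (y) are rescaled to the range -1 to 1. In fact it is worse than that: scaler_y.fit_transform([y]) treats [y] as a single sample with one feature per label, so every feature has zero range and every label becomes -1. With a target of -1, binary cross-entropy can go negative, which is why your loss is negative and your accuracy is 0.

I suggest you keep the sigmoid output and use labels in the range 0 to 1. As long as RainTomorrow contains only 'No' and 'Yes', LabelEncoder already maps it to 0/1, so you can simply drop the scaler_y step (rescaling [y] to (0, 1) instead would turn every label into 0).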
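A NumPy sketch of why the log looks the way it does. The helper names are hypothetical; `scale_like_question` imitates the formula MinMaxScaler applies per feature (scaled = x * scale + (lo - min * scale), with a zero range treated as 1), and EPS = 1e-7 is assumed to match the epsilon Keras uses to clip predictions. Because `[y]` is a single sample, every label collapses to -1. With a target of -1, the cross-entropy is minimised by pushing predictions towards 0, where the clipping makes it bottom out at ln(1e-7) ≈ -16.12, consistent with the `val_loss: -16.1168` in the log. Keras's accuracy compares the thresholded prediction (0 or 1) with the label, so it can never match -1.

```python
import numpy as np

EPS = 1e-7  # assumed to match Keras's default backend epsilon (used for clipping)


def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy, with predictions clipped to [EPS, 1 - EPS]."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), EPS, 1 - EPS)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def scale_like_question(y, lo=-1.0, hi=1.0):
    """Imitate MinMaxScaler(feature_range=(lo, hi)).fit_transform([y])[0]:
    [y] is ONE sample with len(y) features, so each feature has zero range
    (treated as 1) and every value maps to `lo`."""
    row = np.asarray([y], dtype=float)
    data_min = row.min(axis=0)
    data_range = row.max(axis=0) - data_min
    data_range[data_range == 0] = 1.0
    scale = (hi - lo) / data_range
    return (row * scale + (lo - data_min * scale))[0]


labels = np.array([0, 1, 1, 0])  # 0/1 as produced by LabelEncoder
print(scale_like_question(labels))  # [-1. -1. -1. -1.]
print(binary_crossentropy(-np.ones(4), np.zeros(4)))  # about -16.118 = ln(1e-7)
print(binary_crossentropy(labels, [0.1, 0.9, 0.9, 0.1]))  # about 0.105
```

With the unscaled 0/1 labels the loss is a normal, non-negative quantity again and the accuracy becomes meaningful.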


I'm building a simple neural network using keras.

Each element of the training data has 100 dimensions, and I'm reading the labels of the elements from a text file.

f = open('maleE', "rt")
labelsTrain = [line.rstrip() for line in f.readlines()]

The labels are strings that have this structure: number_text

To fit the model on the training data:

, labelsTrain, epochs=20000, batch_size=1350)

And I get the following error:

File "", line 112, in <module>, labelsTrain, epochs=20000, batch_size=1350)
  File "/Users/renzo/PyEnvironments/tensorKeras/lib/python2.7/site-packages/keras/", line 867, in fit
  File "/Users/renzo/PyEnvironments/tensorKeras/lib/python2.7/site-packages/keras/engine/", line 1598, in fit
  File "/Users/renzo/PyEnvironments/tensorKeras/lib/python2.7/site-packages/keras/engine/", line 1183, in _fit_loop
    outs = f(ins_batch)
  File "/Users/renzo/PyEnvironments/tensorKeras/lib/python2.7/site-packages/keras/backend/", line 2273, in __call__
  File "/Users/renzo/PyEnvironments/tensorKeras/lib/python2.7/site-packages/tensorflow/python/client/", line 889, in run
  File "/Users/renzo/PyEnvironments/tensorKeras/lib/python2.7/site-packages/tensorflow/python/client/", line 1087, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "/Users/renzo/PyEnvironments/tensorKeras/lib/python2.7/site-packages/numpy/core/", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: invalid literal for float(): 225_sokode

This label is element 279 of a list of 378 labels.


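First of all, pick a unique name for each of your classes. I say this because I don't understand what the number in your class labels means (if it is not the same for each class, use str.split() to keep only the text). Then you should encode your string labels as numbers, because Keras can only train on numeric targets; that is why you get the ValueError when it tries to convert "225_sokode" to a float. For example, see this post on one-hot encoding of labels.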
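A sketch of both steps in NumPy, assuming the text after the first underscore identifies the class and the numeric prefix can be dropped. `encode_labels` is a hypothetical helper; `keras.utils.to_categorical(y_int)` would produce the same one-hot rows from the integer indices.

```python
import numpy as np


def encode_labels(raw_labels):
    """Strip the numeric prefix from labels like '225_sokode', map each
    class name to an integer index, and one-hot encode the result."""
    names = [label.split("_", 1)[-1] for label in raw_labels]  # '225_sokode' -> 'sokode'
    classes = sorted(set(names))  # fixed, reproducible class order
    index = {name: i for i, name in enumerate(classes)}
    y_int = np.array([index[name] for name in names])
    y_onehot = np.eye(len(classes), dtype="float32")[y_int]  # one row per label
    return y_onehot, classes


y_onehot, classes = encode_labels(["225_sokode", "12_lome", "7_sokode"])
print(classes)  # ['lome', 'sokode']
print(y_onehot)
```

With one-hot targets, the output layer would typically have len(classes) units with a softmax activation and be trained with categorical_crossentropy. Keep the `classes` list, since you need it to turn predicted indices back into names.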