Hot questions on using neural networks with mean squared error


Although both of the above methods give a better score for closer predictions, cross-entropy is still preferred. Is that true in every case, or are there particular scenarios where we prefer cross-entropy over MSE?


Cross-entropy is preferred for classification, while mean squared error is one of the best choices for regression. This follows directly from the statement of the problems themselves: in classification you work with a very particular set of possible output values, so MSE is badly defined (it does not encode this knowledge and therefore penalizes errors in an incompatible way). To better understand the phenomenon, it helps to follow and understand the relations between

  1. cross entropy
  2. logistic regression (binary cross entropy)
  3. linear regression (MSE)

You will notice that both can be seen as maximum likelihood estimators, simply with different assumptions about the dependent variable.
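To make the maximum-likelihood connection concrete, here is a small illustrative sketch (plain Python with toy numbers): under a Gaussian noise assumption with unit variance, the negative log-likelihood reduces to MSE up to constants, while under a Bernoulli assumption it is exactly binary cross-entropy.

```python
import math

# Toy regression targets and predictions
y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]

# Under Gaussian noise with unit variance, the per-sample negative
# log-likelihood is 0.5*(y - yhat)^2 plus a constant, so minimizing
# it is the same as minimizing MSE.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
gauss_nll = 0.5 * mse  # dropping the additive constant

# Under a Bernoulli assumption, the negative log-likelihood is
# exactly binary cross-entropy.
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.7]
bce = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
           for t, p in zip(labels, probs)) / len(labels)
print(round(mse, 4), round(bce, 4))
```

Minimizing `gauss_nll` and minimizing `mse` always pick the same parameters, which is the sense in which linear regression with MSE is a maximum likelihood estimator.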


I'm trying to optimise and validate a neural network using Netlab in Matlab.

I'd like to find the error value for each iteration, so I can see convergence on a plot. This can be done by storing the errors printed in the command window (by setting options(1) to 1), or by using errlog, which is a netopt output.

However, these errors are not the same as mlperr, which gives an error value of 0.5*(sum of squared errors) for the last iteration. I can't validly use them if I don't know how they're calculated.

Does anybody know what the errors displayed in the command window represent (I'm using scaled conjugate gradient as my optimisation algorithm)?

Is there a way of storing the mlperr for each iteration that the network runs?

Any help is greatly appreciated, many thanks!

NB: I have tried doing something similar to this:

However, it gives different results from running the network with the number of iterations specified under options(14) rather than k, for some reason.


Yes, certainly.

The ERRLOG vector is created as an output of the network optimisation function netopt, with the following syntax:

[NET, OPTIONS, ERRLOG] = netopt(NET, OPTIONS, X, T, ALG)

Each row of ERRLOG gives 0.5*SSE (sum of squared errors) for the corresponding iteration of network optimisation. This error is calculated between the predicted outputs (y) and the target outputs (t).

The MLPERR function has the following syntax:

E = mlperr(NET, X, T)

It also gives 0.5*SSE between predicted outputs (y) and target outputs (t), but as the network parameters are constant (NET should be pre-trained), E is a single scalar value.

If netopt was run with an ERRLOG output, and MLPERR is then run with the same network and variables, E should equal the final row of ERRLOG (the error after the final iteration of network optimisation).
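For reference, the 0.5*SSE quantity described above can be sketched in a few lines of Python (a hypothetical illustration of the formula, not Netlab code):

```python
def half_sse(y, t):
    """0.5 * sum of squared errors between predictions y and targets t,
    the quantity Netlab's ERRLOG rows and mlperr report."""
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))

print(half_sse([1.0, 2.0], [0.0, 4.0]))  # 0.5 * (1 + 4) = 2.5
```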

Hope this is of some use to someone!


I'm using Keras to build a neural network that predicts the interest rate from the given data. The data looks like this:

    loan_amnt   annual_inc  emp_length  int_rate
    10000    38000.0         5.600882          12.40
    13750    17808.0         5.600882          28.80
    26100    68000.0         10.000000         20.00
    13000    30000.0         1.000000          20.00
    7000     79950.0         7.000000          7.02

The features (X) are loan_amnt , annual_inc , and emp_length. The target (y) is int_rate.

Here's my process and what I've done after normalizing the data:

    #Building out model
    model = Sequential([
        Dense(9, activation='relu', input_shape=(3,)),
        Dense(3, activation='relu'),
        Dense(1, activation='linear'),
    ])

    #Compiling model (the original compile call was lost here;
    #the optimizer and loss below are assumed)
    model.compile(optimizer='adam', loss='mse',
                  metrics=['mean_squared_error'])

    hist = model.fit(X_train, Y_train,
                     batch_size=100, epochs=20, verbose=1)

Here's a sample of the output after running the fit:

    Epoch 1/20
    693/693 [==============================] - 1s 905us/step - loss: 96.2391 - mean_squared_error: 
    Epoch 2/20
    693/693 [==============================] - 0s 21us/step - loss: 95.2362 - mean_squared_error: 
    Epoch 3/20
    693/693 [==============================] - 0s 20us/step - loss: 94.4133 - mean_squared_error: 

Finally, I evaluated the model with model.evaluate(X_train, Y_train) and got the following output:

      693/693 [==============================] - 0s 372us/step
      [77.88501817667468, 132.0109032635049]

The question is, how can I know if my model is doing well or not, and how can I read the numbers?


You are using a variant of the MSE loss, which is defined as:

MSE = mean((y_true - y_pred)^2)

So when you have 132.0 as the MSE metric, you really have an average difference of sqrt(132.) ≈ 11.5 between y_true and y_pred. That is quite a lot for your data, as shown by the MSPE loss: you are seeing roughly a 78% error on your data.

For example, if y_true was 20, you might predict around 31.5 or 8.5. Something like that.

You could say that your error is good when the MSPE is around 10%, but it depends on your use case.
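A quick check of the arithmetic above, assuming the second value reported by model.evaluate (132.01) really is a plain MSE:

```python
import math

# Second value returned by model.evaluate, assumed to be plain MSE
mse = 132.0109032635049
rmse = math.sqrt(mse)  # typical size of the error, in int_rate units
print(round(rmse, 1))  # 11.5
```

An RMSE of ~11.5 on a target whose values range roughly from 7 to 29 (see the sample data) confirms the model is far from accurate yet.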


I am new to neural networks, so please pardon any silly questions. I am working with a weather dataset, using DewPoint, Humidity, WindDirection, and WindSpeed to predict Temperature. I have read several papers on this, so I felt intrigued to do some research on my own. First I train the model with 4000 observations, and then I try to predict the next 50 temperature points.

Here goes my entire code.

from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing
import numpy as np
import pandas as pd

df = pd.read_csv('WeatherData.csv', sep=',', index_col=0)

X = np.array(df[['DewPoint', 'Humidity', 'WindDirection', 'WindSpeed']])
y = np.array(df[['Temperature']])

# nan_array = pd.isnull(df).any(1).nonzero()[0]

# The MLPRegressor arguments were lost here; defaults are assumed
neural_net = MLPRegressor()

# Scaling the data
max_min_scaler = preprocessing.MinMaxScaler()
X_scaled = max_min_scaler.fit_transform(X)
y_scaled = max_min_scaler.fit_transform(y)

neural_net.fit(X_scaled[0:4001], y_scaled[0:4001].ravel())

predicted = neural_net.predict(X_scaled[5001:5051])

# Scale back to actual scale
max_min_scaler = preprocessing.MinMaxScaler(feature_range=(y[5001:5051].min(), y[5001:5051].max()))
predicted_scaled = max_min_scaler.fit_transform(predicted.reshape(-1, 1))

print("Root Mean Square Error ", mean_squared_error(y[5001:5051], predicted_scaled))

The first confusing thing to me is that the same program gives a different RMS error on each run. Why? I am not getting it.

Run 1:

Iteration 1, loss = 0.01046558
Iteration 2, loss = 0.00888995
Iteration 3, loss = 0.01226633
Iteration 4, loss = 0.01148097
Iteration 5, loss = 0.01047128
Training loss did not improve more than tol=0.000001 for two consecutive epochs. Stopping.
Root Mean Square Error  22.8201171703

Run 2(Significant Improvement):

Iteration 1, loss = 0.03108813
Iteration 2, loss = 0.00776097
Iteration 3, loss = 0.01084675
Iteration 4, loss = 0.01023382
Iteration 5, loss = 0.00937209
Training loss did not improve more than tol=0.000001 for two consecutive epochs. Stopping.
Root Mean Square Error  2.29407183124

In the documentation of MLPRegressor I could not find a way to directly target the RMS error and keep the network running until I reach the desired RMS error. What am I missing here?

Please help!


The first confusing thing to me is that the same program gives a different RMS error on each run. Why? I am not getting it.

Neural networks are prone to local optima. There is never a guarantee that you will learn anything decent, nor (as a consequence) that multiple runs will lead to the same solution. The learning process is heavily random; it depends on the initialization, the sampling order, etc., so this kind of behaviour is expected.
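To make runs repeatable, you can fix the random seed (MLPRegressor exposes a random_state parameter for exactly this). A minimal, library-free sketch of why seeding matters, using a toy weight initializer (a hypothetical helper, not sklearn code):

```python
import random

def init_weights(n, seed=None):
    """Toy stand-in for a network's random weight initialization."""
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(n)]

# Unseeded runs start from different points, so gradient-based
# training can end in different local optima ...
a, b = init_weights(4), init_weights(4)

# ... while a fixed seed makes every run start (and hence end)
# identically.
c, d = init_weights(4, seed=42), init_weights(4, seed=42)
assert c == d
```

Fixing the seed removes run-to-run variance, but note that it does not make the solution any better; it only makes the same (possibly suboptimal) solution repeatable.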

In the documentation of MLPRegressor I could not find a way to directly target the RMS error and keep the network running until I reach the desired RMS error.

Neural networks in sklearn are extremely basic, and they do not provide this kind of flexibility. If you need to work with more complex settings, you simply need a more NN-oriented library such as Keras or TensorFlow. The scikit-learn community struggled a lot even to get this NN implementation in, and it does not seem like they are going to add much more flexibility in the near future.

As a minor thing, the use of MinMaxScaler seems slightly odd. You should not call fit_transform each time; you should fit only once, and later use transform (or inverse_transform). In particular, it should be:

y_max_min_scaler = preprocessing.MinMaxScaler()
y_scaled = y_max_min_scaler.fit_transform(y)

# ... train the network and predict as before ...

predicted_scaled = y_max_min_scaler.inverse_transform(predicted.reshape(-1, 1))
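The point of reusing one fitted scaler is that inverse_transform undoes exactly the scaling that was fitted. A minimal plain-Python sketch of the min-max round trip (hypothetical helpers mimicking MinMaxScaler's behaviour):

```python
def minmax_fit(values):
    """Record the min and max of the training data."""
    return min(values), max(values)

def minmax_transform(values, lo, hi):
    """Map values into [0, 1] using the fitted min/max."""
    return [(v - lo) / (hi - lo) for v in values]

def minmax_inverse(scaled, lo, hi):
    """Undo the scaling, recovering the original units."""
    return [s * (hi - lo) + lo for s in scaled]

y = [10.0, 20.0, 30.0]
lo, hi = minmax_fit(y)                    # fit once, on training targets
scaled = minmax_transform(y, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
print(restored)  # [10.0, 20.0, 30.0]
```

Fitting a second scaler on the test slice (as in the original code) uses a different min/max, so the "inverse" no longer maps predictions back to the true target scale.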