## Hot questions on using neural networks with mean squared error

Question:

Although both of the above methods score predictions by how close they are to the targets, cross-entropy is still preferred. Is that true in every case, or are there particular scenarios where we prefer cross-entropy over MSE?

Answer:

Cross-entropy is preferred for **classification**, while mean squared error is one of the best choices for **regression**. This follows directly from the statement of the problems themselves: in classification you work with a very particular set of possible output values, so MSE is badly defined (it does not use this knowledge and therefore penalizes errors in an incompatible way). To understand the phenomenon better, it is good to follow the relations between

- cross entropy
- logistic regression (binary cross entropy)
- linear regression (MSE)

You will notice that both can be seen as maximum likelihood estimators, simply with different assumptions about the dependent variable.
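To make the connection concrete, here is a small numerical sketch (all values are made up for illustration) showing that, up to constants, MSE is the Gaussian negative log-likelihood and binary cross-entropy is the Bernoulli negative log-likelihood:

```python
import numpy as np

# Regression: with Gaussian noise (sigma = 1), the negative log-likelihood
# equals 0.5 * MSE plus a constant, so minimizing either gives the same fit.
y_true = np.array([1.2, 0.7, 2.3])
y_pred = np.array([1.0, 0.9, 2.0])
mse = np.mean((y_true - y_pred) ** 2)
gauss_nll = np.mean(0.5 * np.log(2 * np.pi)
                    + 0.5 * (y_true - y_pred) ** 2)

# Classification: for Bernoulli targets, the negative log-likelihood
# is exactly the binary cross-entropy.
t = np.array([1.0, 0.0, 1.0])   # labels
p = np.array([0.8, 0.1, 0.6])   # predicted probabilities
bce = -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))
```

Since the Gaussian constant does not depend on the predictions, gradient-based training on `gauss_nll` and on `mse` follows the same direction.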

Question:

I'm trying to optimise and validate a neural network using Netlab on Matlab

I'd like to find the error value for each iteration, so I can see convergence on a plot. This can be done by storing the errors printed in the command window (by setting **options(1)** to **1**), or by using **errlog**, which is a **netopt** output.

However, these errors are not the same as **mlperr**, which gives an error value of **0.5*(sum of squares error)** for the last iteration. I can't validly use them if I don't know how they're calculated.

Does anybody know what the errors displayed in the command window represent (I'm using **scaled conjugate gradient** as my optimisation algorithm)?

Is there a way of storing the **mlperr** for each iteration that the network
runs?

Any help is greatly appreciated, many thanks!

**NB:**
I have tried doing something similar to this :
ftp://ftp.dcs.shef.ac.uk/home/spc/com336/neural-lab-wk6.html

However, it gives different results from running the network with the number of iterations specified under options(14) rather than k, for some reason.

Answer:

Yes, certainly.

The *ERRLOG* vector is created as an output of the network optimisation function *netopt*, with the following syntax:

```matlab
[NET, OPTIONS, ERRLOG] = netopt(NET, OPTIONS, X, T, ALG)
```

Each row of *ERRLOG* gives 0.5*SSE (sum of squares error) for the corresponding iteration of network optimisation. This error is calculated between the predicted outputs (y) and the target outputs (t).
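For reference, the quantity stored in each row of *ERRLOG* can be sketched in a few lines of Python (the y and t vectors here are made-up stand-ins for the network outputs and targets):

```python
import numpy as np

# Made-up predicted outputs (y) and targets (t) for one iteration
y = np.array([0.9, 0.4, 0.7])
t = np.array([1.0, 0.5, 0.5])

# Netlab-style error: half the sum of squared errors, i.e. 0.5 * SSE
err = 0.5 * np.sum((y - t) ** 2)
```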

The *MLPERR* function has the following syntax:

```matlab
E = mlperr(NET, X, T)
```

It also gives 0.5*SSE between the predicted outputs (y) and target outputs (t), but since the network parameters are held constant (NET should be pre-trained), *E* is a single scalar value.

If *netopt* is run with an *ERRLOG* output, and *mlperr* is then run with the same network and variables, *E* should equal the value in the final row of *ERRLOG* (the error after the final iteration of network optimisation).

Hope this is of some use to someone!

Question:

I'm using Keras and I'm trying to build a Neural Network to predict the interest rate of given data. The data looks like this:

| loan_amnt | annual_inc | emp_length | int_rate |
|-----------|------------|------------|----------|
| 10000     | 38000.0    | 5.600882   | 12.40    |
| 13750     | 17808.0    | 5.600882   | 28.80    |
| 26100     | 68000.0    | 10.000000  | 20.00    |
| 13000     | 30000.0    | 1.000000   | 20.00    |
| 7000      | 79950.0    | 7.000000   | 7.02     |

The features (X) are `loan_amnt`, `annual_inc`, and `emp_length`. The target (y) is `int_rate`.

Here's my process and what I've done after normalizing the data:

```python
from keras.models import Sequential
from keras.layers import Dense

# Building out the model
model = Sequential([
    Dense(9, activation='relu', input_shape=(3,)),
    Dense(3, activation='relu'),
    Dense(1, activation='linear'),
])

# Compiling the model
model.compile(loss='mean_absolute_percentage_error',
              metrics=['mse'],
              optimizer='RMSprop')

hist = model.fit(X_train, Y_train, batch_size=100, epochs=20, verbose=1)
```

Here's an output sample after running `model.fit()`:

```
Epoch 1/20
693/693 [==============================] - 1s 905us/step - loss: 96.2391 - mean_squared_error: 179.8007
Epoch 2/20
693/693 [==============================] - 0s 21us/step - loss: 95.2362 - mean_squared_error: 176.9865
Epoch 3/20
693/693 [==============================] - 0s 20us/step - loss: 94.4133 - mean_squared_error: 174.6367
```

Finally, I evaluated the model with `model.evaluate(X_train, Y_train)` and got the following output:

```
693/693 [==============================] - 0s 372us/step
[77.88501817667468, 132.0109032635049]
```

The question is, how can I know if my model is doing well or not, and how can I read the numbers?

Answer:

You are using `MSE` as a metric, which is defined as:

`MSE = mean((y_true - y_pred)^2)`

So when you see `132.` as the MSE metric, the root mean squared error is `sqrt(132.)` ≈ 11.5, which is the typical difference between y_true and y_pred. That is quite large for your data, as the `MAPE` loss (`mean_absolute_percentage_error`) shows: you have roughly 78% error on your data.

For example, if y_true was 20, you could predict either 36 or 4, something like that.

You could say that your error is good when the MAPE is around 10%, but it depends on your case.
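As a sketch of how to read these numbers, the metric and the loss can be computed directly (the targets and predictions below are made up for illustration):

```python
import numpy as np

# Made-up interest-rate targets and predictions
y_true = np.array([20.0, 12.4, 28.8])
y_pred = np.array([14.0, 10.0, 20.0])

mse = np.mean((y_true - y_pred) ** 2)           # the 'mse' metric
rmse = np.sqrt(mse)                             # typical error, in units of y
mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))  # MAPE loss, in %
```

The RMSE is in the same units as the interest rate, which usually makes it the easier number to judge against the spread of your targets.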

Question:

I am new to neural networks, so please pardon any silly questions. I am working with a weather dataset, using Dewpoint, Humidity, WindDirection, and WindSpeed to predict temperature. I have read several papers on this, so I felt intrigued to do some research of my own. First I train the model with 4000 observations, and then I try to predict the next 50 temperature points.

Here goes my entire code.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing
import numpy as np
import pandas as pd

df = pd.read_csv('WeatherData.csv', sep=',', index_col=0)

X = np.array(df[['DewPoint', 'Humidity', 'WindDirection', 'WindSpeed']])
y = np.array(df[['Temperature']])

# nan_array = pd.isnull(df).any(1).nonzero()[0]

neural_net = MLPRegressor(
    activation='logistic',
    learning_rate_init=0.001,
    solver='sgd',
    learning_rate='invscaling',
    hidden_layer_sizes=(200,),
    verbose=True,
    max_iter=2000,
    tol=1e-6
)

# Scaling the data
max_min_scaler = preprocessing.MinMaxScaler()
X_scaled = max_min_scaler.fit_transform(X)
y_scaled = max_min_scaler.fit_transform(y)

neural_net.fit(X_scaled[0:4001], y_scaled[0:4001].ravel())

predicted = neural_net.predict(X_scaled[5001:5051])

# Scale back to actual scale
max_min_scaler = preprocessing.MinMaxScaler(
    feature_range=(y[5001:5051].min(), y[5001:5051].max()))
predicted_scaled = max_min_scaler.fit_transform(predicted.reshape(-1, 1))

print("Root Mean Square Error ",
      mean_squared_error(y[5001:5051], predicted_scaled))
```

First **confusing** thing to me is that the same program is giving different RMS error at different run. Why? I am not getting it.

Run 1:

```
Iteration 1, loss = 0.01046558
Iteration 2, loss = 0.00888995
Iteration 3, loss = 0.01226633
Iteration 4, loss = 0.01148097
Iteration 5, loss = 0.01047128
Training loss did not improve more than tol=0.000001 for two consecutive epochs. Stopping.
Root Mean Square Error 22.8201171703
```

Run 2(Significant Improvement):

```
Iteration 1, loss = 0.03108813
Iteration 2, loss = 0.00776097
Iteration 3, loss = 0.01084675
Iteration 4, loss = 0.01023382
Iteration 5, loss = 0.00937209
Training loss did not improve more than tol=0.000001 for two consecutive epochs. Stopping.
Root Mean Square Error 2.29407183124
```

In the documentation of MLPRegressor I could not find a way to directly target the RMS error and keep the network running until it reaches the desired RMS error. What am I missing here?

Please help!

Answer:

First confusing thing to me is that the same program is giving different RMS error at different run. Why? I am not getting it.

Neural networks are prone to **local optima**. There is never a guarantee that you will learn anything decent, nor (as a consequence) that multiple runs will lead to the same solution. The learning process is **heavily** random: it depends on the initialization, sampling order, etc., so this kind of behaviour is **expected**.
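If run-to-run reproducibility is what you are after, sklearn does let you pin the randomness with the `random_state` parameter. A minimal sketch, with made-up data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = rng.rand(100)

# Fixing random_state pins the weight initialization and data shuffling,
# so two identical runs produce identical training losses.
net_a = MLPRegressor(hidden_layer_sizes=(20,), random_state=42,
                     max_iter=200).fit(X, y)
net_b = MLPRegressor(hidden_layer_sizes=(20,), random_state=42,
                     max_iter=200).fit(X, y)
```

Note that this removes the run-to-run variance but does not make the solution any better; it just makes the same local optimum repeatable.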

In the documentation of MLPRegressor I could not find a way to directly hit the RMS error and keep the network running until I reach the desired RMS error.

Neural networks in sklearn are extremely basic and do not provide this kind of flexibility. If you need to work with more complex settings, you simply need a more NN-oriented library, such as Keras, TF, etc. The scikit-learn community struggled a lot even to get this NN implementation "in", and it does not seem like they are going to add much more flexibility in the near future.

As a minor thing, your use of `MinMaxScaler` seems slightly odd. You should not call `fit_transform` each time; you should fit only once, and later use `transform` (or `inverse_transform`). In particular, it should be:

```python
y_max_min_scaler = preprocessing.MinMaxScaler()
y_scaled = y_max_min_scaler.fit_transform(y)

...

predicted_scaled = y_max_min_scaler.inverse_transform(predicted.reshape(-1, 1))
```
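A self-contained round-trip illustrating the fit-once pattern (the target values below are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

y = np.array([[10.0], [20.0], [30.0]])

scaler = MinMaxScaler()
y_scaled = scaler.fit_transform(y)      # fit once, on the training targets

# inverse_transform maps scaled values back to the original units,
# using the min/max the scaler learned during fit
y_back = scaler.inverse_transform(y_scaled)
```

Reusing the same fitted scaler is what guarantees the inverse mapping recovers the original scale; refitting on the predictions would silently use a different min/max.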