Hot questions for using neural networks in Anaconda


Good morning, I'm new to machine learning and neural networks. I am trying to build a fully connected neural network to solve a regression problem. The dataset is composed of 18 features and 1 label, and all of these are physical quantities.

You can find the code below. I have also uploaded a figure showing how the loss function evolves over the epochs (you can find it below). I am not sure whether there is overfitting. Can someone explain why there is or is not overfitting?

import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn import preprocessing

from sklearn.model_selection import train_test_split

from matplotlib import pyplot as plt

import keras
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping
from keras import optimizers
from sklearn.metrics import r2_score
from keras import regularizers
from keras import backend
from tensorflow.keras import regularizers
from keras.regularizers import l2

# =============================================================================
# Choose the test size
# =============================================================================
test_size = 0.2

dataset = pd.read_csv('DataSet.csv', decimal=',', delimiter = ";")

label = dataset.iloc[:,-1]
features = dataset.drop(columns = ['Label'])

y_max_pre_normalize = max(label)
y_min_pre_normalize = min(label)

def denormalize(y):
    final_value = y*(y_max_pre_normalize-y_min_pre_normalize)+y_min_pre_normalize
    return final_value

# =============================================================================
# Split
# =============================================================================

X_train1, X_test1, y_train1, y_test1 = train_test_split(features, label, test_size = test_size, shuffle = True)

y_test2 = y_test1.to_frame()
y_train2 = y_train1.to_frame()

# =============================================================================
# Normalize
# =============================================================================
scaler1 = preprocessing.MinMaxScaler()
scaler2 = preprocessing.MinMaxScaler()
X_train = scaler1.fit_transform(X_train1)
X_test = scaler2.fit_transform(X_test1)

scaler3 = preprocessing.MinMaxScaler()
scaler4 = preprocessing.MinMaxScaler()
y_train = scaler3.fit_transform(y_train2)
y_test = scaler4.fit_transform(y_test2)

# =============================================================================
# Build the network
# =============================================================================
optimizer = tf.keras.optimizers.Adam(lr=0.001)
model = Sequential()

model.add(Dense(60, input_shape = (X_train.shape[1],), activation = 'relu',kernel_initializer='glorot_uniform'))
model.add(Dense(60, activation = 'relu',kernel_initializer='glorot_uniform'))
model.add(Dense(60, activation = 'relu',kernel_initializer='glorot_uniform'))

model.add(Dense(1,activation = 'linear',kernel_initializer='glorot_uniform'))

model.compile(loss = 'mse', optimizer = optimizer, metrics = ['mse'])

history = model.fit(X_train, y_train, epochs = 100,
                    validation_split = 0.1, shuffle=True, batch_size=250)

history_dict = history.history

loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']

y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

y_train_pred = denormalize(y_train_pred)
y_test_pred = denormalize(y_test_pred)

plt.plot((y_test1),(y_test_pred),'.', color='darkviolet', alpha=1, marker='o', markersize = 2, markeredgecolor = 'black', markeredgewidth = 0.1)
plt.plot((np.array((-0.1,7))),(np.array((-0.1,7))),'-', color='magenta')

plt.plot((y_train1),(y_train_pred),'.', color='darkviolet', alpha=1, marker='o', markersize = 2, markeredgecolor = 'black', markeredgewidth = 0.1)
plt.plot((np.array((-0.1,7))),(np.array((-0.1,7))),'-', color='magenta')

plt.plot(loss_values,'b',label = 'training loss')
plt.plot(val_loss_values,'r',label = 'val training loss')
plt.ylabel('Loss Function')

print("\n\nThe R2 score on the test set is:\t{:0.3f}".format(r2_score(y_test_pred, y_test1)))

print("The R2 score on the train set is:\t{:0.3f}".format(r2_score(y_train_pred, y_train1)))
from sklearn import metrics

# Measure MSE error.  
score = metrics.mean_squared_error(y_test_pred,y_test1)
print("\n\nFinal score test (MSE): %0.4f" %(score))
score1 = metrics.mean_squared_error(y_train_pred,y_train1)
print("Final score train (MSE): %0.4f" %(score1))
score2 = np.sqrt(metrics.mean_squared_error(y_test_pred,y_test1))
print(f"Final score test (RMSE): %0.4f" %(score2))
score3 = np.sqrt(metrics.mean_squared_error(y_train_pred,y_train1))
print(f"Final score train (RMSE): %0.4f" %(score3))


I also tried using feature importances and raising n_epochs; these are the results:

Feature Importance:

No Feature Importance:


Looks like you don't have overfitting! Your training and validation curves are descending together and converging. The clearest sign you could get of overfitting would be a deviation between these two curves, something like this:

Since your two curves are descending and are not diverging, it indicates your NN training is healthy.
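If you would rather check this numerically than by eye, here is a rough sketch of a helper that looks for the diverging pattern described above. The function name `check_loss_divergence` and the `patience` threshold are my own illustrative choices, not part of Keras, and the rule is only a heuristic.

```python
def check_loss_divergence(train_loss, val_loss, patience=5):
    """Heuristic check for the 'curves drifting apart' overfitting pattern.

    Returns (best_epoch, suspected), where best_epoch is the 0-based epoch
    with the lowest validation loss and suspected is True when validation
    loss has not improved for at least `patience` epochs while training
    loss kept decreasing.
    """
    if len(train_loss) != len(val_loss) or not train_loss:
        raise ValueError("need two non-empty loss lists of equal length")
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
    epochs_since_best = len(val_loss) - 1 - best_epoch
    train_still_improving = train_loss[-1] < train_loss[best_epoch]
    suspected = epochs_since_best >= patience and train_still_improving
    return best_epoch, suspected


# Made-up loss values, only to illustrate the two situations.
healthy = check_loss_divergence(
    [1.0, 0.6, 0.4, 0.3, 0.25, 0.22],
    [1.1, 0.7, 0.5, 0.4, 0.35, 0.33],
    patience=3,
)
diverging = check_loss_divergence(
    [1.0, 0.6, 0.4, 0.3, 0.2, 0.1],
    [1.1, 0.7, 0.6, 0.7, 0.8, 0.9],
    patience=3,
)
print(healthy)    # (5, False)
print(diverging)  # (2, True)
```

With the code from the question, the two lists would be `loss_values` and `val_loss_values` taken from `history.history`.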

HOWEVER! Your validation curve is suspiciously below the training curve. This hints at possible data leakage (train and test data have been mixed somehow). More info in a nice and short blog post. In general, you should split the data before any other preprocessing (normalizing, augmentation, shuffling, etc.).

Other causes for this could be some type of regularization (dropout, BN, etc.) that is active while computing the training loss and is deactivated when computing the validation/test loss.
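To make the "split first, preprocess afterwards" advice concrete, here is a minimal sketch with scikit-learn. The data is synthetic (random numbers standing in for the 18 features and the label), and the variable names are only illustrative. The point is that each scaler is fitted on the training portion only and then reused on the test portion, so both are mapped with the same minimum and maximum.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real dataset: 200 rows, 18 features, 1 label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 18))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# 1) Split before any other preprocessing.
X_train_raw, X_test_raw, y_train_raw, y_test_raw = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0
)

# 2) Fit the scalers on the training portion only.
x_scaler = MinMaxScaler().fit(X_train_raw)
y_scaler = MinMaxScaler().fit(y_train_raw.reshape(-1, 1))

# 3) Apply the same fitted scalers to both portions.
X_train = x_scaler.transform(X_train_raw)
X_test = x_scaler.transform(X_test_raw)
y_train = y_scaler.transform(y_train_raw.reshape(-1, 1))
y_test = y_scaler.transform(y_test_raw.reshape(-1, 1))

# Scaled values (or predictions) go back to physical units with the same scaler.
y_test_back = y_scaler.inverse_transform(y_test).ravel()
```

Note that the scaled test values can fall slightly outside [0, 1]; that is expected, because the test set was not used to choose the scaling range.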


I am running the code below in a Jupyter notebook (Python):

# Run some setup code for this notebook.

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see
%load_ext autoreload
%autoreload 2

and then the instructions below:

# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print ('Training data shape: ', X_train.shape)
print ('Training labels shape: ', y_train.shape)
print ('Test data shape: ', X_test.shape)
print ('Test labels shape: ', y_test.shape)

When I run the second portion, I get the error below:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-9506c06e646a> in <module>()
      1 # Load the raw CIFAR-10 data.
      2 cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
----> 3 X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
      5 # As a sanity check, we print out the size of the training and test data.

C:\Users\lenovo\assignment1\cs231n\ in load_CIFAR10(ROOT)
     20   for b in range(1,6):
     21     f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
---> 22     X, Y = load_CIFAR_batch(f)
     23     xs.append(X)
     24     ys.append(Y)

C:\Users\lenovo\assignment1\cs231n\ in load_CIFAR_batch(filename)
      7   """ load single batch of cifar """
      8   with open(filename, 'rb') as f:
----> 9     datadict = pickle.load(f)
     10     X = datadict['data']
     11     Y = datadict['labels']

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)

How can I resolve this error? I am using Anaconda3 to run this code. It seems the above code was written for Anaconda2. Any suggestions to fix these errors?

Just for more details:

I am trying to solve the assignment from the link:


Adding the code containing the definition of load_CIFAR10:

import _pickle as pickle
import numpy as np
import os
from scipy.misc import imread

def load_CIFAR_batch(filename):
  """ load single batch of cifar """
  with open(filename, 'rb') as f:
    datadict = pickle.load(f)
    X = datadict['data']
    Y = datadict['labels']
    X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
    Y = np.array(Y)
    return X, Y

def load_CIFAR10(ROOT):
  """ load all of cifar """
  xs = []
  ys = []
  for b in range(1,6):
    f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
    X, Y = load_CIFAR_batch(f)
    xs.append(X)
    ys.append(Y)
  Xtr = np.concatenate(xs)
  Ytr = np.concatenate(ys)
  del X, Y
  Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
  return Xtr, Ytr, Xte, Yte


The pickle file you are loading has most likely been generated with python 2.

Since there are fundamental differences in how pickle works in Python 2 and Python 3, you may attempt loading the file using latin-1 encoding, which assumes a direct mapping of byte values 0-255 to characters.

This method requires some sanity check, since it's not guaranteed to produce coherent data.
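As a minimal sketch of that suggestion: `pickle.load` in Python 3 accepts an `encoding` argument, and passing `"latin1"` lets it decode Python 2 byte strings, including pickled NumPy arrays. The helper name `load_py2_pickle` is just an illustrative choice.

```python
import pickle


def load_py2_pickle(path, encoding="latin1"):
    """Load a pickle file that was written by Python 2 from Python 3.

    encoding="latin1" maps every byte 0-255 to one character, so decoding
    never fails; encoding="bytes" would instead keep Python 2 str objects
    as bytes.
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding=encoding)
```

In `load_CIFAR_batch` this would replace the plain `pickle.load(f)` call. As a sanity check afterwards, you could print `datadict.keys()` and the shape of `datadict['data']` before reshaping.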


I'm currently trying to use the scikit-learn package for its neural network functionality. I have a complex problem to solve with it, but to start out I am just trying a couple of basic tests to familiarize myself with it. I have gotten it to do something, but it isn't producing meaningful results. My code:

import sklearn.neural_network.multilayer_perceptron as nnet
import numpy
def generateTargetDataset(expression="%s", generateRange=(-100,100), s=1000):
    expression = expression.replace("x", "%s")    
    x = numpy.random.rand(s,)
    y = numpy.zeros((s,), dtype="float")
    numpy.multiply(x, abs(generateRange[1]-generateRange[0]), x)
    numpy.subtract(x, min(generateRange), x)
    for z in range(0, numpy.size(x)):
        y[z] = eval(expression % (x[z]))
    x = x.reshape(-1, 1)
    outTuple = (x, y)
    return outTuple
print("New Net + Training")
QuadRegressor = nnet.MLPRegressor(hidden_layer_sizes=(10), warm_start=True, verbose=True, learning_rate_init=0.00001, max_iter=10000, algorithm="sgd", tol=0.000001)
data = generateTargetDataset(expression="x**2", s=10000, generateRange=(-1,1))
QuadRegressor.fit(data[0], data[1])
print("Net Trained")
xt = numpy.random.rand(10000, 1)
yr = QuadRegressor.predict(xt)
yr = yr.reshape(-1, 1)
xt = xt.reshape(-1, 1)
numpy.multiply(xt, 100, xt)
numpy.multiply(yr, 10000, yr)
numpy.around(yr, 2, out=yr)
numpy.around(xt, 2, out=xt)
out = numpy.concatenate((xt, yr), axis=1)
numpy.savetxt(fname="C:\\SCRATCHDIR\\numpydump.csv", X=out, delimiter=",")

I don't understand how to post the data it gives me, but it spits out between 7000 and 10000 for all inputs between 0 and 100. It seems to be correctly mapped very close to the top of the range, but for inputs close to 0, it just returns something near 7000.

EDIT: I forgot to add this. The network has the same behavior if I remove the dummy training to y=x, but I read somewhere that sometimes you can help a network along by training it to a different but closer function and then using that already weighted network as a starting ground. It didn't work but I just hadn't taken that bit out yet.


My recommendation is to reduce the number of neurons per layer, and increase the training dataset size. Right now, you have a lot of parameters to train in your network, and a small training set (~10K). However, the main point of my answer is that sklearn probably isn't a great choice for your end application.

So you have a complex problem you want to solve with neural networks?

"I have a complex problem to solve with it, but to start out I am just trying a couple of basic tests to familiarize myself with it."

According to the official user guide, sklearn's implementation of neural networks isn't designed for large applications and is a lot less flexible than other options for deep learning.
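For a small test like the one in the question, scikit-learn is still workable. Below is a rough sketch using the current public import path and parameter names: `sklearn.neural_network.MLPRegressor` with `solver` (the released name of the `algorithm` argument used in the question). The `tanh` activation, the `lbfgs` solver and the sample sizes are my own choices for a tiny one-dimensional problem, not a recommendation from the user guide.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Training data: x drawn uniformly from [-1, 1], target y = x**2.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(1000, 1))
y_train = x_train.ravel() ** 2

# One small hidden layer; lbfgs is an assumption that suits tiny datasets.
net = MLPRegressor(
    hidden_layer_sizes=(10,),
    activation="tanh",
    solver="lbfgs",
    max_iter=2000,
    random_state=0,
)
net.fit(x_train, y_train)

# Evaluate on the same input range the network was trained on.
x_test = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y_pred = net.predict(x_test)
mse = float(np.mean((y_pred - x_test.ravel() ** 2) ** 2))
print(f"test MSE: {mse:.5f}")
```

The test inputs here stay inside the training range. In the question, the network is trained on one range and queried on another, and a network of this kind generally cannot be expected to extrapolate.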

One Python deep learning library I've had good experiences with is Keras, a modular, easy-to-use library with GPU support.

Here's a sample I coded up that trains a single perceptron to do quadratic regression.

from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
import numpy as np
import matplotlib.pyplot as plt

model = Sequential()
model.add(Dense(1, init = 'uniform', input_dim=1))

model.compile(optimizer = SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True), loss = 'mse')

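data = np.random.random(1000)
labels = data**2

# nb_epoch is the Keras 1.x argument name; later versions call it epochs
model.fit(data.reshape((len(data), 1)), labels, nb_epoch = 1000, batch_size = 128, verbose = 1)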

tdata = np.sort(np.random.random(100))
tlabels = tdata**2

preds = model.predict(tdata.reshape((len(tdata), 1)))

plt.plot(tdata, tlabels)
plt.scatter(tdata, preds)

This outputs a scatter plot of the test data points, along with a plot of the true curve.

As you can see, the results are reasonable. In general, neural networks are hard to train, and I had to do some parameter tuning before I got this example working.

It looks like you're using Windows. This question may be helpful for installing Keras on Windows.


I want to build an architecture like this in Keras.

Here the output of 1D CNN (flattened) will be given as input to the ANN, and some other additional input will be given to ANN too. So there are two positions where this whole model will take input. How to handle this in Keras? In the function, we normally use one input. I am using Keras on top of Tensorflow backend and using Anaconda Python 3.7.3.

(Here ANN means normal neural network)


Keras fully supports multi-input models.

The way to do it is to use the functional API and place two Input layers in your model. Build the rest of the architecture using the functional API and then define a Model with two inputs. During training, remember to feed in both inputs.

In your case it would look something like this:

from keras.layers import Input, Conv1D, Flatten, Concatenate, Dense
from keras.models import Model

input1 = Input(shape=(...))  # add the shape of your input (excluding batch dimension)

conv = Conv1D(...)(input1)   # add convolution parameters (e.g. filters, kernel, strides)
flat = Flatten()(conv)

input2 = Input(shape=(...))  # add the shape of your secondary input

ann_input = Concatenate()([flat, input2])  # concatenate the two inputs of the ANN
ann = Dense(2, activation='softmax')(ann_input)  # 2 output units for binary classification (assumes one-hot labels)

model = Model(inputs=[input1, input2], outputs=[ann])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# assuming x1 and x2 are numpy arrays with the data for 'input1' and 'input2'
# respectively and y is a numpy array containing the labels
model.fit([x1, x2], y)