Hot questions for Using Neural networks in data analysis


Suppose I'm trying to use a neural network to predict how long my run will take. I have a lot of data from past runs: how many miles I plan on running, the total change in elevation (hills), the temperature, and the weather: sunny, overcast, raining, or snowing.

I'm confused about what to do with the last piece of data. Everything else I can input normally after standardizing, but I can't do that for the weather. My initial thought was to add four extra variables, one for each type of weather, and input a 1 or a 0 depending on the conditions.

Is this a good approach to the situation? Are there other approaches I should try?


You have a categorical variable that has four levels.

A very typical way of encoding such values is to use a separate indicator variable for each level. Even more common is "n-1" coding, where one fewer flag is used (the fourth value is represented by all three flags being 0).

n-1 coding is used for techniques that require numeric inputs, including logistic regression and neural networks. For large values of n, however, it is a bad choice. The problem is that it creates many sparse inputs, and sparse inputs tend to be highly correlated with one another. More inputs also mean more degrees of freedom in the network, making it harder to train.

In your case, you only have four values for this particular input. Splitting it into three variables is probably reasonable.
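As a concrete sketch (pure Python, with the level names taken from the question), n-1 coding for the weather input could look like this:

```python
# Minimal sketch of n-1 ("dummy") coding for the weather input.
# The first level, "sunny", is the reference: it maps to all zeros.
LEVELS = ["sunny", "overcast", "raining", "snowing"]

def encode_weather(value):
    """Return n-1 indicator variables (here, 3 flags for 4 levels)."""
    return [1.0 if value == level else 0.0 for level in LEVELS[1:]]

print(encode_weather("sunny"))    # reference level: [0.0, 0.0, 0.0]
print(encode_weather("raining"))  # [0.0, 1.0, 0.0]
```

These three flags can then be fed to the network alongside the standardized numeric inputs.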


I am playing around with the Adult dataset in R. I am trying to use the neuralnet package to train a neural network with backpropagation. I have cleaned the data. Now I am trying to run this part:

n <- names(cleanTrain)
f <- as.formula(paste("income~", paste(n[!n %in% "income"], collapse = " + ")))
nn <- neuralnet(f, data=cleanTrain, hidden = 10, algorithm = "backprop", learningrate=0.35)

I get this ERROR:

Error in neurons[[i]] %*% weights[[i]] : requires numeric/complex matrix/vector arguments


  1. I load the train as cleanTrain
  2. n gets all the names of the dataset
  3. f returns income ~ age + workclass + education + education.num + marital.status + occupation + relationship + race +sex + capital.gain + capital.loss + hours.per.week +

What is causing the error?


Hello. First, use a function to clean the Adult dataset (you can find one at the Statistical Consulting Group), then convert all the variables to numeric, since the backpropagation algorithm does not work on factor columns. You can see an example at neural net in R. Finally, apply the algorithm with the following code.

train <-, as.numeric))
train$income <- train$income - 1

n <- names(train)
f <- as.formula(paste("income~", paste(n[!n %in% "income"], collapse = " + ")))
nn <- neuralnet(f,data=train,hidden=10,err.fct="sse",linear.output=FALSE,algorithm="backprop",learningrate=0.35)

I hope it helps. Regards.


I am trying to create a neural network with one hidden layer (assume the data vector contains 4 values; there are 3 neurons in the input layer, 3 neurons in the hidden layer, and 1 neuron in the output layer). I have two vectors of data with two known results.

I train the network on the first set of data, then apply the second set. The weights are corrected using the backpropagation method. The issue is that if I try to predict the values of the first set after the weights have been corrected, I get a result very close to the second result. In other words, the neural network "forgets" the first training.

The full code of my program is here.

Weight values during and after training are here.

Here is the NN training function:

    public void Train(double[] data, double expectedResult)
    {
        double result = Predict(data);
        double delta = Perceptron.ActivationFunction(expectedResult) - Perceptron.ActivationFunction(result);
        double eta = 20;

        // Calculate layer 2 deltas
        for (int i = 0; i < size2; i++)
            deltas2[i] = delta * weights3[i];

        // Calculate layer 1 deltas
        for (int i = 0; i < size1; i++)
        {
            deltas1[i] = 0;
            for (int j = 0; j < size2; j++)
                deltas1[i] += deltas2[j] * weights2[j * size1 + i];
        }

        // Correct layer 1 weights
        for (int i = 0; i < data.Length; i++)
            for (int j = 0; j < size1; j++)
                weights1[j * data.Length + i] += eta * deltas1[j] * values1[j] * (1 - values1[j]) * data[i];

        // Correct layer 2 weights
        for (int i = 0; i < size1; i++)
            for (int j = 0; j < size2; j++)
                weights2[j * size1 + i] += eta * deltas2[j] * values2[j] * (1 - values2[j]) * values1[i];

        // Correct layer 3 (output) weights
        double resultA = Perceptron.ActivationFunction(result);
        for (int i = 0; i < size2; i++)
            weights3[i] += eta * delta * resultA * (1 - resultA) * values2[i];
    }

Am I missing something?


I have figured out the problem.

During training, I was repeatedly showing the first example to the network until the result was close to the expected one, and only then showing the second example:

  A A A A A B B B B B B

The neural network converges and recognizes both examples correctly if I instead show them repeatedly in turn:

 A B A B A B A B A B A B
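The two presentation orders can be sketched as simple training schedules (a minimal Python illustration; the helper names are mine):

```python
def blocked(examples, repeats):
    """A A A ... B B B ...: each example is repeated to convergence
    before moving on, which lets later examples overwrite earlier ones."""
    return [e for e in examples for _ in range(repeats)]

def interleaved(examples, repeats):
    """A B A B ...: the whole set is cycled through repeatedly,
    so every weight update is balanced across examples."""
    return [e for _ in range(repeats) for e in examples]

print(blocked(["A", "B"], 3))      # ['A', 'A', 'A', 'B', 'B', 'B']
print(interleaved(["A", "B"], 3))  # ['A', 'B', 'A', 'B', 'A', 'B']
```

The same principle is why training sets are usually shuffled between epochs: no single example dominates a long run of consecutive weight updates.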


I have a general problem in my application domain. The data has a high-dimensional feature space with a small sample size. A sparse network is available whose nodes are the different features. The network has weighted edges: the larger the edge weight, the higher the correlation or dependence between the pair of features. Generally, how can I employ this network information in my model?

From searching the literature, I found two general approaches:

  1. Embedding: use the network information to obtain an embedding of the features.
  2. Graph neural networks: e.g. GCN (graph convolutional network), GAT (graph attention network), or other message-passing neural networks.

The question is: what general approaches can a data scientist try in order to make use of network information on the features? The network is over the features, not over the samples.


The first thing that comes to mind is to use the network to check feature correlation, and remove highly correlated features before training.
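A minimal sketch of that idea, assuming the feature network is given as a list of weighted edges (the function name, feature names, and the 0.9 threshold are my assumptions):

```python
def prune_correlated(features, edges, threshold=0.9):
    """Greedily drop one feature from every pair whose edge weight
    (correlation) meets the threshold; keep everything else."""
    dropped = set()
    # Visit the strongest dependencies first.
    for a, b, weight in sorted(edges, key=lambda e: -e[2]):
        if weight < threshold:
            break
        if a not in dropped and b not in dropped:
            dropped.add(b)  # arbitrary choice: keep the first of the pair
    return [f for f in features if f not in dropped]

features = ["f1", "f2", "f3", "f4"]
edges = [("f1", "f2", 0.95), ("f2", "f3", 0.40), ("f3", "f4", 0.92)]
print(prune_correlated(features, edges))  # ['f1', 'f3']
```

With few samples and many features, shrinking the feature set this way reduces the degrees of freedom of whatever model is trained afterwards.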


I have the following input data structure:

   X1     |    X2     |    X3     | ... | Output (Label)
118.12341 | 118.12300 | 118.12001 | ... | [a value between 0 & 1] e.g. 0.423645

I'm using TensorFlow to solve this regression problem: predicting the future value of the Output variable. For that I built a feed-forward neural network with three hidden layers using ReLU activation functions and a final output layer with one node and linear activation. The network is trained with backpropagation using the Adam optimizer.

My problem is that after training the network for some thousands of epochs, I realized that the many-decimal values in both the input features and the output resulted in predictions accurate to only about the second decimal place, for example:

Real value = 0.456751 | Predicted value = 0.452364

However, this is not acceptable; I need precision to the fourth decimal place (at least) to accept the value.

Q: Is there any trustworthy technique to solve this problem properly and get better results (maybe a transformation algorithm)?

Thanks in advance.


Assuming you are using a regular MSE loss, it will probably not suit your goal of a low error tolerance per instance. To elaborate, the MSE is defined as the average of the squared differences between the predicted and true outputs.

Assume you have 4 instances and two trained functions, F1 and F2, that produce the following errors per instance:

F1 errors: (4, .0004, .0002, .0002)

F2 errors: (.9, .9, .9, .9)

MSE would clearly prefer F2: its MSE is 0.81, while the MSE for F1 is approximately 4, because the single large error of 4 contributes 16 to the sum on its own.

So, to conclude: the MSE gives too little weight to small differences (errors below 1) while exaggerating bigger differences (errors above 1), because of the square applied to each error.

You could try MAE (mean absolute error) instead. Its only difference is that it does not square the individual errors; it takes their absolute values. There are other regression losses that give relatively more weight to small errors, such as the Huber loss with a small delta (e.g. < 1); you can read more about those losses here.
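To make the comparison concrete, here is the arithmetic on the two error vectors above in plain Python (the Huber delta of 0.5 is my choice for illustration):

```python
def mse(errors):
    """Mean squared error over per-instance errors."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error over per-instance errors."""
    return sum(abs(e) for e in errors) / len(errors)

def huber(errors, delta=0.5):
    """Quadratic below delta, linear above: down-weights large outliers."""
    return sum(
        0.5 * e * e if abs(e) <= delta else delta * (abs(e) - 0.5 * delta)
        for e in errors
    ) / len(errors)

f1 = [4, .0004, .0002, .0002]
f2 = [.9, .9, .9, .9]

print(mse(f1), mse(f2))  # ~4.0 vs 0.81: MSE strongly prefers F2
print(mae(f1), mae(f2))  # ~1.0002 vs 0.9: MAE is nearly indifferent
```

Under MSE, F1 looks roughly five times worse than F2, even though F1 is essentially perfect on three of the four instances; under MAE the gap almost vanishes.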

Another possible solution would be to transform this into a classification problem, where a prediction counts as correct only if it matches the output to the 4th decimal place, and as incorrect otherwise.
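A minimal sketch of that reformulation, assuming the target lives in [0, 1] (the function names and the rounding-based binning are my assumptions):

```python
def to_class(y, decimals=4):
    """Bin a continuous target in [0, 1] into one of 10**decimals classes."""
    return int(round(y * 10 ** decimals))

def prediction_accepted(y_true, y_pred, decimals=4):
    """Accept a prediction only if it falls in the same 4th-decimal bin."""
    return to_class(y_true, decimals) == to_class(y_pred, decimals)

print(prediction_accepted(0.456751, 0.452364))  # False: off in the 3rd decimal
print(prediction_accepted(0.456751, 0.456772))  # True: same bin to 4 decimals
```

Note that rounding-based bins and "agreement to the 4th decimal place" differ slightly at bin boundaries; the sketch only illustrates the idea, and 10,000 classes would need a correspondingly large softmax output layer.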