Hot questions for Using Neural networks in non linear regression


Is there any difference in the architecture of a neural net for regression (time series prediction) and for classification?

I did some regression testing but I get quite bad results.

I'm currently using a basic feed forward net, with one hidden layer with 2 to 4 neurons, tanh activation function and momentum.


It depends on a lot of factors:

  1. For classification, you can have a binary problem (where you want to discriminate between two classes) or a multinomial (multi-class) problem. In both cases, different architectures can be used to model the data well.

  2. For sequence regression, you can also use many different architectures, from a plain feedforward network that receives one series as input and returns a second as output, to a wide range of recurrent architectures.

So the question you asked is similar to asking whether the tools used for building cars differ from the tools used for building bridges: it's too ambiguous, and you need to specify more details.
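
One place where the two tasks usually do differ, even when the hidden layers are identical, is the output layer and the loss. The sketch below is only an illustration under assumptions that are not in the question: a 4-unit tanh hidden layer, random weights, synthetic inputs, and hypothetical helper names (`hidden`, `regression_head`, `binary_classification_head`).

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden(x, W, b):
    # Shared hidden layer with tanh activation, as in the question
    return np.tanh(x @ W + b)

def regression_head(h, V, c):
    # Linear (identity) output: predictions can take any real value
    return h @ V + c

def binary_classification_head(h, V, c):
    # Sigmoid output: predictions are probabilities in (0, 1)
    return 1.0 / (1.0 + np.exp(-(h @ V + c)))

def mse(y_pred, y_true):
    # Mean squared error, a common regression loss
    return np.mean((y_pred - y_true) ** 2)

def binary_cross_entropy(p, y_true, eps=1e-12):
    # Log loss, a common binary classification loss
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

x = rng.normal(size=(8, 3))                    # 8 synthetic samples, 3 features
W, b = rng.normal(size=(3, 4)), np.zeros(4)    # hidden layer parameters
V, c = rng.normal(size=(4, 1)), np.zeros(1)    # output layer parameters

h = hidden(x, W, b)
y_reg = regression_head(h, V, c)               # shape (8, 1), unbounded
p_cls = binary_classification_head(h, V, c)    # shape (8, 1), values in (0, 1)
```

Everything up to `h` is shared; only the last transformation and the loss paired with it change between the two tasks in this sketch.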


I don't understand why my code won't run. I started with the TensorFlow tutorial that classifies the images in the MNIST data set using a single-layer feedforward neural net. I then modified the code to create a multilayer perceptron that maps 37 inputs to 1 output. The input and output training data are loaded from MATLAB data files (.mat).

Here is my code:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from scipy.io import loadmat
%matplotlib inline
import tensorflow as tf
from tensorflow.contrib import learn

import sklearn
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from warnings import filterwarnings
from sklearn import datasets
from sklearn.preprocessing import scale
from sklearn.cross_validation import train_test_split
from sklearn.datasets import make_moons

X = np.array(loadmat("Data/DataIn.mat")['TrainingDataIn'])
Y = np.array(loadmat("Data/DataOut.mat")['TrainingDataOut'])

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.5)
total_len = X_train.shape[0]

# Parameters
learning_rate = 0.001
training_epochs = 500
batch_size = 10
display_step = 1
dropout_rate = 0.9
# Network Parameters
n_hidden_1 = 19 # 1st layer number of features
n_hidden_2 = 26 # 2nd layer number of features
n_input = X_train.shape[1]
n_classes = 1

# tf Graph input
X = tf.placeholder("float32", [None, 37])
Y = tf.placeholder("float32", [None])

def multilayer_perceptron(X, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(X, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)

    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)

    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1], 0, 0.1)),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes], 0, 0.1))
}

biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1], 0, 0.1)),
    'b2': tf.Variable(tf.random_normal([n_hidden_2], 0, 0.1)),
    'out': tf.Variable(tf.random_normal([n_classes], 0, 0.1))
}

# Construct model
pred = multilayer_perceptron(X, weights, biases)
print("Prediction matrix:", pred)
print("Output matrix:", Y)

# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred-Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Launch the graph
with tf.Session() as sess:
    # Initialize all variables before training
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(total_len/batch_size)
        # Loop over all batches
        for i in range(total_batch-1):
            batch_x = X_train[i*batch_size:(i+1)*batch_size]
            batch_y = Y_train[i*batch_size:(i+1)*batch_size]
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c, p = sess.run([optimizer, cost, pred], feed_dict={X: batch_x,
                                                          Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch

        # sample prediction
        label_value = batch_y
        estimate = p
        err = label_value-estimate
        print ("num batch:", total_batch)

        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost))
            print ("[*]----------------------------")
            for i in range(5):
                print ("label value:", label_value[i], \
                    "estimated value:", estimate[i])
            print ("[*]============================")

    print ("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred), tf.argmax(Y))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print ("Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

When I run the code, I get this error message:

ValueError                                Traceback (most recent call last)
<ipython-input-4-6b8af9192775> in <module>()
     93             # Run optimization op (backprop) and cost op (to get loss value)
     94             _, c, p = sess.run([optimizer, cost, pred], feed_dict={X: batch_x,
---> 95                                                           Y: batch_y})
     96             # Compute average loss
     97             avg_cost += c / total_batch

~\AppData\Local\Continuum\Anaconda3\envs\ann\lib\site-packages\tensorflow\python\client\ in run(self, fetches, feed_dict, options, run_metadata)
    787     try:
    788       result = self._run(None, fetches, feed_dict, options_ptr,
--> 789                          run_metadata_ptr)
    790       if run_metadata:
    791         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\AppData\Local\Continuum\Anaconda3\envs\ann\lib\site-packages\tensorflow\python\client\ in _run(self, handle, fetches, feed_dict, options, run_metadata)
    973                 'Cannot feed value of shape %r for Tensor %r, '
    974                 'which has shape %r'
--> 975                 % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
    976           if not self.graph.is_feedable(subfeed_t):
    977             raise ValueError('Tensor %s may not be fed.' % subfeed_t)

ValueError: Cannot feed value of shape (10, 1) for Tensor 'Placeholder_7:0', which has shape '(?,)'


I've encountered this problem before. The difference is that a Tensor of shape (10, 1) looks like [[1], [2], [3]], while a Tensor of shape (10,) looks like [1, 2, 3].

You should be able to fix it by changing the line

Y = tf.placeholder("float32", [None])

to

Y = tf.placeholder("float32", [None, 1])
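
The shape difference, and why matching the placeholder to the network output is the safer fix, can be seen in a small sketch. NumPy is used here as a stand-in for TensorFlow tensors, and the variable names and values are made up for illustration. Flattening the labels to shape (10,) would also satisfy the original placeholder, but subtracting them from a (10, 1) prediction would then broadcast to a (10, 10) matrix instead of giving one error per sample.

```python
import numpy as np

# Network output for a batch of 10 samples, shape (10, 1)
pred = np.arange(10, dtype=np.float32).reshape(10, 1)

# Labels as a column, shape (10, 1), like the batch in the error message
labels_col = np.ones((10, 1), dtype=np.float32)

# The same labels flattened to a vector, shape (10,)
labels_flat = labels_col.reshape(-1)

# Matching shapes: element-wise difference, one error per sample
diff_ok = pred - labels_col            # shape (10, 1)

# Mismatched shapes: broadcasting silently builds a (10, 10) matrix
diff_broadcast = pred - labels_flat    # shape (10, 10)

print(diff_ok.shape, diff_broadcast.shape)
```

With the placeholder declared as `[None, 1]`, the fed labels and the prediction have the same shape, so the squared error is computed per sample as intended.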


My goal is to create a neural network with a single hidden layer (with ReLU activation) that is able to approximate a simple univariate square root function. I implemented the network with numpy and also did a gradient check. Everything seems to be fine except for the result: for some reason I can only obtain linear approximations, like this:

[Figure: noisy sqrt approx]

I tried changing the hyperparameters, without any success. Any ideas?

import numpy as np

step_size = 1e-6
input_size, output_size = 1, 1
h_size = 10
train_size = 500
x_train = np.abs(np.random.randn(train_size, 1) * 1000)
y_train = np.sqrt(x_train) + np.random.randn(train_size, 1) * 0.5

#initialize weights and biases
Wxh = np.random.randn(input_size, h_size) * 0.01
bh = np.zeros((1, h_size))
Why = np.random.randn(h_size, output_size) * 0.01
by = np.zeros((1, output_size))

for i in range(300000):
    #forward pass
    h = np.maximum(0, np.dot(x_train, Wxh) + bh)
    y_est = np.dot(h, Why) + by

    loss = np.sum((y_est - y_train)**2) / train_size
    dy = 2 * (y_est - y_train) / train_size

    print("loss: ",loss)

    #backprop at output
    dWhy = np.dot(h.T, dy)
    dby = np.sum(dy, axis=0, keepdims=True)
    dh = np.dot(dy, Why.T)

    #backprop ReLU non-linearity
    dh[h <= 0] = 0

    #backprop Wxh, and bh
    dWxh = np.dot(x_train.T, dh)
    dbh = np.sum(dh, axis=0, keepdims=True)

    Wxh += -step_size * dWxh
    bh += -step_size * dbh
    Why += -step_size * dWhy
    by += -step_size * dby

Edit: It seems the problem was the lack of normalization and the data not being zero-centered. After applying these transformations to the training data, I managed to obtain the following result:

[Figure: noisy sqrt2]


I can get your code to produce a sort of piecewise linear approximation if I zero-centre and normalise your input and output ranges:

# normalise range and domain
x_train -= x_train.mean()
x_train /= x_train.std()
y_train -= y_train.mean()
y_train /= y_train.std()

The plot is produced like so:

x = np.linspace(x_train.min(),x_train.max(),3000)
y = np.dot(np.maximum(0, np.dot(x[:,None], Wxh) + bh), Why) + by
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()
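
One practical detail: normalising in place, as above, discards the original mean and standard deviation, so predictions stay in normalised units. A minimal sketch of keeping those statistics and mapping values back is given below. The data generation mirrors the question, while the helper names `normalise_input` and `to_original_units` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x_raw = np.abs(rng.normal(size=(500, 1)) * 1000)
y_raw = np.sqrt(x_raw) + rng.normal(size=(500, 1)) * 0.5

# Keep the statistics so the transformation can be undone later
x_mean, x_std = x_raw.mean(), x_raw.std()
y_mean, y_std = y_raw.mean(), y_raw.std()

# Zero-centred, unit-variance copies used for training
x_train = (x_raw - x_mean) / x_std
y_train = (y_raw - y_mean) / y_std

def normalise_input(x_new):
    # Apply the training-set statistics to new inputs
    return (x_new - x_mean) / x_std

def to_original_units(y_normalised):
    # Invert the output normalisation
    return y_normalised * y_std + y_mean
```

A network trained on `x_train` and `y_train` would then be evaluated on `normalise_input(x_new)`, with its output passed through `to_original_units` to compare against the true square root.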


I am trying to run an MLP regressor with one hidden layer on my dataset. I am standardizing my data, but I want to be clear as to whether it matters if I do the standardization before or after splitting the dataset into training and test sets. I want to know if there will be any difference in my prediction values if I carry out standardization before the split.


Yes and no. If the mean and variance of the training and test sets differ, standardizing before versus after the split can lead to different outcomes.

That being said, a good training and test set should be similar enough that the data points are distributed in a similar way, in which case post-split standardization should give nearly the same results.
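
A common way to do post-split standardization is to compute the mean and standard deviation on the training rows only and reuse them for the test rows. The sketch below illustrates this on synthetic data and compares it with pre-split statistics. The data, the split sizes, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 4))   # synthetic data for illustration

# Split first: 150 training rows, 50 test rows
idx = rng.permutation(len(X))
train_idx, test_idx = idx[:150], idx[150:]
X_train, X_test = X[train_idx], X[test_idx]

# Statistics computed on the training rows only
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma    # reuse the training statistics

# For comparison: statistics taken from the full dataset (pre-split standardization)
X_test_presplit = (X_test - X.mean(axis=0)) / X.std(axis=0)

# Largest difference between the two variants on the test rows
print(np.abs(X_test_std - X_test_presplit).max())
```

When the split is random and the dataset is reasonably large, the printed difference tends to be small, which matches the point above; it grows when the two sets are distributed differently.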


I am new to ML, and I have a dataset:


X = {X_1, X_2, X_3, X_4, X_5, X_6, X_7};
Y = Y;

I'm trying to find the possible relationship between X and Y like Y = M(X) using Deep Learning. To my knowledge, this is a regression task since the data type of my target Y is real.

I have tried some regression algorithms like LMS and stepwise regression, but none of them gives me a promising result. So I'm turning to a deep neural network solution:

  • Can ANN do this regression task?
  • How to design the network, especially the type of layers, activation function, etc.?
  • Is there some existing NN architecture I can refer to?

Any help is appreciated.


I don't have a solution for the machine learning part, but I do have a solution that would maybe work (since you asked for any other solutions).

I will say it might be difficult to use machine learning, since not only do you need to find a relationship (assuming there is one), but you need to find the right type of model (is it linear, quadratic, exponential, sinusoidal, etc.) and then you need to find the parameters for those models.

In the R programming language, it is easy to set up a multiple linear regression, for example. Here is a sketch of the R code you would use to find a linear regression.

load("data.Rdata") # loads the saved objects into the workspace; assumes one of them is a data frame named `data` (or read a table instead)
regression = lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, data = data)
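
For readers working in Python rather than R, roughly the same multiple linear regression can be sketched with NumPy's least-squares solver. The data below is synthetic and the coefficients are made up, since the actual dataset is not shown; only the seven-feature layout is taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 100, 7

# Synthetic stand-in for X_1 ... X_7 and Y
X = rng.normal(size=(n_samples, n_features))
true_coef = np.arange(1, n_features + 1, dtype=float)   # made-up coefficients
Y = 2.0 + X @ true_coef + rng.normal(scale=0.1, size=n_samples)

# Add a column of ones so the first fitted coefficient is the intercept
A = np.column_stack([np.ones(n_samples), X])
coef, residuals, rank, _ = np.linalg.lstsq(A, Y, rcond=None)

intercept, slopes = coef[0], coef[1:]

# Coefficient of determination as a quick check of the linear fit
Y_hat = A @ coef
r_squared = 1.0 - np.sum((Y - Y_hat) ** 2) / np.sum((Y - Y.mean()) ** 2)
print(intercept, slopes, r_squared)
```

A low `r_squared` on the real data would suggest that a purely linear model is not enough, which is the situation where a non-linear model such as a neural network becomes worth trying.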
