Hot questions for using neural networks in linear regression

Question:

I have been trying to implement a simple linear regression model using neural networks in Keras, in the hope of understanding how to work with the Keras library. Unfortunately, I am ending up with a very bad model. Here is the implementation:

from pylab import *
from keras.models import Sequential
from keras.layers import Dense

#Generate dummy data
data = linspace(1,2,100).reshape(-1,1)
y = data*5

#Define the model
def baseline_model():
   model = Sequential()
   model.add(Dense(1, activation = 'linear', input_dim = 1))
   model.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics = ['accuracy'])
   return model


#Use the model
regr = baseline_model()
regr.fit(data,y,epochs =200,batch_size = 32)
plot(data, regr.predict(data), 'b', data,y, 'k.')

The generated plot is as follows:

Can somebody point out the flaw in the above definition of the model (which could ensure a better fit)?


Answer:

You should increase the learning rate of the optimizer. The default learning rate of the RMSprop optimizer is 0.001, so the model takes a few hundred epochs to converge to a final solution (you have probably noticed this yourself: the loss value decreases slowly in the training log). To set the learning rate, import the optimizers module:

from keras import optimizers

# ...
model.compile(optimizer=optimizers.RMSprop(lr=0.1), loss='mean_squared_error', metrics=['mae'])

Either 0.01 or 0.1 should work fine. After this modification you may not need to train the model for 200 epochs; even 5, 10 or 20 epochs may be enough.

Also note that you are performing a regression task (i.e. predicting real numbers), whereas 'accuracy' is the metric used when you are performing a classification task (i.e. predicting discrete labels like the category of an image). Therefore, as you can see above, I have replaced it with mae (mean absolute error), which is also much more interpretable than the value of the loss (mean squared error) used here.
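For reference, here is a minimal end-to-end sketch that combines both changes (higher learning rate, mae metric) with the same dummy data as in the question; 20 epochs is just a rough guess at what should be enough:

from numpy import linspace
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers

# Dummy data: y = 5 * x
data = linspace(1, 2, 100).reshape(-1, 1)
y = data * 5

# A single linear unit is exactly a linear regression model
model = Sequential()
model.add(Dense(1, activation='linear', input_dim=1))
model.compile(optimizer=optimizers.RMSprop(lr=0.1),
              loss='mean_squared_error',
              metrics=['mae'])

model.fit(data, y, epochs=20, batch_size=32)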

Question:

I am trying to implement MAE as a performance measure for my DNN regression model. I am using a DNN to predict the number of comments a Facebook post will get. As I understand it, if it is a classification problem we use accuracy, and if it is a regression problem we use either RMSE or MAE. My code is the following:

with tf.name_scope("eval"):
    correct = tf.metrics.mean_absolute_error(labels = y, predictions = logits)
    mae = tf.reduce_mean(tf.cast(correct, tf.int64))
    mae_summary = tf.summary.scalar('mae', accuracy)

For some reason, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-396-313ddf858626> in <module>()
      1 with tf.name_scope("eval"):
----> 2     correct = tf.metrics.mean_absolute_error(labels = y, predictions = logits)
      3     mae = tf.reduce_mean(tf.cast(correct, tf.int64))
      4     mae_summary = tf.summary.scalar('mae', accuracy)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/metrics_impl.py in mean_absolute_error(labels, predictions, weights, metrics_collections, updates_collections, name)
    736   predictions, labels, weights = _remove_squeezable_dimensions(
    737       predictions=predictions, labels=labels, weights=weights)
--> 738   absolute_errors = math_ops.abs(predictions - labels)
    739   return mean(absolute_errors, weights, metrics_collections,
    740               updates_collections, name or 'mean_absolute_error')

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
    883       if not isinstance(y, sparse_tensor.SparseTensor):
    884         try:
--> 885           y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
    886         except TypeError:
    887           # If the RHS is not a tensor, it might be a tensor aware object

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
    834       name=name,
    835       preferred_dtype=preferred_dtype,
--> 836       as_ref=False)
    837 
    838 

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
    924 
    925     if ret is None:
--> 926       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    927 
    928     if ret is NotImplemented:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _TensorTensorConversionFunction(t, dtype, name, as_ref)
    772     raise ValueError(
    773         "Tensor conversion requested dtype %s for Tensor with dtype %s: %r" %
--> 774         (dtype.name, t.dtype.name, str(t)))
    775   return t
    776 

ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int64: 'Tensor("eval_9/remove_squeezable_dimensions/cond_1/Merge:0", dtype=int64)'

Answer:

This line in your code:

correct = tf.metrics.mean_absolute_error(labels = y, predictions = logits)

executes in a way where TensorFlow first subtracts predictions from labels, as seen in the backtrace:

absolute_errors = math_ops.abs(predictions - labels)

In order to do the subtraction, the two tensors need to have the same datatype. Presumably your predictions (logits) are float32, and from the error message your labels are int64. You either have to do an explicit conversion with tf.to_float, or the implicit one you suggest in your comment: defining the placeholder as float32 to start with and trusting TensorFlow to do the conversion when the feed dictionary is processed.
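For the explicit route, a minimal sketch (reusing the y and logits tensors from your graph, and keeping in mind that tf.metrics.* returns a (value, update_op) pair) would be:

with tf.name_scope("eval"):
    labels_f = tf.to_float(y)  # or tf.cast(y, tf.float32)
    mae, mae_update_op = tf.metrics.mean_absolute_error(labels=labels_f,
                                                        predictions=logits)
    mae_summary = tf.summary.scalar('mae', mae)

# tf.metrics.* create local variables, so remember to run tf.local_variables_initializer()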

Question:

Previously I built a network that implemented a binary image segmentation -- foreground & background. I did this by having two classifications. Now instead of a binary classification, I want to do a linear regression of each pixel.

Say there is a 3D surface within the image view, I want to segment the exact middle of that surface with a linear value 10. The edge of the surface will be, let's say, 5. Of course all the voxels in between are within the range 5-10. Then, as the voxels move away from the surface the values quickly go down to zero.

With the binary classification I had an image with 1's in the places of the foreground and an image with 1's in the place of the background -- in other words a classification :) Now I want to have just one ground truth image with values like the following...

Via this linear regression example, I assumed I could simply change the cost function to a least square function -- cost = tf.square(y - pred). And of course I would change the ground truth.

However, when I do this, my predictions output NaN. My last layer is a linear sum of matrix weight values multiplied by the final output. I'm guessing this has something to do with it? I can't make it a tf.nn.softmax() function because that would normalize the values between 0 and 1.

So I believe cost = tf.square(y - pred) is the source of the issue. I tried this next... cost = tf.reduce_sum(tf.square(y - pred)) and that didn't work.

So then I tried this (recommended here) cost = tf.reduce_sum(tf.pow(pred - y, 2))/(2 * batch_size) and that didn't work.

Should I be initializing weights differently? Normalize weights?

Full code looks like this:

import tensorflow as tf
import pdb
import numpy as np
from numpy import genfromtxt
from PIL import Image
from tensorflow.python.ops import rnn, rnn_cell
from tensorflow.contrib.learn.python.learn.datasets.scroll import scroll_data

# Parameters
learning_rate = 0.001
training_iters = 1000000
batch_size = 2
display_step = 1

# Network Parameters
n_input_x = 396 # Input image x-dimension
n_input_y = 396 # Input image y-dimension
n_classes = 1 # Binary classification -- on a surface or not
n_steps = 396
n_hidden = 128
n_output = n_input_y * n_classes

dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input_x, n_input_y])
y = tf.placeholder(tf.float32, [None, n_input_x * n_input_y], name="ground_truth")
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')

def deconv2d(prev_layer, w, b, output_shape, strides):
    # Deconv layer
    deconv = tf.nn.conv2d_transpose(prev_layer, w, output_shape=output_shape, strides=strides, padding="VALID")
    deconv = tf.nn.bias_add(deconv, b)
    deconv = tf.nn.relu(deconv)
    return deconv

# Create model
def net(x, cnn_weights, cnn_biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 396, 396, 1])

    with tf.name_scope("conv1") as scope:
    # Convolution Layer
        conv1 = conv2d(x, cnn_weights['wc1'], cnn_biases['bc1'])
        # Max Pooling (down-sampling)
        #conv1 = tf.nn.local_response_normalization(conv1)
        conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    with tf.name_scope("conv2") as scope:
        conv2 = conv2d(conv1, cnn_weights['wc2'], cnn_biases['bc2'])
        # Max Pooling (down-sampling)
        # conv2 = tf.nn.local_response_normalization(conv2)
        conv2 = maxpool2d(conv2, k=2)

    # Convolution Layer
    with tf.name_scope("conv3") as scope:
        conv3 = conv2d(conv2, cnn_weights['wc3'], cnn_biases['bc3'])
        # Max Pooling (down-sampling)
        # conv3 = tf.nn.local_response_normalization(conv3)
        conv3 = maxpool2d(conv3, k=2)


    temp_batch_size = tf.shape(x)[0] #batch_size shape
    with tf.name_scope("deconv1") as scope:
        output_shape = [temp_batch_size, 99, 99, 64]
        strides = [1,2,2,1]
        # conv4 = deconv2d(conv3, weights['wdc1'], biases['bdc1'], output_shape, strides)
        deconv = tf.nn.conv2d_transpose(conv3, cnn_weights['wdc1'], output_shape=output_shape, strides=strides, padding="SAME")
        deconv = tf.nn.bias_add(deconv, cnn_biases['bdc1'])
        conv4 = tf.nn.relu(deconv)

        # conv4 = tf.nn.local_response_normalization(conv4)

    with tf.name_scope("deconv2") as scope:
        output_shape = [temp_batch_size, 198, 198, 32]
        strides = [1,2,2,1]
        conv5 = deconv2d(conv4, cnn_weights['wdc2'], cnn_biases['bdc2'], output_shape, strides)
        # conv5 = tf.nn.local_response_normalization(conv5)

    with tf.name_scope("deconv3") as scope:
        output_shape = [temp_batch_size, 396, 396, 1]
        #this time don't use ReLu -- since output layer
        conv6 = tf.nn.conv2d_transpose(conv5, cnn_weights['wdc3'], output_shape=output_shape, strides=[1,2,2,1], padding="VALID")
        x = tf.nn.bias_add(conv6, cnn_biases['bdc3'])

    # Include dropout
    #conv6 = tf.nn.dropout(conv6, dropout)

    x = tf.reshape(conv6, [-1, n_input_x, n_input_y])

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)

    x = tf.reshape(x, [-1, n_input_x])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_hidden)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_steps, x)
    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True, activation=tf.nn.relu)
    # lstm_cell = rnn_cell.MultiRNNCell([lstm_cell] * 12, state_is_tuple=True)
    # lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.8)
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)
    # Linear activation, using rnn inner loop last output
    # pdb.set_trace()
    output = []
    for i in xrange(396):
        output.append(tf.matmul(outputs[i], lstm_weights[i]) + lstm_biases[i])

    return output


cnn_weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1' : tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2' : tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc3' : tf.Variable(tf.random_normal([5, 5, 64, 128])),

    'wdc1' : tf.Variable(tf.random_normal([2, 2, 64, 128])),

    'wdc2' : tf.Variable(tf.random_normal([2, 2, 32, 64])),

    'wdc3' : tf.Variable(tf.random_normal([2, 2, 1, 32])),
}

cnn_biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bc3': tf.Variable(tf.random_normal([128])),
    'bdc1': tf.Variable(tf.random_normal([64])),
    'bdc2': tf.Variable(tf.random_normal([32])),
    'bdc3': tf.Variable(tf.random_normal([1])),
}

lstm_weights = {}
lstm_biases = {}

for i in xrange(396):
    lstm_weights[i] = tf.Variable(tf.random_normal([n_hidden, n_output]))
    lstm_biases[i] = tf.Variable(tf.random_normal([n_output]))


# Construct model
# with tf.name_scope("net") as scope:
pred = net(x, cnn_weights, cnn_biases, keep_prob)
# pdb.set_trace()
pred = tf.pack(pred)
pred = tf.transpose(pred, [1,0,2])
pred = tf.reshape(pred, [-1, n_input_x * n_input_y])

with tf.name_scope("opt") as scope:
    # cost = tf.reduce_sum(tf.square(y-pred))
    cost = tf.reduce_sum(tf.pow((pred-y),2)) / (2*batch_size)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
with tf.name_scope("acc") as scope:
    # accuracy is the difference between prediction and ground truth matrices
    correct_pred = tf.equal(0,tf.cast(tf.sub(cost,y), tf.int32))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph) #initialize graph for tensorboard
    step = 1
    # Import data
    data = scroll_data.read_data('/home/kendall/Desktop/')
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = data.train.next_batch(batch_size)
        # Run optimization op (backprop)
        # pdb.set_trace()
        batch_x = batch_x.reshape((batch_size, n_input_x, n_input_y))
        batch_y = batch_y.reshape(batch_size, n_input_x * n_input_y)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        step = step + 1
        if step % display_step == 0:
            batch_y = batch_y.reshape(batch_size, n_input_x * n_input_y)
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y})


            # Make prediction
            im = Image.open('/home/kendall/Desktop/cropped/temp data0001.tif')
            batch_x = np.array(im)
            batch_x = batch_x.reshape((1, n_input_x, n_input_y))
            batch_x = batch_x.astype(float)
            prediction = sess.run(pred, feed_dict={x: batch_x})
            prediction = prediction.reshape((1, n_input_x * n_input_y))
            prediction = tf.nn.softmax(prediction)
            prediction = prediction.eval()
            prediction = prediction.reshape((n_input_x, n_input_y))

            # my_accuracy = accuracy_custom(temp_arr1,batch_y[0,:,:,0])
            #
            # print "Step = " + str(step) + "  |  Accuracy = " + str(my_accuracy)
            print "Step = " + str(step) + "  |  Accuracy = " + str(acc)

            # csv_file = "CNN-LSTM-reg/CNNLSTMreg-step-" + str(step) + "-accuracy-" + str(my_accuracy) + ".csv"
            csv_file = "CNN-LSTM-reg/CNNLSTMreg-step-" + str(step) + "-accuracy-" + str(acc) + ".csv"
            np.savetxt(csv_file, prediction, delimiter=",")

Answer:

As said in the comments, a good weight initialization is key to the success of a model:

  • too high: the model will not learn and may produce NaN values
  • too low: the model will learn very very slowly, because the gradient will be too small (see vanishing gradients)

There are good initializers already provided in TensorFlow (as a contribution, i.e. in the contrib module); feel free to use them.
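As one possible sketch (assuming TF 1.x, where these initializers live in tf.contrib.layers), the convolution weights from the code above could be created with a Xavier/Glorot initializer and zero biases instead of plain tf.random_normal:

import tensorflow as tf

init = tf.contrib.layers.xavier_initializer()

cnn_weights = {
    # 5x5 conv, 1 input channel, 32 output channels
    'wc1': tf.get_variable('wc1', shape=[5, 5, 1, 32], initializer=init),
    'wc2': tf.get_variable('wc2', shape=[5, 5, 32, 64], initializer=init),
    'wc3': tf.get_variable('wc3', shape=[5, 5, 64, 128], initializer=init),
    # (the deconvolution and LSTM weights can be created the same way)
}

cnn_biases = {
    # Biases are usually started at zero
    'bc1': tf.get_variable('bc1', shape=[32], initializer=tf.zeros_initializer()),
    'bc2': tf.get_variable('bc2', shape=[64], initializer=tf.zeros_initializer()),
    'bc3': tf.get_variable('bc3', shape=[128], initializer=tf.zeros_initializer()),
}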

Question:

I'm studying neural networks and I came across the following xorcise (that was a joke!) question.

I asked my friend if I need to implement the perceptron algorithm to solve it and he said "no- just think about it". Well, I thought about it, but my small monkey brain was only able to come up with the following:

My friend's words make me think it's a trick question, and the only trick we've discussed thus far is the inability of a perceptron to compute the XOR function.

Is that what this question is getting at?

How to solve this problem?...

A simple perceptron with two inputs x1, x2,
a BIAS, and transfer function y = f(z) = sgn(z),
separates the two-dimensional input space into
two parts with the help of a line g.

Calculate for this perceptron the weights
w1, w2, wb, so that the line separates
the given 6 patterns (pX1, pX2; Py) into
two classes:

1X = (0,  0;  -1), 
2X = (2, -2; +1), 
3X = (7 + ε, 3 - ε; +1), 
4X = (7  -  ε, 3 + ε; -1), 
5X = (0, -2 - ε; +1), 
6X = (0 - ε, -2; -1), 


Remark: 0 < ε << 1. 

Answer:

If you graph the points, you'll see that all the -1s are on the top left side and all the +1s are on the bottom right. You can draw a line through (0, -2) and (7, 3), which gives you the expression:

y = 5x/7 - 2

which is enough to skip running through any algorithm.

The condition for predicting a +1 occurrence is given by:

y < 5x/7 - 2 

The line above splits the two-dimensional space in two. The shaded area is BELOW the line, and the line goes up and to the right. So for any arbitrary point, you just have to figure out whether it is in the shaded area (positive prediction = +1).

Say, (pX1, pX2) = (35, 100),

1) One way is to plug pX1 back into the formula (x' = pX1) to find the corresponding point on the line (where y = 5x/7 - 2):

y' = 5(35)/7 - 2
y' = 23

Since the point on the line is (35, 23) and the point we are interested in is (35, 100), our point is above the line. In other words, pX2 is NOT < 23, so the prediction is -1.

2) Another way is to solve for x at y' = 100:

100 = 5x/7-2 
x = 142.8

The point on the line is (142.8, 100) and our point is (35, 100); since our point lies to the left of the line point, it again falls outside the shaded area.

3) You can even graph it and visually check whether it is in the shaded area.

The point is that some calculation has to be done to check whether a point is IN or OUT; that is what a linear model like this does. It should be really simple for the machine because you are only calculating one thing, and prediction is quick once you have the formula. The hardest part is determining the formula for the line, which we have already done by graphing the points and reading off an obvious solution. Using a machine learning algorithm would take longer in this case.
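To make that concrete: the line y = 5x/7 - 2 can be rewritten as 5x1 - 7x2 - 14 = 0, so one valid set of weights (up to positive scaling) is w1 = 5, w2 = -7, wb = -14. A quick sketch to verify this against the six patterns, with a small numeric value standing in for the symbolic ε:

import numpy as np

eps = 1e-3                       # stand-in for the symbolic epsilon
w1, w2, wb = 5.0, -7.0, -14.0    # one scaling of 5*x1 - 7*x2 - 14 = 0

patterns = [((0, 0), -1), ((2, -2), +1),
            ((7 + eps, 3 - eps), +1), ((7 - eps, 3 + eps), -1),
            ((0, -2 - eps), +1), ((-eps, -2), -1)]

for (x1, x2), target in patterns:
    y = int(np.sign(w1 * x1 + w2 * x2 + wb))   # transfer function sgn(z)
    print((x1, x2), y, target)                 # y matches target for all six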

Question:

I am using the TensorFlow library to build a pretty simple 2-layer artificial neural network to perform linear regression. My problem is that the results seem to be far from expected. I've been trying to spot my mistake for hours with no luck. I am new to TensorFlow and neural networks, so it could be a trivial mistake. Does anyone have an idea what I am doing wrong?

from __future__ import print_function

import tensorflow as tf
import numpy as np
# Python optimisation variables
learning_rate = 0.02

data_size=100000
data_length=100
train_input=10* np.random.rand(data_size,data_length);
train_label=train_input.sum(axis=1);
train_label=np.reshape(train_label,(data_size,1));

test_input= np.random.rand(data_size,data_length);
test_label=test_input.sum(axis=1);
test_label=np.reshape(test_label,(data_size,1));

x = tf.placeholder(tf.float32, [data_size, data_length])
y = tf.placeholder(tf.float32, [data_size, 1])

W1 = tf.Variable(tf.random_normal([data_length, 1], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([data_size, 1]), name='b1')

y_ = tf.add(tf.matmul(x, W1), b1)


cost = tf.reduce_mean(tf.square(y-y_))                   
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

init_op = tf.global_variables_initializer()

correct_prediction = tf.reduce_mean(tf.square(y-y_))    
accuracy = tf.cast(correct_prediction, tf.float32)


with tf.Session() as sess:
  sess.run(init_op)
  _, c = sess.run([optimiser, cost], 
                     feed_dict={x:train_input , y:train_label})
  k=sess.run(b1)
  print(k)                   
  print(sess.run(accuracy, feed_dict={x: test_input, y: test_label}))

Thanks for your help!


Answer:

There are a number of changes you have to make in your code.

First of all, you have to train for a number of epochs and feed the optimizer the training data in batches. Your learning rate was very high. The bias should be a single value per output unit of a dense (fully connected) layer, not one per training example. You can plot the cost (loss) values to see how your network is converging. In order to feed data in batches, I have also changed the placeholders (using None for the batch dimension). Check the full modified code:

from __future__ import print_function

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Python optimisation variables
learning_rate = 0.001  

data_size=1000  # Had to change these value to fit in my memory
data_length=10
train_input=10* np.random.rand(data_size,data_length);
train_label=train_input.sum(axis=1);
train_label=np.reshape(train_label,(data_size,1));

test_input= np.random.rand(data_size,data_length);
test_label=test_input.sum(axis=1);
test_label=np.reshape(test_label,(data_size,1));

tf.reset_default_graph()
x = tf.placeholder(tf.float32, [None, data_length])
y = tf.placeholder(tf.float32, [None, 1])

W1 = tf.Variable(tf.random_normal([data_length, 1], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([1, 1]), name='b1')

y_ = tf.add(tf.matmul(x, W1), b1)


cost = tf.reduce_mean(tf.square(y-y_))                   
optimiser=tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

init_op = tf.global_variables_initializer()

EPOCHS = 500
BATCH_SIZE = 32
with tf.Session() as sess:
    sess.run(init_op)

    loss_history = []
    for epoch_no in range(EPOCHS):
        for offset in range(0, data_size, BATCH_SIZE):
            batch_x = train_input[offset: offset + BATCH_SIZE]
            batch_y = train_label[offset: offset + BATCH_SIZE]

            _, c = sess.run([optimiser, cost], 
                     feed_dict={x:batch_x , y:batch_y})
            loss_history.append(c)


    plt.plot(range(len(loss_history)), loss_history)
    plt.show()

    # For running test dataset
    results, test_cost = sess.run([y_, cost], feed_dict={x: test_input, y: test_label})
    print('test cost: {:.3f}'.format(test_cost))
    for t1, t2 in zip(results, test_label):
        print('Prediction: {:.3f}, actual: {:.3f}'.format(t1[0], t2[0]))

Question:

import numpy as np

np.random.seed(0)
a = np.random.randint(1,100, size= 1000).reshape(1000,1)
b = np.random.randint(0,2, size=1000).reshape(1000,1)

y = np.where(b==0,a*2, a*3)

X = np.hstack((a,b))
y = y

from sklearn.preprocessing import StandardScaler

sx = StandardScaler()
X = sx.fit_transform(X)

sy = StandardScaler()
y = sy.fit_transform(y)

w0 = np.random.normal(size=(2,1), scale=0.1)

for i in range(100):
    input_layer = X
    output_layer = X.dot(w0) 

    error = y - output_layer
    square_error = np.sqrt(np.mean(error**2))

    print(square_error)

    w0+= input_layer.T.dot(error) 

If I understand correctly, a linear activation function is always f(x) = x.

If you check this code, you'll see the squared error keeps growing, and I have no idea how to solve this simple linear problem with a NN. I am aware there are other models and libraries, but I am trying to do it this way.


Answer:

You did not incorporate a learning rate (see here, and a more formal discussion here) into your model. When you train your network, you need to choose a learning rate parameter as well, and it has a big impact on whether your loss will decrease and how fast it converges.

By setting

w0+= input_layer.T.dot(error)

you chose the learning rate to be 1, which turned out to be too large. If instead you set

w0+= 0.0005*input_layer.T.dot(error) 

(that is, choose learning rate 0.0005) the loss will decrease:

1.0017425183
0.521060951473
0.303777564629
0.21993949808
0.193933601196
0.18700323975
0.185262617455
0.184832603515
0.184726763539
.
.
.

It won't converge to 0 though, because the underlying relationship is not linear in the inputs (the target depends on the product of a and b), so a single linear layer cannot fit it exactly. In the end the weight w0 that you get is

array([[ 0.92486712],
       [ 0.318241  ]])
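For completeness, here is a consolidated sketch of the question's loop with the learning rate added (same data generation and scaling as in the question):

import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(0)
a = np.random.randint(1, 100, size=1000).reshape(1000, 1)
b = np.random.randint(0, 2, size=1000).reshape(1000, 1)
y = np.where(b == 0, a * 2, a * 3)
X = np.hstack((a, b))

X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y)

learning_rate = 0.0005
w0 = np.random.normal(size=(2, 1), scale=0.1)

for i in range(100):
    output_layer = X.dot(w0)              # linear activation: f(x) = x
    error = y - output_layer
    rmse = np.sqrt(np.mean(error ** 2))
    if i % 10 == 0:
        print(rmse)
    w0 += learning_rate * X.T.dot(error)  # gradient step scaled by the learning rate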

Question:

import pandas as pd
import matplotlib.pyplot as plt

csv = 'C:\\Users\\Alex\\Downloads\\weight-height.csv'

df = pd.read_csv(csv)
df.head

x_train = df['Height'].values
#into centimetres because im english
x_train = x_train * 2.54
y_train = df['Weight'].values
#into kilos because im english
y_train = y_train / 2.2046226218

plt.figure()
plt.scatter(x_train, y_train, c=None)
plt.show()
print(X[:10])
print(y[:10])

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
import numpy as np

X = np.array(x_train).reshape(-1,1)
y = np.array(y_train).reshape(-1,1)

X = X[:5000]
y = y[:5000]

model = Sequential()
model.add(Dense(36, activation='relu'))
model.add(Dense(18))
model.add(Dense(1))

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])

history = model.fit(X,y, batch_size=1, epochs=1, validation_split=0.1)

#plt.plot(history.history['acc'])
#plt.plot(history.history['val_acc'])

My problem is pretty much that I'm a noob trying to create my own linear regression model from scratch using Keras, and I can't understand why my loss is so high. I need to know whether it's the optimizer, the loss function I'm using, or a data problem. The dataset is simply a list of weights and heights.


Answer:

I would try:

  1. Normalising your heights and weights so that the values are roughly of order one. Deep learning is generally easier when values are about one. Obviously you need to apply the same scaling to your test data, and then multiply the predictions back up by the same amount at the end.

  2. Changing your metric to 'mse' or 'mae' (mean squared error or mean abs error). This won't change your loss, but will make you feel better as it is a more meaningful measure of how well you are doing.

Try this:

x_train = df['Height'].values
x_train = x_train * 2.54
x_train = x_train / 175.0

y_train = df['Weight'].values
y_train = y_train / 2.2046226218
y_train = y_train / 80.0

...

model.compile(optimizer='adam',
          loss='mean_squared_error',
          metrics=['mse'])

And to test a new value, apply the same scaling (the input here is assumed to be in the same units as the CSV's Height column) and rescale the prediction at the end:

x_test = 187
x_test = np.array(x_test).reshape(-1,1)
x_test = x_test * 2.54
x_test = x_test / 175.0
pred = model.predict(x_test)
pred = pred * 80.0

Question:

I want to use a neural network to optimize an energy function in order to improve a regression against RMSD. My energy function has 16 terms, and I want to optimize the weights of the terms before summing them.

http://www.mathworks.com/help/nnet/gs/fit-data-with-a-neural-network.html has an example and my problem is something similar but I want to implement it in python.

Can someone please give me any pointers to where I can find similar examples/what modules I should use?


Answer:

I would also recommend pybrain. pybrain.tools.neuralnets.NNregression is a tool that will help you with regression. Here is an example to get you started.
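For instance, here is a rough sketch using pybrain's generic building blocks (buildNetwork, SupervisedDataSet, BackpropTrainer); the energy_terms and rmsd_values arrays are hypothetical stand-ins for your 16 per-structure energy terms and the target values:

from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

# 16 energy terms in, 1 target value out
ds = SupervisedDataSet(16, 1)
for terms, target in zip(energy_terms, rmsd_values):  # hypothetical (n, 16) and (n,) arrays
    ds.addSample(terms, (target,))

# One hidden layer of 8 units is purely a starting point
net = buildNetwork(16, 8, 1, bias=True)

trainer = BackpropTrainer(net, ds, learningrate=0.01)
for epoch in range(100):
    print(trainer.train())   # returns the current training error

prediction = net.activate(energy_terms[0])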

Question:

I've got a linear regression model in ML.NET and the predictions are working fine:

MLContext mlContext = new MLContext(seed: 0);
List<TwoInputRegressionModel> inputs = new List<TwoInputRegressionModel>();
foreach (var JahrMitCO in ListWithCO)
{
    float tempyear = JahrMitCO.Year;
    foreach (var JahrMitPopulation in Population)
    {
        if (JahrMitPopulation.Year == tempyear)
        {
            inputs.Add(new TwoInputRegressionModel() { Year = tempyear, Population = JahrMitPopulation.Value, Co2 = JahrMitCO.Value });
        }
    }
}
var model = Train(mlContext, inputs);
TestSinglePrediction(mlContext, model); //works

But I would like to know how to gain access to the parameters (weights + bias) of the trained model. I do know that the ITransformer object (here called model) contains a Model property, but trying to convert it to the LinearRegressionModelParameters class as stated in the documentation doesn't work:

 LinearRegressionModelParameters originalModelParameters = ((ISingleFeaturePredictionTransformer<object>)model).Model as LinearRegressionModelParameters; //Exception:System.InvalidCastException

The object of type Microsoft.ML.Data.TransformerChain`1[Microsoft.ML.Data.RegressionPredictionTransformer`1[Microsoft.ML.Trainers.FastTree.FastTreeRegressionModelParameters]] cannot be converted to Microsoft.ML.ISingleFeaturePredictionTransformer`1[System.Object]

How to access the model parameters?


Answer:

The problem in this case is that your model object isn't an ISingleFeaturePredictionTransformer, but instead a TransformerChain object (i.e. a chain of transformers), whose LastTransformer is the "prediction transformer".

So to fix this, first cast your model to TransformerChain<RegressionPredictionTransformer<FastTreeRegressionModelParameters>>, then get LastTransformer, which returns the RegressionPredictionTransformer<FastTreeRegressionModelParameters>; from there you can get the Model property.

If you don't happen to know at compile time which concrete type of transformer the TransformerChain will contain, you can cast model to IEnumerable<ITransformer> and get the .Last() transformer in the chain. That you can cast to ISingleFeaturePredictionTransformer<object> in order to get the Model property.

    ITransformer model = ...;
    IEnumerable<ITransformer> chain = model as IEnumerable<ITransformer>;

    ISingleFeaturePredictionTransformer<object> predictionTransformer =
        chain.Last() as ISingleFeaturePredictionTransformer<object>;

    object modelParameters = predictionTransformer.Model;

From there you can cast modelParameters to whatever specific ModelParameters class it happens to be.

Note: from your exception message, it doesn't appear you are training a linear regression model, but instead a fast tree model. Tree-based models can't be cast to LinearRegressionModelParameters, so you won't see bias and weights; instead you will see tree information.

Question:

Being new to the deep learning world, and after reading a lot of theory, I was trying to understand how a neural net learns in practice. So I gave it a simple dataset where the input columns are [[x, x+y]...] and the output column is [[x/(x+y)]...] and tried it using tflearn, but even after countless tries (2 days) the network is not able to minimize the loss. Even when it does minimize the loss (after adding a tanh layer), the predictions are way off. Can someone help me with this? Below is the code.

import tflearn

neural_net = tflearn.input_data(shape=[None, 2])
neural_net = tflearn.fully_connected(neural_net, 1,
                                     activation='linear', bias=True)
neural_net = tflearn.regression(neural_net, optimizer='sgd',
                                learning_rate=0.001, loss='mean_square')

# train
model = tflearn.DNN(neural_net, tensorboard_verbose=3)
model.fit(OR, Y_truth, n_epoch=1000,
          snapshot_epoch=False, validation_set=0.1, batch_size=10)
print model.get_weights(neural_net.W)

# prediction
print model.predict([[2, 2]])

The output prediction is 0.13946463!!


Answer:

Your network won't be able to learn that function.

The network has a single neuron, so it is equivalent to a*x + b*y + c, where you try to learn a, b and c.

You need non-linearity in your model: the target x/(x+y) is not a linear function of the inputs, so you need to add more layers and neurons with non-linear activations for the model to be able to learn your desired function.
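As a sketch (the hidden-layer size, learning rate and epoch count are just starting points, and OR, Y_truth are the arrays from the question), adding a hidden layer with a non-linear activation in tflearn could look like this:

import tflearn

neural_net = tflearn.input_data(shape=[None, 2])
# Hidden layer with a non-linear activation so the network can approximate x/(x+y)
neural_net = tflearn.fully_connected(neural_net, 32, activation='tanh')
neural_net = tflearn.fully_connected(neural_net, 1, activation='linear')
neural_net = tflearn.regression(neural_net, optimizer='sgd',
                                learning_rate=0.01, loss='mean_square')

model = tflearn.DNN(neural_net)
model.fit(OR, Y_truth, n_epoch=1000, batch_size=10,
          validation_set=0.1, snapshot_epoch=False)

print(model.predict([[2, 2]]))   # should now move toward 0.5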