## Hot questions for Using Neural networks in linear regression

Question:

I have been trying to implement a simple linear regression model using a neural network in Keras, in the hope of understanding how to work with the Keras library. Unfortunately, I am ending up with a very bad model. Here is the implementation:

```python
from pylab import *
from keras.models import Sequential
from keras.layers import Dense

# Generate dummy data
data = linspace(1, 2, 100).reshape(-1, 1)
y = data * 5

# Define the model
def baseline_model():
    model = Sequential()
    model.add(Dense(1, activation='linear', input_dim=1))
    model.compile(optimizer='rmsprop', loss='mean_squared_error', metrics=['accuracy'])
    return model

# Use the model
regr = baseline_model()
regr.fit(data, y, epochs=200, batch_size=32)
plot(data, regr.predict(data), 'b', data, y, 'k.')
```

The generated plot is as follows:

Can somebody point out the flaw in the above definition of the model (which could ensure a better fit)?

Answer:

You should increase the learning rate of the optimizer. The default learning rate of `RMSprop` is `0.001`, so the model takes a few hundred epochs to converge to a final solution (you have probably noticed this yourself: the loss value decreases slowly in the training log). To set the learning rate, import the `optimizers` module:

```python
from keras import optimizers

# ...
model.compile(optimizer=optimizers.RMSprop(lr=0.1),
              loss='mean_squared_error',
              metrics=['mae'])
```

Either `0.01` or `0.1` should work fine. After this modification you may not need to train the model for 200 epochs; even 5, 10 or 20 epochs may be enough.

Also note that you are performing a regression task (i.e. predicting real numbers), whereas `'accuracy'` is a metric for classification tasks (i.e. predicting discrete labels such as the category of an image). Therefore, as you can see above, I have replaced it with `mae` (mean absolute error), which is also much more interpretable than the value of the loss (mean squared error) used here.
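To see the learning-rate effect in isolation, here is a bare NumPy sketch (my own minimal single-weight setup, not the asker's Keras model) doing plain gradient descent on the MSE for the same y = 5x data:

```python
import numpy as np

x = np.linspace(1, 2, 100)
y = 5 * x

def train(lr, steps=200):
    """Gradient descent on MSE for the one-weight model w*x."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * np.mean((w * x - y) * x)  # d(MSE)/dw
        w -= lr * grad
    return w

print(train(0.001))  # small learning rate: still far from 5 after 200 steps
print(train(0.1))    # larger learning rate: converges to ~5 quickly
```

With `lr=0.001` each step shrinks the error by less than half a percent, so 200 steps are nowhere near enough; with `lr=0.1` the error roughly halves every step.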

Question:

I am trying to implement MAE as a performance measure for my DNN regression model. I am using a DNN to predict the number of comments a Facebook post will get. As I understand it, if it is a classification problem we use accuracy, and if it is a regression problem we use either RMSE or MAE. My code is the following:

```python
with tf.name_scope("eval"):
    correct = tf.metrics.mean_absolute_error(labels=y, predictions=logits)
    mae = tf.reduce_mean(tf.cast(correct, tf.int64))
    mae_summary = tf.summary.scalar('mae', accuracy)
```

For some reason, I get the following error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-396-313ddf858626> in <module>()
      1 with tf.name_scope("eval"):
----> 2     correct = tf.metrics.mean_absolute_error(labels = y, predictions = logits)
      3     mae = tf.reduce_mean(tf.cast(correct, tf.int64))
      4     mae_summary = tf.summary.scalar('mae', accuracy)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/metrics_impl.py in mean_absolute_error(labels, predictions, weights, metrics_collections, updates_collections, name)
    736   predictions, labels, weights = _remove_squeezable_dimensions(
    737       predictions=predictions, labels=labels, weights=weights)
--> 738   absolute_errors = math_ops.abs(predictions - labels)
    739   return mean(absolute_errors, weights, metrics_collections,
    740               updates_collections, name or 'mean_absolute_error')

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
    883     if not isinstance(y, sparse_tensor.SparseTensor):
    884       try:
--> 885         y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
    886       except TypeError:
    887         # If the RHS is not a tensor, it might be a tensor aware object

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
    834         name=name,
    835         preferred_dtype=preferred_dtype,
--> 836         as_ref=False)
    837
    838

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
    924
    925     if ret is None:
--> 926       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    927
    928     if ret is NotImplemented:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _TensorTensorConversionFunction(t, dtype, name, as_ref)
    772     raise ValueError(
    773         "Tensor conversion requested dtype %s for Tensor with dtype %s: %r" %
--> 774         (dtype.name, t.dtype.name, str(t)))
    775     return t
    776

ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int64: 'Tensor("eval_9/remove_squeezable_dimensions/cond_1/Merge:0", dtype=int64)'
```

Answer:

This line in your code:

```python
correct = tf.metrics.mean_absolute_error(labels=y, predictions=logits)
```

fails because TensorFlow first subtracts the labels from the predictions, as seen in the traceback:

```python
absolute_errors = math_ops.abs(predictions - labels)
```

To perform the subtraction, the two tensors need to have the same dtype. Presumably your predictions (`logits`) are `float32`, and from the error message your labels are `int64`. You either have to do an explicit conversion with `tf.to_float`, or an implicit one as you suggest in your comment: define the placeholder as `float32` to start with, and trust TensorFlow to do the conversion when the feed dictionary is processed.
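A NumPy analogue of the explicit fix (NumPy happily promotes mixed dtypes on its own, but the cast below mirrors what `tf.to_float` would do for the tensors; the sample values are made up):

```python
import numpy as np

labels = np.array([3, 0, 8], dtype=np.int64)          # like the int64 labels tensor
logits = np.array([2.5, 1.0, 7.5], dtype=np.float32)  # like the float32 predictions

# Explicit conversion, the equivalent of tf.to_float(labels)
labels_f = labels.astype(np.float32)
mae = np.mean(np.abs(logits - labels_f))
print(mae)  # 0.6666667
```

After the cast, both operands are `float32` and the subtraction is well defined.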

Question:

Previously I built a network that implemented a binary image segmentation -- foreground & background. I did this by having two classifications. Now instead of a binary classification, I want to do a linear regression of each pixel.

Say there is a 3D surface within the image view, I want to segment the exact middle of that surface with a linear value 10. The edge of the surface will be, let's say, 5. Of course all the voxels in between are within the range 5-10. Then, as the voxels move away from the surface the values quickly go down to zero.

With the binary classification I had an image with 1's in the places of the foreground and an image with 1's in the place of the background -- in other words a classification :) Now I want to have just one ground truth image with values like the following...

Via this linear regression example, I assumed I could simply change the cost function to a least-squares function, `cost = tf.square(y - pred)`. And of course I would change the ground truth.

However, when I do this, my predictions output `NaN`. My last layer is a linear sum of matrix weight values multiplied by the final output. I'm guessing this has something to do with it? I can't make it a `tf.nn.softmax()` function because that would normalize the values between 0 and 1.

So I believe `cost = tf.square(y - pred)` is the source of the issue. I next tried `cost = tf.reduce_sum(tf.square(y - pred))`, and that didn't work. Then I tried `cost = tf.reduce_sum(tf.pow(pred - y, 2))/(2 * batch_size)` (recommended here), and that didn't work either.

Should I be initializing weights differently? Normalize weights?

Full code looks like this:

```python
import tensorflow as tf
import pdb
import numpy as np
from numpy import genfromtxt
from PIL import Image
from tensorflow.python.ops import rnn, rnn_cell
from tensorflow.contrib.learn.python.learn.datasets.scroll import scroll_data

# Parameters
learning_rate = 0.001
training_iters = 1000000
batch_size = 2
display_step = 1

# Network Parameters
n_input_x = 396  # Input image x-dimension
n_input_y = 396  # Input image y-dimension
n_classes = 1    # Binary classification -- on a surface or not
n_steps = 396
n_hidden = 128
n_output = n_input_y * n_classes
dropout = 0.75   # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input_x, n_input_y])
y = tf.placeholder(tf.float32, [None, n_input_x * n_input_y], name="ground_truth")
keep_prob = tf.placeholder(tf.float32)  # dropout (keep probability)

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')

def deconv2d(prev_layer, w, b, output_shape, strides):
    # Deconv layer
    deconv = tf.nn.conv2d_transpose(prev_layer, w, output_shape=output_shape, strides=strides, padding="VALID")
    deconv = tf.nn.bias_add(deconv, b)
    deconv = tf.nn.relu(deconv)
    return deconv

# Create model
def net(x, cnn_weights, cnn_biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 396, 396, 1])
    with tf.name_scope("conv1") as scope:
        # Convolution Layer
        conv1 = conv2d(x, cnn_weights['wc1'], cnn_biases['bc1'])
        # Max Pooling (down-sampling)
        # conv1 = tf.nn.local_response_normalization(conv1)
        conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    with tf.name_scope("conv2") as scope:
        conv2 = conv2d(conv1, cnn_weights['wc2'], cnn_biases['bc2'])
        # Max Pooling (down-sampling)
        # conv2 = tf.nn.local_response_normalization(conv2)
        conv2 = maxpool2d(conv2, k=2)

    # Convolution Layer
    with tf.name_scope("conv3") as scope:
        conv3 = conv2d(conv2, cnn_weights['wc3'], cnn_biases['bc3'])
        # Max Pooling (down-sampling)
        # conv3 = tf.nn.local_response_normalization(conv3)
        conv3 = maxpool2d(conv3, k=2)

    temp_batch_size = tf.shape(x)[0]  # batch_size shape
    with tf.name_scope("deconv1") as scope:
        output_shape = [temp_batch_size, 99, 99, 64]
        strides = [1, 2, 2, 1]
        # conv4 = deconv2d(conv3, weights['wdc1'], biases['bdc1'], output_shape, strides)
        deconv = tf.nn.conv2d_transpose(conv3, cnn_weights['wdc1'], output_shape=output_shape, strides=strides, padding="SAME")
        deconv = tf.nn.bias_add(deconv, cnn_biases['bdc1'])
        conv4 = tf.nn.relu(deconv)
        # conv4 = tf.nn.local_response_normalization(conv4)

    with tf.name_scope("deconv2") as scope:
        output_shape = [temp_batch_size, 198, 198, 32]
        strides = [1, 2, 2, 1]
        conv5 = deconv2d(conv4, cnn_weights['wdc2'], cnn_biases['bdc2'], output_shape, strides)
        # conv5 = tf.nn.local_response_normalization(conv5)

    with tf.name_scope("deconv3") as scope:
        output_shape = [temp_batch_size, 396, 396, 1]
        # this time don't use ReLu -- since output layer
        conv6 = tf.nn.conv2d_transpose(conv5, cnn_weights['wdc3'], output_shape=output_shape, strides=[1, 2, 2, 1], padding="VALID")
        x = tf.nn.bias_add(conv6, cnn_biases['bdc3'])
        # Include dropout
        # conv6 = tf.nn.dropout(conv6, dropout)

    x = tf.reshape(conv6, [-1, n_input_x, n_input_y])

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input_x])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_hidden)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0, state_is_tuple=True, activation=tf.nn.relu)
    # lstm_cell = rnn_cell.MultiRNNCell([lstm_cell] * 12, state_is_tuple=True)
    # lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.8)
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    # pdb.set_trace()
    output = []
    for i in xrange(396):
        output.append(tf.matmul(outputs[i], lstm_weights[i]) + lstm_biases[i])
    return output

cnn_weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # 5x5 conv, 64 inputs, 128 outputs
    'wc3': tf.Variable(tf.random_normal([5, 5, 64, 128])),
    'wdc1': tf.Variable(tf.random_normal([2, 2, 64, 128])),
    'wdc2': tf.Variable(tf.random_normal([2, 2, 32, 64])),
    'wdc3': tf.Variable(tf.random_normal([2, 2, 1, 32])),
}

cnn_biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bc3': tf.Variable(tf.random_normal([128])),
    'bdc1': tf.Variable(tf.random_normal([64])),
    'bdc2': tf.Variable(tf.random_normal([32])),
    'bdc3': tf.Variable(tf.random_normal([1])),
}

lstm_weights = {}
lstm_biases = {}
for i in xrange(396):
    lstm_weights[i] = tf.Variable(tf.random_normal([n_hidden, n_output]))
    lstm_biases[i] = tf.Variable(tf.random_normal([n_output]))

# Construct model
# with tf.name_scope("net") as scope:
pred = net(x, cnn_weights, cnn_biases, keep_prob)
# pdb.set_trace()
pred = tf.pack(pred)
pred = tf.transpose(pred, [1, 0, 2])
pred = tf.reshape(pred, [-1, n_input_x * n_input_y])

with tf.name_scope("opt") as scope:
    # cost = tf.reduce_sum(tf.square(y-pred))
    cost = tf.reduce_sum(tf.pow((pred - y), 2)) / (2 * batch_size)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
with tf.name_scope("acc") as scope:
    # accuracy is the difference between prediction and ground truth matrices
    correct_pred = tf.equal(0, tf.cast(tf.sub(cost, y), tf.int32))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph)  # initialize graph for tensorboard
    step = 1
    # Import data
    data = scroll_data.read_data('/home/kendall/Desktop/')
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = data.train.next_batch(batch_size)
        # Run optimization op (backprop)
        # pdb.set_trace()
        batch_x = batch_x.reshape((batch_size, n_input_x, n_input_y))
        batch_y = batch_y.reshape(batch_size, n_input_x * n_input_y)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        step = step + 1
        if step % display_step == 0:
            batch_y = batch_y.reshape(batch_size, n_input_x * n_input_y)
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
            # Make prediction
            im = Image.open('/home/kendall/Desktop/cropped/temp data0001.tif')
            batch_x = np.array(im)
            batch_x = batch_x.reshape((1, n_input_x, n_input_y))
            batch_x = batch_x.astype(float)
            prediction = sess.run(pred, feed_dict={x: batch_x})
            prediction = prediction.reshape((1, n_input_x * n_input_y))
            prediction = tf.nn.softmax(prediction)
            prediction = prediction.eval()
            prediction = prediction.reshape((n_input_x, n_input_y))
            # my_accuracy = accuracy_custom(temp_arr1, batch_y[0,:,:,0])
            # print "Step = " + str(step) + " | Accuracy = " + str(my_accuracy)
            print "Step = " + str(step) + " | Accuracy = " + str(acc)
            # csv_file = "CNN-LSTM-reg/CNNLSTMreg-step-" + str(step) + "-accuracy-" + str(my_accuracy) + ".csv"
            csv_file = "CNN-LSTM-reg/CNNLSTMreg-step-" + str(step) + "-accuracy-" + str(acc) + ".csv"
            np.savetxt(csv_file, prediction, delimiter=",")
```

Answer:

As said in the comments, a good weight initialization is key to the success of a model:

- too high: the model will not learn and may produce NaN values
- too low: the model will learn very very slowly, because the gradient will be too small (see vanishing gradients)

There are good initializations already provided in TensorFlow here (as a contribution), feel free to use them.
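A small NumPy illustration (with 100-unit layers of my own choosing) of why the initialization scale matters: with standard-normal weights the activation magnitudes blow up layer by layer, while Xavier-style scaling (std = 1/sqrt(fan_in)) keeps them stable:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 100

def activation_scale(weight_std, layers=10):
    # Push one random input through `layers` linear layers and report
    # the mean magnitude of the result.
    h = rng.normal(size=(1, fan_in))
    for _ in range(layers):
        W = rng.normal(scale=weight_std, size=(fan_in, fan_in))
        h = h @ W
    return float(np.abs(h).mean())

big = activation_scale(1.0)                       # std 1: magnitudes explode
xavier = activation_scale(1.0 / np.sqrt(fan_in))  # scaled init: stays around 1
print(big, xavier)
```

Each unscaled layer multiplies the magnitude by roughly sqrt(fan_in) = 10, so after ten layers the activations are about ten orders of magnitude larger; downstream gradients and losses then overflow to `NaN`, which is exactly the failure mode described above.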

Question:

I'm studying about neural networks and I came across the following **xor**cise (*that was a joke!*) question.

I asked my friend if I need to implement the perceptron algorithm to solve it and he said "no- just think about it". Well, I thought about it, but my small monkey brain was only able to come up with the following:

My friends words make me think it's a trick question, and the only trick we've discussed thus far is the inability of perceptron to do an XOR function.

Is that what this question is getting at?

How to solve this problem?...

A simple perceptron with two inputs x₁, x₂, a BIAS, and transfer function y = f(z) = sgn(z) separates the two-dimensional input space into two parts with the help of a line g. Calculate for this perceptron the weights w1, w2, wb, so that the line separates the given 6 patterns (pX1, pX2; Py) into two classes: 1X = (0, 0; -1), 2X = (2, -2; +1), 3X = (7 + ε, 3 - ε; +1), 4X = (7 - ε, 3 + ε; -1), 5X = (0, -2 - ε; +1), 6X = (0 - ε, -2; -1). Remark: 0 < ε << 1.

Answer:

If you graph the points, you'll see that all the -1s are on the top left side, and all the +1s are on the bottom right. You can draw a line intersecting (0, -2) and (7,3) which gives you the expression:

y = 5x/7 - 2

which is enough to skip running through any algorithm.

The region predicting +1 occurrences is given by the inequality:

y < 5x/7 - 2

The line above splits the 2 dimensional space in two. The shaded area is BELOW the line, and the line goes up and to the right. So for any arbitrary point, you just have to figure out if it's in the shaded area (positive prediction = +1).

Say (pX1, pX2) = (35, 100).

1) One way is to plug pX1 back into the formula (x' = pX1) to find the corresponding point on the line (where y = 5x/7 - 2):

y' = 5(35)/7 - 2 = 23

Since the point on the line is (35, 23) and the point we are interested in is (35, 100), it is above the line. In other words, pX2 is NOT < 23, so the prediction returns -1.

2) Alternatively, set y' = 100, so

100 = 5x/7 - 2, hence x = 142.8

The line point is (142.8, 100) and your point is (35, 100); our point is to the left of the line point, so it still falls outside the shaded area.

3) You can even graph it and visually check if it's in the shaded area

The point is that some calculation has to be done to check whether a point is IN or OUT. That's the point of a linear classifier. It should be really simple for the machine, because you're just calculating one thing, and predicting once you have your formula is quick. The hardest part is determining the formula for the line, which we've already done by graphing the points and spotting an obvious solution. If you used machine learning here, it would take longer.
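The line y = 5x/7 - 2 translates directly into perceptron weights proportional to (w1, w2, wb) = (5, -7, -14), since sgn(5x₁ - 7x₂ - 14) flips exactly across that line. A quick check against all six patterns, taking ε = 0.01 as a concrete stand-in for "0 < ε << 1":

```python
import numpy as np

eps = 0.01  # concrete stand-in for the problem's 0 < eps << 1
patterns = [
    ((0, 0), -1),
    ((2, -2), +1),
    ((7 + eps, 3 - eps), +1),
    ((7 - eps, 3 + eps), -1),
    ((0, -2 - eps), +1),
    ((0 - eps, -2), -1),
]

# Weights read off the separating line y = 5x/7 - 2,
# rewritten as sgn(5*x1 - 7*x2 - 14)
w1, w2, wb = 5.0, -7.0, -14.0

predictions = [int(np.sign(w1 * x1 + w2 * x2 + wb)) for (x1, x2), _ in patterns]
print(predictions)  # [-1, 1, 1, -1, 1, -1], matching all six labels
```

Any positive multiple of these weights works equally well, since only the sign of z matters.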

Question:

I am using the TensorFlow library to build a pretty simple 2-layer artificial neural network to perform linear regression. My problem is that the results seem far from expected. I've been trying to spot my mistake for hours, with no luck. I am new to TensorFlow and neural networks, so it could be a trivial mistake. Does anyone have an idea what I am doing wrong?

```python
from __future__ import print_function
import tensorflow as tf
import numpy as np

# Python optimisation variables
learning_rate = 0.02
data_size = 100000
data_length = 100

train_input = 10 * np.random.rand(data_size, data_length)
train_label = train_input.sum(axis=1)
train_label = np.reshape(train_label, (data_size, 1))

test_input = np.random.rand(data_size, data_length)
test_label = test_input.sum(axis=1)
test_label = np.reshape(test_label, (data_size, 1))

x = tf.placeholder(tf.float32, [data_size, data_length])
y = tf.placeholder(tf.float32, [data_size, 1])

W1 = tf.Variable(tf.random_normal([data_length, 1], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([data_size, 1]), name='b1')

y_ = tf.add(tf.matmul(x, W1), b1)
cost = tf.reduce_mean(tf.square(y - y_))
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
init_op = tf.global_variables_initializer()

correct_prediction = tf.reduce_mean(tf.square(y - y_))
accuracy = tf.cast(correct_prediction, tf.float32)

with tf.Session() as sess:
    sess.run(init_op)
    _, c = sess.run([optimiser, cost], feed_dict={x: train_input, y: train_label})
    k = sess.run(b1)
    print(k)
    print(sess.run(accuracy, feed_dict={x: test_input, y: test_label}))
```

Thanks for your help!

Answer:

There are a number of changes you have to make in your code.

First of all, you have to train for a number of epochs, and feed the optimizer the training data in batches. Your learning rate was very high. The bias should be a single value per dense (fully connected) layer output, not one per training sample. You can plot the cost (loss) values to see how your network is converging. In order to feed data in batches, I have also changed the placeholders. Check the full modified code:

```python
from __future__ import print_function
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Python optimisation variables
learning_rate = 0.001
data_size = 1000  # Had to change these values to fit in my memory
data_length = 10

train_input = 10 * np.random.rand(data_size, data_length)
train_label = train_input.sum(axis=1)
train_label = np.reshape(train_label, (data_size, 1))

test_input = np.random.rand(data_size, data_length)
test_label = test_input.sum(axis=1)
test_label = np.reshape(test_label, (data_size, 1))

tf.reset_default_graph()
x = tf.placeholder(tf.float32, [None, data_length])
y = tf.placeholder(tf.float32, [None, 1])

W1 = tf.Variable(tf.random_normal([data_length, 1], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([1, 1]), name='b1')

y_ = tf.add(tf.matmul(x, W1), b1)
cost = tf.reduce_mean(tf.square(y - y_))
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
init_op = tf.global_variables_initializer()

EPOCHS = 500
BATCH_SIZE = 32

with tf.Session() as sess:
    sess.run(init_op)
    loss_history = []
    for epoch_no in range(EPOCHS):
        for offset in range(0, data_size, BATCH_SIZE):
            batch_x = train_input[offset: offset + BATCH_SIZE]
            batch_y = train_label[offset: offset + BATCH_SIZE]
            _, c = sess.run([optimiser, cost], feed_dict={x: batch_x, y: batch_y})
            loss_history.append(c)

    plt.plot(range(len(loss_history)), loss_history)
    plt.show()

    # For running the test dataset
    results, test_cost = sess.run([y_, cost], feed_dict={x: test_input, y: test_label})
    print('test cost: {:.3f}'.format(test_cost))
    for t1, t2 in zip(results, test_label):
        print('Prediction: {:.3f}, actual: {:.3f}'.format(t1[0], t2[0]))
```
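Since the target is just the sum of the inputs, the optimum this network is converging toward is known in advance: all weights ~1 and bias ~0. A closed-form NumPy check (same shapes as the reduced setup above, random data of my own):

```python
import numpy as np

rng = np.random.default_rng(0)
X = 10 * rng.random((1000, 10))
y = X.sum(axis=1, keepdims=True)  # the target is exactly the sum of the inputs

# Closed-form least squares with a bias column: the optimum the
# batched SGD training is heading toward.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

weights, bias = w[:-1, 0], w[-1, 0]
print(weights, bias)  # weights all ~1.0, bias ~0.0
```

Comparing the trained `W1` and `b1` against this known solution is a quick sanity check that the training loop is working.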

Question:

```python
import numpy as np

np.random.seed(0)
a = np.random.randint(1, 100, size=1000).reshape(1000, 1)
b = np.random.randint(0, 2, size=1000).reshape(1000, 1)
y = np.where(b == 0, a * 2, a * 3)
X = np.hstack((a, b))

from sklearn.preprocessing import StandardScaler
sx = StandardScaler()
X = sx.fit_transform(X)
sy = StandardScaler()
y = sy.fit_transform(y)

w0 = np.random.normal(size=(2, 1), scale=0.1)

for i in range(100):
    input_layer = X
    output_layer = X.dot(w0)
    error = y - output_layer
    square_error = np.sqrt(np.mean(error**2))
    print(square_error)
    w0 += input_layer.T.dot(error)
```

If I understand correctly, a linear activation function is always f(x) = x.

If you run this code, you'll see the squared error growing and growing, and I have no idea how to solve this simple linear problem with a NN. I am aware there are other models and libraries, but I am trying to do it this way.

Answer:

You did not incorporate a learning rate (see here, and a more formal discussion here) into your model. When you train a network, you need to choose a learning rate parameter as well, and it has a big impact on whether your loss decreases and how fast it converges.

By setting

```python
w0 += input_layer.T.dot(error)
```

you chose the learning rate to be 1, which turned out to be too large. If instead you set

```python
w0 += 0.0005 * input_layer.T.dot(error)
```

(that is, a learning rate of 0.0005), the loss will decrease:

```
1.0017425183
0.521060951473
0.303777564629
0.21993949808
0.193933601196
0.18700323975
0.185262617455
0.184832603515
0.184726763539
...
```

It won't converge to 0, though, as the relationship you are fitting is not linear in the features. In the end, the weight `w0` that you get is

```
array([[ 0.92486712],
       [ 0.318241  ]])
```
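For completeness, here is a runnable version of the question's loop with the learning rate folded in (same seed and data; the `StandardScaler` calls are replaced by the equivalent manual standardisation so only NumPy is needed):

```python
import numpy as np

np.random.seed(0)  # same seed and data as the question
a = np.random.randint(1, 100, size=1000).reshape(1000, 1)
b = np.random.randint(0, 2, size=1000).reshape(1000, 1)
y = np.where(b == 0, a * 2, a * 3).astype(float)
X = np.hstack((a, b)).astype(float)

# Standardise by hand (same effect as the StandardScaler calls)
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (y - y.mean()) / y.std()

w0 = np.random.normal(size=(2, 1), scale=0.1)
lr = 0.0005  # the missing learning rate

errors = []
for i in range(100):
    error = y - X.dot(w0)
    errors.append(float(np.sqrt(np.mean(error ** 2))))
    w0 += lr * X.T.dot(error)

print(errors[0], errors[-1])  # RMSE now decreases instead of blowing up
```

The first value matches the ~1.0017 shown above and the loop settles near 0.1847, the irreducible error of a linear model on this data.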

Question:

```python
import pandas as pd
import matplotlib.pyplot as plt

csv = 'C:\\Users\\Alex\\Downloads\\weight-height.csv'
df = pd.read_csv(csv)
df.head()

x_train = df['Height'].values  # into centimetres because im english
x_train = x_train * 2.54
y_train = df['Weight'].values  # into kilos because im english
y_train = y_train / 2.2046226218

plt.figure()
plt.scatter(x_train, y_train, c=None)
plt.show()

print(x_train[:10])
print(y_train[:10])

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
import numpy as np

X = np.array(x_train).reshape(-1, 1)
y = np.array(y_train).reshape(-1, 1)
X = X[:5000]
y = y[:5000]

model = Sequential()
model.add(Dense(36, activation='relu'))
model.add(Dense(18))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
history = model.fit(X, y, batch_size=1, epochs=1, validation_split=0.1)

# plt.plot(history.history['acc'])
# plt.plot(history.history['val_acc'])
```

My problem is pretty much that I'm a noob trying to create my own linear regression model from scratch using Keras, and I can't understand why my loss is so high. I need to know whether it's the optimizer or loss function I'm using, or a data problem. The dataset is simply a list of weights and heights.

Answer:

I would try:

- Normalising your heights and weights so that the maximum of each is about one. Deep learning is generally easier when values are around one. Obviously you need to apply the same division to your test data, and then multiply the answers back by the same amount at the end.
- Changing your metric to 'mse' or 'mae' (mean squared error or mean absolute error). This won't change your loss, but will make you feel better, as it is a more meaningful measure of how well you are doing.

Try this:

```python
x_train = df['Height'].values
x_train = x_train * 2.54
x_train = x_train / 175.0

y_train = df['Weight'].values
y_train = y_train / 2.2046226218
y_train = y_train / 80.0
```

...

```python
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])
```

And to test some new values

```python
x_test = 187
x_test = np.array(x_test).reshape(-1, 1)
x_test = x_test * 2.54
x_test = x_test / 175.0
pred = model.predict(x_test)
pred = pred * 80.0
```

Question:

I want to use a neural network to optimize an energy function, to improve a regression's RMSD. My energy function has 16 terms, and I want to optimize their weights before summing them.

http://www.mathworks.com/help/nnet/gs/fit-data-with-a-neural-network.html has an example and my problem is something similar but I want to implement it in python.

Can someone please give me any pointers to where I can find similar examples/what modules I should use?

Answer:

I would also recommend pybrain. pybrain.tools.neuralnets.NNregression is a tool that will help you in regression. Here is an example that will help you get started.
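As a baseline before reaching for any neural-network library: if the goal is a weighted sum of 16 energy terms that best reproduces reference values, ordinary least squares already solves that in closed form. A NumPy sketch, with made-up random data standing in for the real energy terms:

```python
import numpy as np

rng = np.random.default_rng(1)
terms = rng.normal(size=(200, 16))   # 16 energy terms for 200 samples (dummy data)
true_w = rng.normal(size=16)         # hidden "true" weights, used only for checking
target = terms @ true_w + 0.01 * rng.normal(size=200)  # noisy reference energies

# Fit the 16 weights by least squares: this minimises the RMSD directly
w, *_ = np.linalg.lstsq(terms, target, rcond=None)
rmsd = float(np.sqrt(np.mean((terms @ w - target) ** 2)))
print(rmsd)  # close to the noise level
```

A neural network only buys you something beyond this if the relationship between the terms and the target is non-linear.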

Question:

I've got a linear regression model in ML.NET, and the predictions are working fine:

```csharp
MLContext mlContext = new MLContext(seed: 0);
List<TwoInputRegressionModel> inputs = new List<TwoInputRegressionModel>();
foreach (var JahrMitCO in ListWithCO)
{
    float tempyear = JahrMitCO.Year;
    foreach (var JahrMitPopulation in Population)
    {
        if (JahrMitPopulation.Year == tempyear)
        {
            inputs.Add(new TwoInputRegressionModel() { Year = tempyear, Population = JahrMitPopulation.Value, Co2 = JahrMitCO.Value });
        }
    }
}
var model = Train(mlContext, inputs);
TestSinglePrediction(mlContext, model); // works
```

But I would like to know how to gain access to the parameters (weights + bias) of the trained model. I do know that the `ITransformer` (here called `model`) contains a `Model` property, but trying to convert it to the `LinearRegressionModelParameters` class as stated in the documentation doesn't work:

```csharp
LinearRegressionModelParameters originalModelParameters =
    ((ISingleFeaturePredictionTransformer<object>)model).Model as LinearRegressionModelParameters;
// Exception: System.InvalidCastException
```

```
The object of type 'Microsoft.ML.Data.TransformerChain`1[Microsoft.ML.Data.RegressionPredictionTransformer`1[Microsoft.ML.Trainers.FastTree.FastTreeRegressionModelParameters]]' cannot be converted to 'Microsoft.ML.ISingleFeaturePredictionTransformer`1[System.Object]'.
```

How to access the model parameters?

Answer:

The problem in this case is that your `model` object isn't an `ISingleFeaturePredictionTransformer`, but instead a `TransformerChain` object (i.e. a chain of transformers), where the `LastTransformer` is the "prediction transformer".

So to fix this, first cast your `model` to `TransformerChain<RegressionPredictionTransformer<FastTreeRegressionModelParameters>>`. Then you can get `LastTransformer`, which will return the `RegressionPredictionTransformer<FastTreeRegressionModelParameters>`, and from there you can get the `Model` property.

If you don't happen to know at compile time which concrete type of transformer the `TransformerChain` will contain, you can cast `model` to `IEnumerable<ITransformer>` and get the `.Last()` transformer in the chain. That you can cast to `ISingleFeaturePredictionTransformer<object>` in order to get the `Model` property.

```csharp
ITransformer model = ...;
IEnumerable<ITransformer> chain = model as IEnumerable<ITransformer>;
ISingleFeaturePredictionTransformer<object> predictionTransformer =
    chain.Last() as ISingleFeaturePredictionTransformer<object>;
object modelParameters = predictionTransformer.Model;
```

From there you can cast `modelParameters` to whatever specific `ModelParameters` class it happens to be.

Note: from your exception message, it appears you are not training a linear regression model, but instead a fast tree model. Tree-based models can't be cast to `LinearRegressionModelParameters`, so you won't see bias and weights; instead you will see tree information.

Question:

Being new to the deep learning world, and after reading a lot of theory, I was trying to understand how a neural net learns in practice, so I provided it with a simple dataset where the input columns are [[x, x+y]...] and the output column is [[x/(x+y)]...], and tried it using tflearn. But even after countless tries (2 days), the network is not able to minimize the loss, and even when it does minimize (after adding a tanh layer), the network's predictions are way off. Can someone help me with this? Below is the code.

```python
neural_net = tflearn.input_data(shape=[None, 2])
neural_net = tflearn.fully_connected(neural_net, 1, activation='linear', bias=True)
neural_net = tflearn.regression(neural_net, optimizer='sgd', learning_rate=0.001, loss='mean_square')

# train
model = tflearn.DNN(neural_net, tensorboard_verbose=3)
model.fit(OR, Y_truth, n_epoch=1000, snapshot_epoch=False, validation_set=0.1, batch_size=10)
print model.get_weights(neural_net.W)

# prediction
print model.predict([[2, 2]])
```

The output prediction is 0.13946463!!

Answer:

Your network won't be able to learn that function.

The network has a single neuron. That's equivalent to ax+by+c where you try to learn a,b,c.

You need more non-linearity in your model. You need to add more layers and neurons with non-linear activation for the model to be able to learn your desired function.
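A quick NumPy check (with dummy inputs of my own choosing) makes this concrete: even the best possible single linear unit a·x₁ + b·x₂ + c, fitted in closed form, leaves a clearly non-zero error on the target x/(x+y):

```python
import numpy as np

rng = np.random.default_rng(0)
xy = rng.uniform(1.0, 10.0, size=(500, 2))
X = np.column_stack([xy[:, 0], xy[:, 0] + xy[:, 1]])  # inputs: [x, x+y]
target = X[:, 0] / X[:, 1]                            # output: x/(x+y)

# Best possible single linear neuron a*x1 + b*x2 + c, fit by least squares
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
residual = float(np.sqrt(np.mean((A @ coef - target) ** 2)))
print(residual)  # clearly non-zero: a ratio is not a linear function
```

No amount of training will push a single linear neuron below this floor; hidden layers with non-linear activations are what remove it.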