## Hot questions for using neural networks in SciPy

Question:

##### Background

I've trained a convolutional neural network which I would like others to be able to use without requiring hard-to-install libraries such as Theano (which I have found trivial to install on Linux, but very hard on Windows).

I've written an implementation using NumPy/SciPy that is almost fast enough, but it would be even better if it were two or three times faster.

##### What I've tried

90% of the time is spent in the following line:

```python
conv_out = np.sum([scipy.signal.convolve2d(x[i],W[f][i],mode='valid') for i in range(num_in)], axis=0)
```

This line gets called 32 times (once for each feature map), and `num_in` is 16 (the number of features in the previous layer), so overall this line is slow because it results in 32 * 16 = 512 calls to the `convolve2d` routine.

`x[i]` is only 25x25, and `W[f][i]` is 2x2.

##### Question

Is there a better way of expressing this type of convolutional layer in Numpy/Scipy that would execute faster?

(I am only using this code for applying a learnt network so I do not have lots of images to be done in parallel.)

##### Code

The full code for doing a timing experiment is:

```python
import numpy as np
import scipy.signal
from time import time

def max_pool(x):
    """Return maximum in groups of 2x2 for a N,h,w image"""
    N,h,w = x.shape
    return np.amax([x[:,(i>>1)&1::2,i&1::2] for i in range(4)], axis=0)

def conv_layer(params,x):
    """Applies a convolutional layer (W,b) followed by 2x2 pool followed by ReLU on x"""
    W,biases = params
    num_in = W.shape[1]
    A = []
    for f,bias in enumerate(biases):
        conv_out = np.sum([scipy.signal.convolve2d(x[i],W[f][i],mode='valid') for i in range(num_in)], axis=0)
        A.append(conv_out + bias)
    x = np.array(A)
    x = max_pool(x)
    return np.maximum(x,0)

W = np.random.randn(32,16,2,2).astype(np.float32)
b = np.random.randn(32).astype(np.float32)
I = np.random.randn(16,25,25).astype(np.float32)

t0 = time()
O = conv_layer((W,b),I)
print time()-t0
```

This prints 0.084 seconds at the moment.

##### Update

Using mplf's suggestion:

```python
d = x[:,:-1,:-1]
c = x[:,:-1,1:]
b = x[:,1:,:-1]
a = x[:,1:,1:]

for f,bias in enumerate(biases):
    conv_out = np.sum([a[i]*W[f,i,0,0]+b[i]*W[f,i,0,1]+c[i]*W[f,i,1,0]+d[i]*W[f,i,1,1] for i in range(num_in)], axis=0)
```

I get 0.075s, which is slightly faster.

Answer:

##### Accelerating convolution

Building on mplf's suggestion I've found it is possible to remove both of the for loops and the call to convolve2d:

```python
d = x[:,:-1,:-1].swapaxes(0,1)
c = x[:,:-1,1:].swapaxes(0,1)
b = x[:,1:,:-1].swapaxes(0,1)
a = x[:,1:,1:].swapaxes(0,1)

x = W[:,:,0,0].dot(a) + W[:,:,0,1].dot(b) + W[:,:,1,0].dot(c) + W[:,:,1,1].dot(d) + biases.reshape(-1,1,1)
```

This is 10 times faster than the original code.
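As a sanity check (my own addition, using small toy sizes rather than the 25x25/32x16 shapes above), the swapaxes/dot formulation can be compared against a naive 2x2 'valid' convolution written out by hand. Note that `convolve2d` flips the kernel, which is why `W[f, i, 0, 0]` pairs with the bottom-right shifted view:

```python
import numpy as np

rng = np.random.default_rng(0)
num_out, num_in, h = 4, 3, 6          # small sizes for a quick check
W = rng.standard_normal((num_out, num_in, 2, 2))
biases = rng.standard_normal(num_out)
x = rng.standard_normal((num_in, h, h))

# Reference: naive 2x2 'valid' *convolution* (kernel flipped, as convolve2d does).
ref = np.empty((num_out, h - 1, h - 1))
for f in range(num_out):
    acc = np.zeros((h - 1, h - 1))
    for i in range(num_in):
        for m in range(h - 1):
            for n in range(h - 1):
                acc[m, n] += (x[i, m,     n    ] * W[f, i, 1, 1] +
                              x[i, m,     n + 1] * W[f, i, 1, 0] +
                              x[i, m + 1, n    ] * W[f, i, 0, 1] +
                              x[i, m + 1, n + 1] * W[f, i, 0, 0])
    ref[f] = acc + biases[f]

# Vectorised version from the answer: shifted views plus matrix products.
d = x[:, :-1, :-1].swapaxes(0, 1)
c = x[:, :-1, 1:].swapaxes(0, 1)
b = x[:, 1:, :-1].swapaxes(0, 1)
a = x[:, 1:, 1:].swapaxes(0, 1)
out = (W[:, :, 0, 0].dot(a) + W[:, :, 0, 1].dot(b) +
       W[:, :, 1, 0].dot(c) + W[:, :, 1, 1].dot(d) + biases.reshape(-1, 1, 1))

print(np.allclose(ref, out))  # True
```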

##### Accelerating max pool

With this new code, the max pool stage now takes 50% of the time. This can also be sped up by using:

```python
def max_pool(x):
    """Return maximum in groups of 2x2 for a N,h,w image"""
    N,h,w = x.shape
    x = x.reshape(N,h/2,2,w/2,2).swapaxes(2,3).reshape(N,h/2,w/2,4)
    return np.amax(x,axis=3)
```

This speeds up the max_pool step by a factor of 10, so overall the program doubles in speed again.
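To see why the reshape trick is safe, here is a small self-contained comparison of the two `max_pool` versions. I've ported the division to Python 3's integer division (`//`), which is an assumption on my part; the rest follows the code above:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 8, 8))   # toy N,h,w image stack with even h, w

def max_pool_naive(x):
    """Original formulation: strided slices over the four positions in each 2x2 block."""
    return np.amax([x[:, (i >> 1) & 1::2, i & 1::2] for i in range(4)], axis=0)

def max_pool_fast(x):
    """Reshape-based version: gather each 2x2 block into a length-4 last axis."""
    N, h, w = x.shape
    x = x.reshape(N, h // 2, 2, w // 2, 2).swapaxes(2, 3).reshape(N, h // 2, w // 2, 4)
    return np.amax(x, axis=3)

print(np.allclose(max_pool_naive(x), max_pool_fast(x)))  # True
```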

Question:

I would like to train a feed-forward neural network implemented in Keras using BFGS. To see if it could be done, I implemented a Perceptron using `scipy.optimize.minimize`, with the code below.

```python
from __future__ import print_function
import numpy as np
from scipy.optimize import minimize
from keras.models import Sequential
from keras.layers.core import Dense

# Dummy training examples
X = np.array([[-1,2,-3,-1],[3,2,-1,-4]]).astype('float')
Y = np.array([[2],[-1]]).astype('float')

model = Sequential()
model.add(Dense(1, activation='sigmoid', input_dim=4))

def loss(W):
    weightsList = [np.zeros((4,1)), np.zeros(1)]
    for i in range(4):
        weightsList[0][i,0] = W[i]
    weightsList[1][0] = W[4]
    model.set_weights(weightsList)
    preds = model.predict(X)
    mse = np.sum(np.square(np.subtract(preds,Y)))/len(X[:,0])
    return mse

# Dummy first guess
V = [1.0, 2.0, 3.0, 4.0, 1.0]
res = minimize(loss, x0=V, method='BFGS', options={'disp':True})
print(res.x)
```

However, the output of this shows that the loss function does not optimize:

```
Using Theano backend.
Using gpu device 0: GeForce GTX 960M (CNMeM is disabled, cuDNN not available)
Optimization terminated successfully.
         Current function value: 2.499770
         Iterations: 0
         Function evaluations: 7
         Gradient evaluations: 1
[ 1.  2.  3.  4.  1.]
```

Any ideas why this didn't work? Is it because I didn't input the gradient to `minimize`, and it cannot calculate the numerical approximation in this case?

Answer:

> Is it because I didn't input the gradient to minimize, and it cannot calculate the numerical approximation in this case?

It's because you don't supply the gradients, so scipy approximates them by numerical differentiation: it evaluates the function at X, then at X + epsilon, to approximate the local gradient.

But the epsilon is small enough that, in the conversion to 32-bit for Theano, the change is completely lost. The starting guess is not in fact a minimum; scipy just concludes it is because it sees no change in the value of the objective function. You simply need to increase the epsilon:

```python
V = [1.0, 2.0, 3.0, 4.0, 1.0]
print('Starting loss = {}'.format(loss(V)))

# set the eps option to increase the epsilon used in numerical diff
res = minimize(loss, x0=V, method='BFGS', options={'eps':1e-6,'disp':True})
print('Ending loss = {}'.format(loss(res.x)))
```

Which gives:

```
Using Theano backend.
Starting loss = 2.49976992001
Optimization terminated successfully.
         Current function value: 1.002703
         Iterations: 19
         Function evaluations: 511
         Gradient evaluations: 73
Ending loss = 1.00270344184
```
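The float32 round-off described above is easy to demonstrate directly. The step value below is scipy's default finite-difference step (about `1.49e-08`, the square root of float64 machine epsilon):

```python
import numpy as np

eps_default = 1.4901161193847656e-08   # sqrt(float64 machine epsilon)

# Perturbing a float32 value by the default step is lost in rounding,
# so the finite-difference gradient evaluates to exactly zero:
print(np.float32(1.0 + eps_default) == np.float32(1.0))   # True

# The larger eps=1e-6 from the fix survives the float32 round-trip:
print(np.float32(1.0 + 1e-6) == np.float32(1.0))          # False
```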

Question:

I'm trying to use a backpropagation neural network for multiclass classification. I have found this code and am trying to adapt it. It is based on the Machine Learning lectures on Coursera by Andrew Ng.

I don't understand exactly the role of the `scipy.optimize.minimize` function here. It is used just once in the code. Is it iteratively updating the weights of the network? How can I visualize (plot) its performance to see when it converges?

Using this function, what parameters can I adjust to achieve better performance? I found this list of common parameters:

- **Number of neurons in the hidden layer**: this is `hidden_layer_size=25` in my code
- **Learning rate**: can I still adjust that using the built-in minimization function?
- **Momentum**: is that `reg_lambda=0` in my case? Or is that the regularization parameter to avoid overfitting?
- **Epoch**: `maxiter=500`

Here is my training data (target class is in the last column):

```
65535, 3670, 65535, 3885, -0.73, 1
65535, 3962, 65535, 3556, -0.72, 1
65535, 3573, 65535, 3529, -0.61, 1
3758, 3123, 4117, 3173, -0.21, 0
3906, 3119, 4288, 3135, -0.28, 0
3750, 3073, 4080, 3212, -0.26, 0
65535, 3458, 65535, 3330, -0.85, 2
65535, 3315, 65535, 3306, -0.87, 2
65535, 3950, 65535, 3613, -0.84, 2
65535, 32576, 65535, 19613, -0.35, 3
65535, 16657, 65535, 16618, -0.37, 3
65535, 16657, 65535, 16618, -0.32, 3
```

The dependencies are so obvious that I think it should be easy to classify...

But the results are terrible. I get an accuracy of 0.6 to 0.8, which is absolutely inadequate for my application. I know I normally need more data, but I would already be happy if I could at least fit the training data (without taking potential overfitting into account).

Here is the code:

```python
import numpy as np
from scipy import optimize
from sklearn import cross_validation
from sklearn.metrics import accuracy_score
import math

class NN_1HL(object):

    def __init__(self, reg_lambda=0, epsilon_init=0.12, hidden_layer_size=25,
                 opti_method='TNC', maxiter=500):
        self.reg_lambda = reg_lambda
        self.epsilon_init = epsilon_init
        self.hidden_layer_size = hidden_layer_size
        self.activation_func = self.sigmoid
        self.activation_func_prime = self.sigmoid_prime
        self.method = opti_method
        self.maxiter = maxiter

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_prime(self, z):
        sig = self.sigmoid(z)
        return sig * (1 - sig)

    def sumsqr(self, a):
        return np.sum(a ** 2)

    def rand_init(self, l_in, l_out):
        self.epsilon_init = (math.sqrt(6)) / (math.sqrt(l_in + l_out))
        return np.random.rand(l_out, l_in + 1) * 2 * self.epsilon_init - self.epsilon_init

    def pack_thetas(self, t1, t2):
        return np.concatenate((t1.reshape(-1), t2.reshape(-1)))

    def unpack_thetas(self, thetas, input_layer_size, hidden_layer_size, num_labels):
        t1_start = 0
        t1_end = hidden_layer_size * (input_layer_size + 1)
        t1 = thetas[t1_start:t1_end].reshape((hidden_layer_size, input_layer_size + 1))
        t2 = thetas[t1_end:].reshape((num_labels, hidden_layer_size + 1))
        return t1, t2

    def _forward(self, X, t1, t2):
        m = X.shape[0]
        ones = None
        if len(X.shape) == 1:
            ones = np.array(1).reshape(1,)
        else:
            ones = np.ones(m).reshape(m,1)

        # Input layer
        a1 = np.hstack((ones, X))

        # Hidden Layer
        z2 = np.dot(t1, a1.T)
        a2 = self.activation_func(z2)
        a2 = np.hstack((ones, a2.T))

        # Output layer
        z3 = np.dot(t2, a2.T)
        a3 = self.activation_func(z3)
        return a1, z2, a2, z3, a3

    def function(self, thetas, input_layer_size, hidden_layer_size, num_labels, X, y, reg_lambda):
        t1, t2 = self.unpack_thetas(thetas, input_layer_size, hidden_layer_size, num_labels)

        m = X.shape[0]
        Y = np.eye(num_labels)[y]

        _, _, _, _, h = self._forward(X, t1, t2)
        costPositive = -Y * np.log(h).T
        costNegative = (1 - Y) * np.log(1 - h).T
        cost = costPositive - costNegative
        J = np.sum(cost) / m

        if reg_lambda != 0:
            t1f = t1[:, 1:]
            t2f = t2[:, 1:]
            reg = (self.reg_lambda / (2 * m)) * (self.sumsqr(t1f) + self.sumsqr(t2f))
            J = J + reg
        return J

    def function_prime(self, thetas, input_layer_size, hidden_layer_size, num_labels, X, y, reg_lambda):
        t1, t2 = self.unpack_thetas(thetas, input_layer_size, hidden_layer_size, num_labels)

        m = X.shape[0]
        t1f = t1[:, 1:]
        t2f = t2[:, 1:]
        Y = np.eye(num_labels)[y]

        Delta1, Delta2 = 0, 0
        for i, row in enumerate(X):
            a1, z2, a2, z3, a3 = self._forward(row, t1, t2)

            # Backprop
            d3 = a3 - Y[i, :].T
            d2 = np.dot(t2f.T, d3) * self.activation_func_prime(z2)

            Delta2 += np.dot(d3[np.newaxis].T, a2[np.newaxis])
            Delta1 += np.dot(d2[np.newaxis].T, a1[np.newaxis])

        Theta1_grad = (1 / m) * Delta1
        Theta2_grad = (1 / m) * Delta2

        if reg_lambda != 0:
            Theta1_grad[:, 1:] = Theta1_grad[:, 1:] + (reg_lambda / m) * t1f
            Theta2_grad[:, 1:] = Theta2_grad[:, 1:] + (reg_lambda / m) * t2f

        return self.pack_thetas(Theta1_grad, Theta2_grad)

    def fit(self, X, y):
        num_features = X.shape[0]
        input_layer_size = X.shape[1]
        num_labels = len(set(y))

        theta1_0 = self.rand_init(input_layer_size, self.hidden_layer_size)
        theta2_0 = self.rand_init(self.hidden_layer_size, num_labels)
        thetas0 = self.pack_thetas(theta1_0, theta2_0)

        options = {'maxiter': self.maxiter}
        _res = optimize.minimize(self.function, thetas0, jac=self.function_prime, method=self.method,
                                 args=(input_layer_size, self.hidden_layer_size, num_labels, X, y, 0),
                                 options=options)

        self.t1, self.t2 = self.unpack_thetas(_res.x, input_layer_size, self.hidden_layer_size, num_labels)

        np.savetxt("weights_t1.txt", self.t1, newline="\n")
        np.savetxt("weights_t2.txt", self.t2, newline="\n")

    def predict(self, X):
        return self.predict_proba(X).argmax(0)

    def predict_proba(self, X):
        _, _, _, _, h = self._forward(X, self.t1, self.t2)
        return h

##################
#    IR data     #
##################

values = np.loadtxt('infrared_data.txt', delimiter=', ', usecols=[0,1,2,3,4])
targets = np.loadtxt('infrared_data.txt', delimiter=', ', dtype=(int), usecols=[5])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(values, targets, test_size=0.4)
nn = NN_1HL()
nn.fit(values, targets)
print("Accuracy of classification: " + str(accuracy_score(y_test, nn.predict(X_test))))
```

Answer:

In the given code, `scipy.optimize.minimize` iteratively minimizes the function given its derivative (the Jacobian). According to the documentation, you can pass a `callback` argument: a function that will be called after each iteration. This lets you measure performance, though I'm not sure whether it lets you halt the optimization process.
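As a minimal sketch of that `callback` mechanism (a toy quadratic rather than the asker's network; the variable names are mine), one can record the objective value after every iteration and plot the history later:

```python
import numpy as np
from scipy.optimize import minimize

history = []

def f(x):
    # toy objective: a quadratic with its minimum at x = 3
    return np.sum((x - 3.0) ** 2)

def record(xk):
    # scipy calls this after each iteration with the current parameter vector
    history.append(f(xk))

res = minimize(f, x0=np.zeros(4), method='BFGS', callback=record)
print(len(history), res.fun)  # one entry per iteration; final loss near 0
```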

All the parameters you listed are hyperparameters; they are hard to optimize directly:

*Number of neurons in the hidden layer* is a discrete-valued parameter and thus not optimizable via gradient techniques. Moreover, it affects the network architecture, so you can't optimize it while training the net. What you can do, though, is use some higher-level routine to search over the possible options, like exhaustive grid search with cross-validation (for example, look at GridSearchCV) or other tools for hyperparameter search (hyperopt, spearmint, MOE, etc.).

*Learning rate* does not seem to be customizable for most of the available optimization methods. But, actually, gradient descent with a learning rate is just Newton's method with the Hessian "approximated" by `(1 / eta) * I`, a diagonal matrix with inverted learning rates on the main diagonal. So you can try Hessian-based methods with this heuristic in mind.

*Momentum* is completely unrelated to regularization. It's an optimization technique, and since you use scipy for the optimization, it is unavailable to you.

Question:

At the bottom of this question is a set of functions transcribed from a published neural-network model. When I call `R`, I get the following error:

```
RuntimeError: maximum recursion depth exceeded while calling a Python object
```

Note that within each call to `R`, a recursive call to `R` is made for every other neuron in the network. This is what causes the recursion depth to be exceeded. Each return value of `R` depends on all the others (the network involves `N = 512` values in total). Does anyone have any idea what method should be used to compute the self-consistent solution for `R`? Note that `R` itself is a smooth function. I've tried treating this as a vector root-finding problem, but in this case the 512 dimensions are not independent, and with so many degrees of freedom the roots are never found (using the `scipy.optimize` functions). Does Python have any tools that can help with this? Maybe it would be more natural to solve for `R` using something like Mathematica? I don't know how this is normally done.

"""Recurrent model with strong excitatory recurrence.""" import numpy as np l = 3.14 def R(x_i): """Steady-state firing rate of neuron at location x_i. Parameters ---------- x_i : number Location of this neuron. Returns ------- rate : float Firing rate. """ N = 512 T = 1 x = np.linspace(-2, 2, N) sum_term = 0 for x_j in x: sum_term += J(x_i - x_j) * R(x_j) rate = I_S(x_i) + I_A(x_i) + 1.0 / N * sum_term - T if rate < 0: return 0 return rate def I_S(x): """Sensory input. Parameters ---------- x : number Location of this neuron. Returns ------- float Sensory input to neuron at x. """ S_0 = 0.46 S_1 = 0.66 x_S = 0 sigma_S = 1.31 return S_0 + S_1 * np.exp(-0.5 * (x - x_S) ** 2 / sigma_S ** 2) def I_A(x): """Attentional additive bias. Parameters ---------- x : number Location of this neuron. Returns ------- number Additive bias for neuron at x. """ x_A = 0 A_1 = 0.089 sigma_A = 0.35 A_0 = 0 sigma_A_prime = 0.87 if np.abs(x - x_A) < l: return (A_1 * np.exp(-0.5 * (x - x_A) ** 2 / sigma_A ** 2) + A_0 * np.exp(-0.5 * (x - x_A) ** 2 / sigma_A_prime ** 2)) return 0 def J(dx): """Connection strength. Parameters ---------- dx : number Neuron i's distance from neuron j. Returns ------- number Connection strength. """ J_0 = -2.5 J_1 = 8.5 sigma_J = 1.31 if np.abs(dx) < l: return J_0 + J_1 * np.exp(-0.5 * dx ** 2 / sigma_J ** 2) return 0 if __name__ == '__main__': pass

Answer:

This recursion never ends, since there is no termination condition before the recursive call; adjusting the maximum recursion depth does not help:

```python
def R(x_i):
    ...
    for x_j in x:
        sum_term += J(x_i - x_j) * R(x_j)
```

Perhaps you should be doing something like

```python
# some suitable initial guess
state = guess

while True:  # or a fixed number of iterations
    next_state = compute_next_state(state)

    if some_condition_check(state, next_state):
        # return answer
        return state

    if some_other_check(state, next_state):
        # something wrong, terminate
        raise ...
```
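For this particular model, one concrete way to apply that idea is to precompute the couplings and solve for all 512 rates at once instead of recursing. The sketch below is my own, using the constants from the question (with the `A_0 = 0` term dropped); it assumes every unit ends up above threshold, which turns the fixed point into a linear system. Units driven below zero would violate the `max(0, ...)` rectification and would need an active-set or damped fixed-point iteration on top:

```python
import numpy as np

l = 3.14
N, T = 512, 1
x = np.linspace(-2, 2, N)

def I_S(x):
    return 0.46 + 0.66 * np.exp(-0.5 * x ** 2 / 1.31 ** 2)

def I_A(x):
    return np.where(np.abs(x) < l, 0.089 * np.exp(-0.5 * x ** 2 / 0.35 ** 2), 0.0)

def J(dx):
    return np.where(np.abs(dx) < l, -2.5 + 8.5 * np.exp(-0.5 * dx ** 2 / 1.31 ** 2), 0.0)

# Precompute the full N x N coupling matrix once instead of recursing.
Jmat = J(x[:, None] - x[None, :])
I_ext = I_S(x) + I_A(x)

# If every unit sits above threshold, the fixed point
#   r = I_ext + (1/N) J r - T
# is a plain linear system: (eye(N) - Jmat/N) r = I_ext - T
r = np.linalg.solve(np.eye(N) - Jmat / N, I_ext - T)

# Self-consistency check of the linear solution:
residual = r - (I_ext + Jmat.dot(r) / N - T)
print(np.max(np.abs(residual)))
```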

Question:

Just a bit of context: I'm attempting to implement a 3-layer neural network (1 hidden layer) for image classification on the CIFAR-10 dataset. I've implemented backpropagation and originally tried training the network simply using gradient descent, but my cost was plateauing around 40 or so (which effectively classifies new images at the same rate as guessing randomly; virtually pointless).

I then attempted to train the network using the `scipy.optimize.fmin_cg` function. I passed the unrolled weights into the function, and my backpropagation function returns a gradient vector of the same size, which satisfies the function's input requirements.

My call to the function looks like the following:

```python
scipy.optimize.fmin_cg(cost, iw, fprime=fit_slim)
```

where the `fit` and `fit_slim` functions are the following:

```python
def fit(X, Y, w, l, predict=False, x=None):
    w_grad = ([np.mat(np.zeros(np.shape(w[i]))) for i in range(len(w))])
    for i in range(len(X)):
        x = x if predict else X[i]
        y = Y[i]

        # forward propagate
        a = x
        a_s = []
        for j in range(len(w)):
            a = np.mat(np.append(1, a)).T
            a_s.append(a)
            z = w[j] * a
            a = sigmoid(z)
        if predict: return a

        # backpropagate
        delta = a - y.T
        w_grad[-1] += delta * a_s[-1].T
        for j in reversed(range(1, len(w))):
            delta = delta[1:] if j != len(w)-1 else delta
            delta = np.multiply(w[j].T*delta, s_prime(a_s[j]))
            w_grad[j-1] += (delta[1:] * a_s[j-1].T)

    # regularization
    for i in range(len(w)):
        w_grad[i] /= len(X)
        w_grad[i][:,1:] += (l/len(X)) * w[i][:,1:]

    return flatten(w_grad).T

def fit_slim(iw):
    iw = shape_back(iw)
    return fit(X, Y, iw, l)
```

And the `cost` function is:

```python
def cost(iw):
    J = 0
    m = len(X)
    iw = shape_back(iw)
    for i in range(m):
        h = fit(X, Y, iw, l, True, X[i])
        J += ((1.0/m)*(np.sum((np.multiply(-Y[i],np.log(h))-
                               np.multiply((1-Y[i]),np.log(1-h))).flatten())))
    for i in range(len(w)):
        J += np.sum(((l/(2.0*m))*np.power(w[i],2)).flatten())
    return J
```

The `iw` variable is the weights unrolled into a long vector, and the `shape_back` function simply reshapes `iw` back into its original matrix shapes for use in the `fit` and `cost` functions.

The first issue I face is that my `fit` function takes forever to run a single iteration. And by forever, I mean about a minute, which seems very slow. Nevertheless, I've let it run until the cost plateaus at around 40, as I mentioned, which is still a very high cost. That said, trying an alternative optimization technique seemed reasonable to me, and I settled on the `fmin_cg` function.

When I run it, I receive the following error:

File "image_net.py", line 143, in <module> print scipy.optimize.fmin_cg(cost, iw, fprime=fit_slim, maxiter=2, callback=callback) File "/Users/samgriesemer/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 1092, in fmin_cg res = _minimize_cg(f, x0, args, fprime, callback=callback, **opts) File "/Users/samgriesemer/anaconda/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 1156, in _minimize_cg deltak = numpy.dot(gfk, gfk) ValueError: shapes (616610,1) and (616610,1) not aligned: 1 (dim 1) != 616610 (dim 0)

It seems to me that the function is attempting to take the dot product of the same vector, which doesn't make any sense to me.

So to recap my question, I have two issues.

1) Is there anything I can do to better optimize my `fit` function? My data set has 10,000 examples, so I understand that it takes time to loop through all of them, but I don't understand why, even after many iterations, my cost is still very high.

2) Why am I receiving an error when running the `fmin_cg` function? The arguments I'm passing into the function are vectors of the same size. I don't understand why it would attempt to take the dot product of two same-sized vectors inside the function.

Many thanks to anyone who can shed some light on these issues/misunderstandings.

Answer:

> It seems to me that the function is attempting to take the dot product of the same vector, which doesn't make any sense to me.

This is not how `numpy.dot` works. The problem is exactly what the error message says: it tries to perform a matrix multiplication and fails because the dimensions do not match.

Notice that for arrays one might think of as "one-dimensional", numpy distinguishes between the shapes `(n,)`, `(n, 1)` and `(1, n)`: only the first one is *one-dimensional* for numpy, and it is *not* interpreted as a row or a column vector.

```python
>>> a = np.ones(3)         # a 1D array
>>> np.dot(a, a)
3.0
>>> b = a.reshape(-1, 1)   # a column vector
>>> b
array([[ 1.],
       [ 1.],
       [ 1.]])
>>> np.dot(b, b)           # column times column, fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shapes (3,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)
>>> np.dot(b, b.T)         # column times row, i.e. an outer product
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>> np.dot(b.T, b)         # row times column, but notice the dimensions
array([[ 3.]])
```
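In practice, then, the fix for the traceback is to hand `fmin_cg` a gradient of shape `(n,)` rather than `(n, 1)`. A hypothetical sketch (the `g` here merely stands in for whatever the asker's `fit_slim` returns, which I'm assuming is a column vector built from `np.mat` objects):

```python
import numpy as np

g = np.ones((5, 1))              # stand-in column-vector gradient, shape (5, 1)

# Converting to a plain array and ravelling gives shape (n,), which is
# what scipy's optimizers expect from fprime:
g_flat = np.asarray(g).ravel()

print(g_flat.shape)              # (5,)
print(np.dot(g_flat, g_flat))    # 5.0, a scalar, as _minimize_cg assumes
```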

Question:

I need to use a backpropagation neural network for multiclass classification in my application. I have found this code and tried to adapt it to my needs. It is based on the Machine Learning lectures on Coursera by Andrew Ng. I have tested it on the IRIS dataset and achieved good results (classification accuracy around 0.96), whereas on my real data I get terrible results. I assume there is some implementation error, because the data is very simple. But I cannot figure out what exactly the problem is.

What parameters does it make sense to adjust? I have tried:

- number of units in hidden layer
- generalization parameter (lambda)
- number of iterations for minimization function

The built-in minimization function used in this code pretty much confuses me. It is used just once, as @goncalopp has mentioned in a comment. Shouldn't it iteratively update the weights? How can that be implemented?

Here is my training data (target class is in the last column):

```
65535, 3670, 65535, 3885, -0.73, 1
65535, 3962, 65535, 3556, -0.72, 1
65535, 3573, 65535, 3529, -0.61, 1
3758, 3123, 4117, 3173, -0.21, 0
3906, 3119, 4288, 3135, -0.28, 0
3750, 3073, 4080, 3212, -0.26, 0
65535, 3458, 65535, 3330, -0.85, 2
65535, 3315, 65535, 3306, -0.87, 2
65535, 3950, 65535, 3613, -0.84, 2
65535, 32576, 65535, 19613, -0.35, 3
65535, 16657, 65535, 16618, -0.37, 3
65535, 16657, 65535, 16618, -0.32, 3
```

The dependencies are so obvious that I think it should be easy to classify...

But the results are terrible. I get an accuracy of 0.6 to 0.8, which is absolutely inadequate for my application. Can someone please point out possible improvements I could make in order to achieve better results?

Here is the code:

```python
import numpy as np
from scipy import optimize
from sklearn import cross_validation
from sklearn.metrics import accuracy_score
import math

class NN_1HL(object):

    def __init__(self, reg_lambda=0, epsilon_init=0.12, hidden_layer_size=25,
                 opti_method='TNC', maxiter=500):
        self.reg_lambda = reg_lambda
        self.epsilon_init = epsilon_init
        self.hidden_layer_size = hidden_layer_size
        self.activation_func = self.sigmoid
        self.activation_func_prime = self.sigmoid_prime
        self.method = opti_method
        self.maxiter = maxiter

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_prime(self, z):
        sig = self.sigmoid(z)
        return sig * (1 - sig)

    def sumsqr(self, a):
        return np.sum(a ** 2)

    def rand_init(self, l_in, l_out):
        self.epsilon_init = (math.sqrt(6)) / (math.sqrt(l_in + l_out))
        return np.random.rand(l_out, l_in + 1) * 2 * self.epsilon_init - self.epsilon_init

    def pack_thetas(self, t1, t2):
        return np.concatenate((t1.reshape(-1), t2.reshape(-1)))

    def unpack_thetas(self, thetas, input_layer_size, hidden_layer_size, num_labels):
        t1_start = 0
        t1_end = hidden_layer_size * (input_layer_size + 1)
        t1 = thetas[t1_start:t1_end].reshape((hidden_layer_size, input_layer_size + 1))
        t2 = thetas[t1_end:].reshape((num_labels, hidden_layer_size + 1))
        return t1, t2

    def _forward(self, X, t1, t2):
        m = X.shape[0]
        ones = None
        if len(X.shape) == 1:
            ones = np.array(1).reshape(1,)
        else:
            ones = np.ones(m).reshape(m,1)

        # Input layer
        a1 = np.hstack((ones, X))

        # Hidden Layer
        z2 = np.dot(t1, a1.T)
        a2 = self.activation_func(z2)
        a2 = np.hstack((ones, a2.T))

        # Output layer
        z3 = np.dot(t2, a2.T)
        a3 = self.activation_func(z3)
        return a1, z2, a2, z3, a3

    def function(self, thetas, input_layer_size, hidden_layer_size, num_labels, X, y, reg_lambda):
        t1, t2 = self.unpack_thetas(thetas, input_layer_size, hidden_layer_size, num_labels)

        m = X.shape[0]
        Y = np.eye(num_labels)[y]

        _, _, _, _, h = self._forward(X, t1, t2)
        costPositive = -Y * np.log(h).T
        costNegative = (1 - Y) * np.log(1 - h).T
        cost = costPositive - costNegative
        J = np.sum(cost) / m

        if reg_lambda != 0:
            t1f = t1[:, 1:]
            t2f = t2[:, 1:]
            reg = (self.reg_lambda / (2 * m)) * (self.sumsqr(t1f) + self.sumsqr(t2f))
            J = J + reg
        return J

    def function_prime(self, thetas, input_layer_size, hidden_layer_size, num_labels, X, y, reg_lambda):
        t1, t2 = self.unpack_thetas(thetas, input_layer_size, hidden_layer_size, num_labels)

        m = X.shape[0]
        t1f = t1[:, 1:]
        t2f = t2[:, 1:]
        Y = np.eye(num_labels)[y]

        Delta1, Delta2 = 0, 0
        for i, row in enumerate(X):
            a1, z2, a2, z3, a3 = self._forward(row, t1, t2)

            # Backprop
            d3 = a3 - Y[i, :].T
            d2 = np.dot(t2f.T, d3) * self.activation_func_prime(z2)

            Delta2 += np.dot(d3[np.newaxis].T, a2[np.newaxis])
            Delta1 += np.dot(d2[np.newaxis].T, a1[np.newaxis])

        Theta1_grad = (1 / m) * Delta1
        Theta2_grad = (1 / m) * Delta2

        if reg_lambda != 0:
            Theta1_grad[:, 1:] = Theta1_grad[:, 1:] + (reg_lambda / m) * t1f
            Theta2_grad[:, 1:] = Theta2_grad[:, 1:] + (reg_lambda / m) * t2f

        return self.pack_thetas(Theta1_grad, Theta2_grad)

    def fit(self, X, y):
        num_features = X.shape[0]
        input_layer_size = X.shape[1]
        num_labels = len(set(y))

        theta1_0 = self.rand_init(input_layer_size, self.hidden_layer_size)
        theta2_0 = self.rand_init(self.hidden_layer_size, num_labels)
        thetas0 = self.pack_thetas(theta1_0, theta2_0)

        options = {'maxiter': self.maxiter}
        _res = optimize.minimize(self.function, thetas0, jac=self.function_prime, method=self.method,
                                 args=(input_layer_size, self.hidden_layer_size, num_labels, X, y, 0),
                                 options=options)

        self.t1, self.t2 = self.unpack_thetas(_res.x, input_layer_size, self.hidden_layer_size, num_labels)

        np.savetxt("weights_t1.txt", self.t1, newline="\n")
        np.savetxt("weights_t2.txt", self.t2, newline="\n")

    def predict(self, X):
        return self.predict_proba(X).argmax(0)

    def predict_proba(self, X):
        _, _, _, _, h = self._forward(X, self.t1, self.t2)
        return h

##################
#    IR data     #
##################

values = np.loadtxt('infrared_data.txt', delimiter=', ', usecols=[0,1,2,3,4])
targets = np.loadtxt('infrared_data.txt', delimiter=', ', dtype=(int), usecols=[5])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(values, targets, test_size=0.4)
nn = NN_1HL()
nn.fit(values, targets)
print("Accuracy of classification: " + str(accuracy_score(y_test, nn.predict(X_test))))
```

Answer:

The most obvious problem is that **your training dataset is very small**.

Since you're using `scipy.optimize.minimize` instead of the usual iterative gradient descent, I think it's also likely that you're **overfitting** your model to your training data. Possibly an iterative algorithm works better here. Don't forget to carefully **monitor the validation error**.

If you try backpropagation with gradient descent, note that, depending on the parameters used in backpropagation, neural networks can take a while to converge.

You can try feeding the network the same training data multiple times or tweaking the learning rate, but ideally you should use more diverse data.

Question:

I am training a generative adversarial network. I referred to many examples, and they have `scipy.misc` in common. I came across this website, but it didn't help.

What is the use of this particular library?

Answer:

From the website you cited:

> Miscellaneous routines (scipy.misc)
>
> Various utilities that don’t have another home.

So its use is to provide a place for routines that do not necessarily fit anywhere else. Imagine it as a drawer where one puts all the things one does not know where else to put, so they aren't lying around and are easier to find.

For more detail, you might want to take a closer look at the routines (into your drawer) to explore which functionalities are covered by `scipy.misc`.