## Hot questions on using neural networks with dataframes

Question:

I have a training set that looks like:

```
Name     Day       Area     X        Y      Month  Night
ATTACK   Monday    LA       -122.41  37.78  8      0
VEHICLE  Saturday  CHICAGO  -1.67    3.15   2      0
MOUSE    Monday    TAIPEI   -12.5    3.1    9      1
```

`Name` is the outcome/dependent variable. I converted `Name`, `Area` and `Day` into factors, but I wasn't sure whether I should also do so for `Month` and `Night`, which only take on integer values 1-12 and 0-1, respectively.

I then convert the data into matrices:

```r
ynn <- model.matrix(~ Name, data = trainDF)
mnn <- model.matrix(~ Day + Area + X + Y + Month + Night, data = trainDF)
```

I then set up the tuning parameters:

```r
nnTrControl = trainControl(method = "repeatedcv", number = 3, repeats = 5,
                           verboseIter = TRUE, returnData = FALSE,
                           returnResamp = "all", classProbs = TRUE,
                           summaryFunction = multiClassSummary, allowParallel = TRUE)
nnGrid = expand.grid(.size = c(1, 4, 7), .decay = c(0, 0.001, 0.1))
model <- train(y = ynn, x = mnn, method = 'nnet', linout = TRUE, trace = FALSE,
               trControl = nnTrControl, metric = "logLoss", tuneGrid = nnGrid)
```

However, the `model <- train` call fails with:

```
Error: nrow(x) == n is not TRUE
```

I also get a similar error if I use `xgboost` instead of `nnet`. Anyone know what's causing this?

Answer:

`y` should be a numeric or factor vector containing the outcome for each sample, not a matrix. Using

```r
train(y = make.names(trainDF$Name), ...)
```

helps; `make.names` modifies the values so that they are syntactically valid variable names.

Question:

I'm trying to run a kNN classifier across my dataset using 10-fold CV. I have some experience with models in WEKA but am struggling to transfer this over to scikit-learn.

Below is my code:

```python
filename = 'train4.csv'
names = ['attribute names are here']
df = pandas.read_csv(filename, names=names)
num_folds = 10
kfold = KFold(n_splits=10, random_state=7)
model = KNeighborsClassifier()
results = cross_val_score(model, df.drop('mix1_instrument', axis=1),
                          df['mix1_instrument'], cv=kfold)
print(results.mean())
```

I am receiving this error:

```
ValueError: could not convert string to float: ''
```

How can I convert these attributes? They contain useful information for classifying my instances; would a conversion affect that?

There are two attributes of dtype 'object' that I believe need converting, named 'class1' and 'class2'.

Sample data below...

```python
{ 'temporalCentroid': { 0: 'temporalCentroid', 1: '1.67324', 2: '1.330722', 3: '0.786984', 4: '1.850129' },
  'LogSpecCentroid': { 0: 'LogSpecCentroid', 1: '-1.043802', 2: '-0.82943', 3: '-2.441297', 4: '-0.837145' },
  'LogSpecSpread': { 0: 'LogSpecSpread', 1: '0.747558', 2: '1.378373', 3: '0.667634', 4: '1.238404' },
  'MFCC1': { 0: 'MFCC1', 1: '3.502117', 2: '6.697601', 3: '4.011488', 4: '0.823614' },
  'MFCC2': { 0: 'MFCC2', 1: '-9.208897', 2: '-9.741549', 3: '15.27665', 4: '-15.22256' },
  'MFCC3': { 0: 'MFCC3', 1: '-2.334097', 2: '-9.868089', 3: '0.802509', 4: '-4.978688' },
  'MFCC4': { 0: 'MFCC4', 1: '-9.013086', 2: '0.609091', 3: '2.50685', 4: '-2.489553' },
  'MFCC5': { 0: 'MFCC5', 1: '4.847481', 2: '1.733307', 3: '0.10459', 4: '1.066615' },
  'MFCC6': { 0: 'MFCC6', 1: '-4.770421', 2: '-5.381835', 3: '-0.260118', 4: '-1.020861' },
  'MFCC7': { 0: 'MFCC7', 1: '-3.362488', 2: '-1.261088', 3: '0.593255', 4: '-2.007349' },
  'MFCC8': { 0: 'MFCC8', 1: '-9.527529', 2: '-3.809237', 3: '-0.362287', 4: '-8.938164' },
  'MFCC9': { 0: 'MFCC9', 1: '-9.629579', 2: '1.486923', 3: '-2.957592', 4: '-2.324424' },
  'MFCC10': { 0: 'MFCC10', 1: '1.848685', 2: '-3.938455', 3: '-1.884439', 4: '-2.535579' },
  'MFCC11': { 0: 'MFCC11', 1: '-2.311295', 2: '-2.159865', 3: '-0.827179', 4: '0.638553' },
  'MFCC12': { 0: 'MFCC12', 1: '-7.696675', 2: '-3.138412', 3: '-0.605056', 4: '-1.116259' },
  'MFCC13': { 0: 'MFCC13', 1: '10.35572', 2: '9.095669', 3: '6.426399', 4: '15.04535' },
  'MFCCMin': { 0: 'MFCCMin', 1: '-9.629579', 2: '-9.868089', 3: '-2.957592', 4: '-15.22256' },
  'MFCCMax': { 0: 'MFCCMax', 1: '10.35572', 2: '9.095669', 3: '15.27665', 4: '15.04535' },
  'MFCCSum': { 0: 'MFCCSum', 1: '-37.300064', 2: '-19.675939', 3: '22.82507', 4: '-23.059305' },
  'MFCCAvg': { 0: 'MFCCAvg', 1: '-2.869235692', 2: '-1.513533769', 3: '1.755774615', 4: '-1.773792692' },
  'MFCCStd': { 0: 'MFCCStd', 1: '6.409842944', 2: '5.558499123', 3: '4.756836281', 4: '6.76039911' },
  'Energy': { 0: 'Energy', 1: '-2.96148', 2: '-3.522993', 3: '-3.409359', 4: '-2.235853' },
  'ZeroCrossings': { 0: 'ZeroCrossings', 1: '128', 2: '188', 3: '43', 4: '288' },
  'SpecCentroid': { 0: 'SpecCentroid', 1: '284.0513', 2: '414.8489', 3: '102.2096', 4: '405.1262' },
  'SpecSpread': { 0: 'SpecSpread', 1: '207.5526', 2: '350.7937', 3: '53.52178', 4: '360.0353' },
  'Rolloff': { 0: 'Rolloff', 1: '263.7817', 2: '783.2703', 3: '129.1992', 4: '912.4695' },
  'Flux': { 0: 'Flux', 1: '0', 2: '0', 3: '0', 4: '0' },
  'bandsCoefMin': { 0: 'bandsCoefMin', 1: '-0.224957', 2: '-0.247903', 3: '-0.22283', 4: '-0.232534' },
  'bandsCoefMax': { 0: 'bandsCoefMax', 1: '-0.074945', 2: '-0.113654', 3: '-0.062254', 4: '-0.080883' },
  'bandsCoefSum1': { 0: 'bandsCoefSum1', 1: '-5.575428', 2: '-5.524777', 3: '-5.511125', 4: '-5.532536' },
  'bandsCoefAvg': { 0: 'bandsCoefAvg', 1: '-0.168952364', 2: '-0.167417485', 3: '-0.167003788', 4: '-0.167652606' },
  'bandsCoefStd': { 0: 'bandsCoefStd', 1: '0.042580181', 2: '0.048429973', 3: '0.049881374', 4: '0.0475839' },
  'bandsCoefSum': { 0: 'bandsCoefSum', 1: '382.5963', 2: '360.9232', 3: '384.3541', 4: '368.9903' },
  'prjmin': { 0: 'prjmin', 1: '-0.999362', 2: '-0.999719', 3: '-0.988315', 4: '-0.999421' },
  'prjmax': { 0: 'prjmax', 1: '0.023797', 2: '0.009596', 3: '0.028112', 4: '0.024612' },
  'prjSum': { 0: 'prjSum', 1: '-0.99911', 2: '-1.006792', 3: '-1.084054', 4: '-1.002478' },
  'prjAvg': { 0: 'prjAvg', 1: '-0.030276061', 2: '-0.030508848', 3: '-0.032850121', 4: '-0.030378121' },
  'prjStd': { 0: 'prjStd', 1: '0.174082468', 2: '0.174040569', 3: '0.173600498', 4: '0.174064118' },
  'LogAttackTime': { 0: 'LogAttackTime', 1: '0.365883', 2: '-0.35427', 3: '-0.669283', 4: '-0.026181' },
  'HamoPkMin': { 0: 'HamoPkMin', 1: '0', 2: '0', 3: '0', 4: '0' },
  'HamoPkMax': { 0: 'HamoPkMax', 1: '1.025473', 2: '1.05761', 3: '0.986766', 4: '0.957316' },
  'HamoPkSum': { 0: 'HamoPkSum', 1: '14.391206', 2: '20.306125', 3: '9.727358', 4: '14.772449' },
  'HamoPkAvg': { 0: 'HamoPkAvg', 1: '0.513971643', 2: '0.72521875', 3: '0.347405643', 4: '0.527587464' },
  'HamoPkStd': { 0: 'HamoPkStd', 1: '0.376622124', 2: '0.325929503', 3: '0.388971641', 4: '0.381693476' },
  'class1': { 0: 'class1', 1: 'aerophone', 2: 'aerophone', 3: 'chordophone', 4: 'aerophone' },
  'class2': { 0: 'class2', 1: 'aero_single-reed', 2: 'aero_lip-vibrated', 3: 'chrd_simple', 4: 'aero_single-reed' },
  'mix1_instrument': { 0: 'mix1_instrument', 1: 'Saxophone', 2: 'Trumpet', 3: 'Piano', 4: 'Clarinet' } }
```

Thanks

Answer:

Here is a small demo:

Source DF:

```
In [43]: df
Out[43]:
     Energy  HamoPkStd       class1             class2 mix1_instrument
0 -2.961480  14.391206    aerophone   aero_single-reed       Saxophone
1 -3.522993  20.306125  chordophone  aero_lip-vibrated         Trumpet
2 -3.409359   9.727358    aerophone        chrd_simple           Piano
```

Label encoding:

```
In [44]: %paste
from sklearn.preprocessing import LabelBinarizer, LabelEncoder
str_cols = df.columns[df.columns.str.contains('(?:class|instrument)')]
clfs = {c: LabelEncoder() for c in str_cols}
for col, clf in clfs.items():
    df[col] = clfs[col].fit_transform(df[col])
## -- End pasted text --
```

Result: all text/string columns have been converted to numbers, so we can feed them to a neural network:

```
In [45]: df
Out[45]:
     Energy  HamoPkStd  class1  class2  mix1_instrument
0 -2.961480  14.391206       0       1                1
1 -3.522993  20.306125       1       0                2
2 -3.409359   9.727358       0       2                0
```

Inverse transformation:

```
In [48]: clfs['class1'].inverse_transform(df['class1'])
Out[48]: array(['aerophone', 'chordophone', 'aerophone'], dtype=object)

In [49]: clfs['mix1_instrument'].inverse_transform(df['mix1_instrument'])
Out[49]: array(['Saxophone', 'Trumpet', 'Piano'], dtype=object)
```
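The `class*` columns aside, the numeric feature columns in the posted sample are also stored as strings, so they still need coercing before kNN can consume them. A minimal sketch (a toy frame reusing two of the question's column names; `errors='coerce'` turns unparseable cells such as `''` into `NaN`):

```python
import pandas as pd

# toy frame mimicking the question's data: numbers stored as strings
df = pd.DataFrame({'temporalCentroid': ['1.67324', '1.330722', ''],
                   'ZeroCrossings': ['128', '188', '43']})

# coerce every feature column to numeric; unparseable cells become NaN
df = df.apply(pd.to_numeric, errors='coerce')
```

Rows left with `NaN` can then be dropped with `df.dropna()` before fitting.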

Question:

I want to train a neural network (multi-layer perceptron) with the following data:

```
1             2             3            Other Field  Label
[1, 2, 3, 4]  [5, 6, 7, 8]  [9, 10, 11]  1234         5678
etc...
```

Here `1`, `2` and `3` are columns that contain a list. The other two columns just have numeric values.

But I keep getting this:

```
ValueError: setting an array element with a sequence.
```

Is this even possible?

Edit: My code to train the neural network is as follows:

```python
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(alpha=1e-5, hidden_layer_sizes=(10, 10), random_state=1)
mlp.fit(X_train, y_train)
```

Here's a screenshot of my train data:

And my label is just one column with numbers.

Answer:

If your lists always have the same length, it's just a matter of splitting each list-column into individual columns, as described e.g. here:

```python
import pandas as pd

# create a dataset
raw_data = {'score': [1, 2, 3],
            'tags': [['apple', 'pear', 'guava'],
                     ['truck', 'car', 'plane'],
                     ['cat', 'dog', 'mouse']]}
df = pd.DataFrame(raw_data, columns=['score', 'tags'])

# expand df.tags into its own dataframe
tags = df['tags'].apply(pd.Series)

# rename each variable
tags = tags.rename(columns=lambda x: 'tag_' + str(x))

# join the tags dataframe back to the original dataframe
df = pd.concat([df, tags], axis=1)
df.drop('tags', inplace=True, axis=1)
```

If not, the best answer might be problem-specific. One approach could be to extend each list to the length of the longest list by padding with filler values and then doing the same.
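That padding idea might be sketched with `itertools.zip_longest` (toy data; the filler value `0` and the `tag_` prefix are arbitrary choices, not from the question):

```python
import pandas as pd
from itertools import zip_longest

# toy column of lists with unequal lengths
df = pd.DataFrame({'tags': [[1, 2, 3], [4, 5], [6]]})

# transpose the lists with zip_longest so short ones are padded with 0,
# then transpose back to get one expanded row per original row
padded = pd.DataFrame(list(zip_longest(*df['tags'], fillvalue=0))).T
padded.columns = ['tag_' + str(i) for i in padded.columns]
df = pd.concat([df.drop('tags', axis=1), padded], axis=1)
```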

Question:

I'm trying to create the most basic neural network from scratch to predict stocks for Apple. The following code is what I have gotten to so far, with assistance from data science tutorials. However, I'm now at the point of actually feeding in the data and making sure it does so correctly. I would like to feed in a pandas data frame of a stock trade. This is my view of the NN:

- 5 Input nodes (Open,Close,High,Low,Volume) *note - this will be in a pandas data frame with a datetime index
- AF that sums the weights of each input.
- Sigmoid function to normalise the values
- 1 output (adj close) *Not sure what I should use as the actual value

Then the process is to move back using the back-propagation technique.

```python
import pandas as pd
import pandas_datareader as web
import matplotlib.pyplot as plt
import numpy as np

def sigmoid(x):
    return 1.0/(1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1.0 - x)

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = #will work out when i get the correct input
        self.weights2 = #will work out when i get the correct input
        self.y = y
        self.output = #will work out

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # application of the chain rule to find the derivative of the
        # loss function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
        d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

        # update the weights with the derivative (slope) of the loss function
        self.weights1 += d_weights1
        self.weights2 += d_weights2

if __name__ == "__main__":
    X = #need help here
    y = #need help here
    nn = NeuralNetwork(X, y)

    for i in range(1500):
        nn.feedforward()
        nn.backprop()

    print(nn.output)
```

If you have any suggestions or corrections, please let me know, because I am thoroughly invested in learning about neural networks.

Thanks.

Answer:

Don't feed the pandas `DataFrame` itself into the network; element-wise work on a `DataFrame` is very slow, and the matrix operations expect plain arrays. Pass in the underlying NumPy array instead:

```python
X = df[['Open', 'Close', 'High', 'Low', 'Volume']].values
y = df['adj close'].values
```

Does that answer the question?
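With `X` and `y` as plain arrays, the placeholder weight shapes in the question's `__init__` follow from the data. A sketch with random stand-in data (the hidden width of 4 is an arbitrary choice, not from the question):

```python
import numpy as np

n_samples, n_features, n_hidden = 100, 5, 4
X = np.random.rand(n_samples, n_features)  # stands in for the OHLCV columns
y = np.random.rand(n_samples, 1)           # stands in for the adj close target

weights1 = np.random.rand(n_features, n_hidden)  # input layer -> hidden layer
weights2 = np.random.rand(n_hidden, 1)           # hidden layer -> output

# one feedforward pass, matching the question's sigmoid layers
layer1 = 1.0 / (1 + np.exp(-X.dot(weights1)))
output = 1.0 / (1 + np.exp(-layer1.dot(weights2)))
```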

Question:

I'm starting with neural networks and I'm having some issues with my data format. I have a `pandas` `DataFrame` with `130` rows and `4` columns, and each data point is an array of `595` items.

|       | Col 1           | Col 2           | Col 3           | Col 4           |
|-------|-----------------|-----------------|-----------------|-----------------|
| Row 1 | [x1, ..., x595] | [x1, ..., x595] | [x1, ..., x595] | [x1, ..., x595] |
| Row 2 | [x1, ..., x595] | [x1, ..., x595] | [x1, ..., x595] | [x1, ..., x595] |
| Row 3 | [x1, ..., x595] | [x1, ..., x595] | [x1, ..., x595] | [x1, ..., x595] |

I created *X_train*, *X_test*, *y_train* and *y_test* using *train_test_split*. However, when I check the shape of *X_train*, it returns (52, 4), and I can't create a model for my NN because it doesn't accept this shape. This is the error:

```
ValueError: Error when checking input: expected dense_4_input to have 3 dimensions, but got array with shape (52, 4)
```

I believe it's because it should be `(52, 4, 595)`, right? So, I'm kind of confused: how can I specify this *input_format* correctly, or maybe reshape my data into the appropriate format?

I am using `pandas`, `keras`, `tensorflow` and `jupyter notebook`.

Answer:

You have to reshape your data to a 3D numpy array.

Suppose we have a data frame where each cell is a numpy array, as you described:

```python
import pandas as pd
import numpy as np

data = pd.DataFrame(np.zeros((130, 4))).astype('object')
for i in range(130):
    for k in range(4):
        data.iloc[i, k] = np.zeros(595)
```

We can then reshape the data frame into a 3D numpy array by doing:

```python
dataar = data.values
dataar = np.stack((np.vstack(dataar[:, 0]), np.vstack(dataar[:, 1]),
                   np.vstack(dataar[:, 2]), np.vstack(dataar[:, 3])))
# np.stack puts the column axis first; swap it with the row axis rather than
# reshaping, since a plain reshape would scramble which row each array belongs to
dataar = dataar.transpose(1, 0, 2)
dataar.shape  # (130, 4, 595)
```
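The same result can be sketched more compactly by stacking row-wise (the toy frame from the answer is rebuilt here so the snippet stands alone):

```python
import pandas as pd
import numpy as np

# toy frame where every cell holds a length-595 array
data = pd.DataFrame(np.zeros((130, 4))).astype('object')
for i in range(130):
    for k in range(4):
        data.iloc[i, k] = np.zeros(595)

# stack the four arrays within each row, then stack the rows
dataar = np.stack([np.stack(list(row)) for row in data.values])
```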

Question:

I have time-series dataframes that I want to use in conjunction with a convolutional neural network for pattern/anomaly detection.

Just wondering how I can transform them without losing essential data?

Answer:

Managed to form a tensor containing 3D arrays for analysis in convolutional neural networks from a simple data frame, using a moving window:

```python
import numpy as np

def windows(data, size):
    # yield successive (start, start + size) index pairs, sliding one row at a time
    start = 0
    while start < len(data):
        yield start, start + size
        start += 1

def segmentor(data, window_size, num_channels):
    segments = np.empty((0, window_size, num_channels))
    for (start, end) in windows(data, window_size):
        # slice the dataframe to extract that time window
        window = data.iloc[start:end, :]
        # forgo the leftover partial windows at the end of the dataframe
        if len(window) == window_size:
            # stack the columns depthwise: shape (1, window_size, num_channels)
            segment = window.values[np.newaxis, :, :]
            segments = np.vstack([segments, segment])
    return segments
```

The resulting structures can then be passed to generic CNNs.
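When the frame fits in memory, the same moving window can be sketched with direct slicing (the toy two-channel data and window size of 4 are assumptions for illustration):

```python
import pandas as pd
import numpy as np

# toy two-channel time series
data = pd.DataFrame({'a': range(10), 'b': range(10, 20)})
window_size = 4

# one slice per full window, stacked into (num_windows, window_size, num_channels)
segments = np.stack([data.values[s:s + window_size]
                     for s in range(len(data) - window_size + 1)])
```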

Question:

I think this is a simple question, but not for me :( There is a table in df:

```
Date        X1  X2  Y1
07.02.2019   5   1   1
08.02.2019   6   2   1
09.02.2019   1   3   0
10.02.2019   4   4   1
11.02.2019   1   1   0
12.02.2019   4   2   1
13.02.2019   5   5   1
14.02.2019   6   5   1
15.02.2019   1   1   0
16.02.2019   4   5   1
17.02.2019   1   2   0
18.02.2019   1   1
19.02.2019   2   1
20.02.2019   3   2
21.02.2019   4  14
```

I need to build a neural network for Y1 from the parameters X1 and X2, then apply it to the rows with a date greater than 17.02.2019, and save the network's prediction result in a separate df2.

```python
import pandas as pd
import numpy as np
import re
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("ob.csv", encoding='cp1251', sep=';')
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')
startdate = pd.to_datetime('2019-02-17')
X = ['X1', 'X2'] ????
y = ['Y1'] ????
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(x, y)
clf.predict(???????) ?????
df2 = ????
```

Where ???? marks the places where I do not know how to set the conditions correctly.

Answer:

```python
import pandas as pd
import numpy as np
import re
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("ob.csv", encoding='cp1251', sep=';')
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')
startdate = pd.to_datetime('2019-02-17')

train = df[df['Date'] <= startdate]
test = df[df['Date'] > startdate]

X_train = train[['X1', 'X2']]
y_train = train['Y1']          # a 1-D Series avoids a shape warning from sklearn
X_test = test[['X1', 'X2']]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X_train, y_train)

df2 = pd.DataFrame(clf.predict(X_test))
df2.to_csv('prediction.csv')
```
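To keep the dates next to the predicted values in `df2`, one possible tweak (sketched here on an inline toy frame instead of `ob.csv`; the cutoff date comes from the question, the toy numbers do not):

```python
import pandas as pd
from sklearn.neural_network import MLPClassifier

# toy frame in the question's layout; rows after the cutoff have no Y1 yet
df = pd.DataFrame({'Date': pd.to_datetime(['07.02.2019', '08.02.2019',
                                           '18.02.2019', '19.02.2019'],
                                          format='%d.%m.%Y'),
                   'X1': [5, 6, 1, 2],
                   'X2': [1, 2, 1, 1],
                   'Y1': [1, 0, None, None]})
startdate = pd.to_datetime('2019-02-17')
train = df[df['Date'] <= startdate]
test = df[df['Date'] > startdate]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2),
                    random_state=1)
clf.fit(train[['X1', 'X2']], train['Y1'].astype(int))

# keep the dates alongside the predictions
df2 = pd.DataFrame({'Date': test['Date'].values,
                    'Y1_pred': clf.predict(test[['X1', 'X2']])})
```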