## Hot questions on using neural networks in Kaggle

Question:

##### Problem summary and question

I'm trying to look at some of the data inside an object that can be enumerated over but not indexed. I'm still newish to python, but I don't understand how this is possible.

If you can enumerate it, why can't you access items by index the way `enumerate` seems to? And if not, is there a way to access the items individually?

##### The actual example

```python
import tensorflow_datasets as tfds

train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])
(train_data, validation_data), test_data = tfds.load(
    name="imdb_reviews",
    split=(train_validation_split, tfds.Split.TEST),
    as_supervised=True)
```

Take a small subset of the dataset:

```python
foo = train_data.take(5)
```

I **can** iterate over `foo` with `enumerate`:

```python
for i, x in enumerate(foo):
    print(i)
```

which generates the expected output:

```
0
1
2
3
4
```

But then, when I try to index into it with `foo[0]`, I get this error:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-44-2acbea6d9862> in <module>
----> 1 foo[0]

TypeError: 'TakeDataset' object does not support indexing
```

Answer:

Python only allows these operations if the class defines the corresponding special methods:

- `__getitem__` is required for the `[]` syntax.
- `__iter__` and `__next__`¹ are required for iteration.

Any class can define one without defining the other. `__getitem__` is usually left undefined when indexing would be inefficient, as with a dataset that is streamed rather than held in memory.

¹ `__next__` is required on the iterator object returned by `__iter__`.
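As a sketch of this distinction, here is a minimal class (hypothetical, not part of TensorFlow) that defines `__iter__` but not `__getitem__`, so it behaves like `TakeDataset`: `enumerate` works, but `[]` raises a `TypeError`:

```python
class Stream:
    """Iterable but not indexable: defines __iter__, not __getitem__."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        # Returns an iterator; that iterator object supplies __next__.
        return iter(self._items)


foo = Stream(["a", "b", "c"])

# Iteration works, so enumerate works too:
for i, x in enumerate(foo):
    print(i, x)

# Indexing fails, because __getitem__ is not defined:
try:
    foo[0]
except TypeError as e:
    print(e)  # e.g. "'Stream' object is not subscriptable"

# To access items individually anyway, step the iterator or materialize it:
first = next(iter(foo))
as_list = list(foo)
```

The same workarounds apply to the TensorFlow dataset in the question: in eager mode you can pull one element with `next(iter(foo))` or collect them all with `list(foo)`.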

Question:

I am trying to run this code for the Kaggle competition about the Titanic as an exercise. It's free and a beginner-level case. I am using the `neuralnet` package in R.

This is the train data from the website:

```r
train <- read.csv("train.csv")
m <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = train)
head(m)
```

Here I train the neural network on who survived; I want to see if I can predict survival:

```r
library(neuralnet)
r <- neuralnet(Survived ~ Pclass + Sexmale + Age + SibSp,
               data = m, hidden = 10, threshold = 0.01, rep = 100)
```

The net is trained. I load the test data and prepare it for testing.

```r
test <- read.csv("test.csv")
m2 <- model.matrix(~ Pclass + Sex + Age + SibSp, data = test)
```

The final test for prediction:

```r
res <- compute(r, m2)
```

First, I do not know how many hidden neurons I should use. Training sometimes takes too long, and when it does succeed I cannot run the prediction on the test data, because an error occurs saying the two data sets are not compatible:

```
res <- compute(r, m2)
Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments
```

What am I doing wrong here?

The whole code:

```r
train <- read.csv("train.csv")
m <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = train)
head(m)

library(neuralnet)
r <- neuralnet(Survived ~ Pclass + Sexmale + Age + SibSp,
               data = m, hidden = 10, threshold = 0.01, rep = 100)

test <- read.csv("test.csv")
m2 <- model.matrix(~ Pclass + Sex + Age + SibSp, data = test)
res <- compute(r, m2)
```

Answer:

Try using this to predict instead:

```r
res <- compute(r, m2[, c("Pclass", "Sexmale", "Age", "SibSp")])
```

That worked for me and you should get some output.

What appears to have happened: `model.matrix` creates an additional column, `(Intercept)`, that was not part of the data used to build the neural net, so the `compute` function does not know what to do with it. `neuralnet` internally performs a matrix multiplication, and the extra column makes the matrix the wrong size, hence the non-conformable-arguments error. The solution is to explicitly select the columns that `compute` needs.
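A minimal sketch of what goes wrong (the column names mirror the Titanic example, but the data below are made up): `model.matrix` always prepends an `(Intercept)` column, so the matrix handed to `compute` has one more column than the network has input weights, and dropping it restores the expected dimensions:

```r
df <- data.frame(Pclass = c(1, 3),
                 Sex = factor(c("male", "female")),
                 Age = c(22, 38),
                 SibSp = c(1, 0))
m2 <- model.matrix(~ Pclass + Sex + Age + SibSp, data = df)

# m2 has five columns: "(Intercept)" plus the four predictors
colnames(m2)

# Selecting only the predictor columns matches the net's four inputs:
m2_inputs <- m2[, c("Pclass", "Sexmale", "Age", "SibSp")]
ncol(m2)         # 5
ncol(m2_inputs)  # 4
```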

As for how many hidden neurons to use, or hyper-parameter optimization in general, you could use cross-validation and related methods. If using a different package (`nnet`) is acceptable, the `caret` package can determine the optimal parameters for you. It would look like this:

```r
library(caret)

nnet.model <- train(Survived ~ Pclass + Sex + Age + SibSp,
                    data = train, method = "nnet")
plot(nnet.model)
res2 <- predict(nnet.model, newdata = test)
```

with the resulting plot of the hyper-parameter search (not reproduced here).

You can measure performance using `confusionMatrix` from the `caret` package:

```r
library(neuralnet)
library(caret)
library(dplyr)

train <- read.csv("train.csv")
m <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = train)
r <- neuralnet(Survived ~ Pclass + Sexmale + Age + SibSp, data = m, rep = 20)
res <- neuralnet::compute(r, m[, c("Pclass", "Sexmale", "Age", "SibSp")])
pred_train <- round(res$net.result)

# filter only the records with a survival prediction; not all records
# were predicted for some reason
pred_rowid <- as.numeric(row.names(pred_train))
train_survived <- train %>%
  filter(row_number(Survived) %in% pred_rowid) %>%
  select(Survived)
confusionMatrix(as.factor(train_survived$Survived), as.factor(pred_train))
```

Output:

```
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 308 128
         1 164 114

               Accuracy : 0.5910364
                 95% CI : (0.5539594, 0.6273581)
    No Information Rate : 0.6610644
    P-Value [Acc > NIR] : 0.99995895

                  Kappa : 0.119293
 Mcnemar's Test P-Value : 0.04053844

            Sensitivity : 0.6525424
            Specificity : 0.4710744
         Pos Pred Value : 0.7064220
         Neg Pred Value : 0.4100719
             Prevalence : 0.6610644
         Detection Rate : 0.4313725
   Detection Prevalence : 0.6106443
      Balanced Accuracy : 0.5618084

       'Positive' Class : 0
```