Hot questions for using neural networks in Kaggle

Question:

Problem summary and question

I'm trying to look at some of the data inside an object that can be enumerated over but not indexed. I'm still newish to Python, and I don't understand how this is possible.

If you can enumerate it, why can't you access the items the same way enumerate does? And if not, is there a way to access the items individually?

The actual example
import tensorflow_datasets as tfds

train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])

(train_data, validation_data), test_data = tfds.load(
    name="imdb_reviews", 
    split=(train_validation_split, tfds.Split.TEST),
    as_supervised=True)

Take a small subset of the dataset:

foo = train_data.take(5)

I can iterate over foo with enumerate:

for i, x in enumerate(foo):
    print(i)

which generates the expected output:

0
1
2
3
4

But then, when I try to index into it with foo[0], I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-44-2acbea6d9862> in <module>
----> 1 foo[0]

TypeError: 'TakeDataset' object does not support indexing

Answer:

Python only allows these things if the class has methods for them:

- Iteration requires __iter__.¹
- Indexing requires __getitem__.

Any class can define one without defining the other. __getitem__ is usually not defined if it would be inefficient, as it would be for a streaming pipeline like TakeDataset.

¹ __next__ is required on the class returned by __iter__.
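
You can see the same behaviour with a minimal sketch in plain Python (the Countdown class here is made up for illustration):

class Countdown:
    """Iterable but not indexable: defines __iter__ but no __getitem__."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n, 0, -1))

for i, x in enumerate(Countdown(3)):
    print(i, x)    # works: enumerate only needs __iter__

Countdown(3)[0]    # TypeError (exact wording varies by Python version)

As for accessing items of foo individually: pull them through the iterator instead of indexing. Assuming TensorFlow 2.x eager execution:

first = next(iter(foo))    # just the first element
items = list(foo)          # materialize all five (text, label) pairs
items[0]                   # now a plain list, so indexing works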

Question:

I am trying to run this code for the Kaggle Titanic competition as an exercise. It's free and a beginner-level case. I am using the neuralnet package in R.

This is the train data from the website:

train <- read.csv("train.csv")
m <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = train)
head(m)

Here I train the neural network on who survived, to see whether I can predict survival:

library(neuralnet)

r <- neuralnet(Survived ~ Pclass + Sexmale + Age + SibSp,
               data = m, hidden = 10, threshold = 0.01, rep = 100)

The net is trained. I load the test data and prepare it for prediction:

test=read.csv("test.csv")

m2 <- model.matrix(~ Pclass + Sex + Age + SibSp, data = test)

The final test for prediction:

res= compute(r, m2)

First, I do not know how many hidden neurons I should use. Sometimes training takes too long, and when it does succeed I cannot run the test data through the net, because an error occurs saying the two data sets are not compatible:

res= compute(r, m2)

Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments

What am I doing wrong here?

The whole code:

train <- read.csv("train.csv")
m <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = train)
head(m)

library(neuralnet)

r <- neuralnet(Survived ~ Pclass + Sexmale + Age + SibSp,
               data = m, hidden = 10, threshold = 0.01, rep = 100)

test=read.csv("test.csv")

m2 <- model.matrix(~ Pclass + Sex + Age + SibSp, data = test)

res= compute(r, m2)

Answer:

Try using this to predict instead:

res = compute(r, m2[,c("Pclass", "Sexmale", "Age", "SibSp")])

That worked for me and you should get some output.

What appears to have happened: model.matrix creates an additional (Intercept) column that was not part of the formula used to train the neural net, so compute does not know what to do with it. neuralnet multiplies the input matrix by its weight matrices layer by layer, and the extra column makes those matrices non-conformable, hence the error. The solution is to explicitly select, in the compute call, only the columns the network was trained on.
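
If you'd rather not hard-code the column names, the fitted net remembers which covariates it was trained on; neuralnet stores them in model.list$variables. A quick sketch (worth checking against your package version):

colnames(m2)              # includes "(Intercept)", which the net never saw
r$model.list$variables    # the covariates the net was actually trained on

# select exactly those columns, in training order, before predicting
res <- compute(r, m2[, r$model.list$variables])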


As for how many hidden neurons to use, or hyper-parameter optimization in general, you can use cross-validation and related methods. If using a different package (nnet) is fine, then the caret package can determine the optimal parameters for you. It would look like this:

library(caret)
nnet.model <- train(Survived ~ Pclass + Sex + Age + SibSp, 
                    data=train, method="nnet")
plot(nnet.model)
res2 = predict(nnet.model, newdata=test)

with plot(nnet.model) drawing accuracy against the tuned hyper-parameters (the plot itself is omitted here).
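
If you want more control over what caret searches, you can pass your own resampling scheme and tuning grid. Here is a sketch; the grid values are arbitrary choices, and Survived is converted to a factor so caret treats the problem as classification rather than regression:

library(caret)
train$Survived <- factor(train$Survived)          # classify, don't regress
ctrl <- trainControl(method = "cv", number = 5)   # 5-fold cross-validation
grid <- expand.grid(size  = c(1, 3, 5, 10),       # hidden units to try
                    decay = c(0, 0.1, 0.5))       # weight decay to try
nnet.model <- train(Survived ~ Pclass + Sex + Age + SibSp,
                    data = train, method = "nnet",
                    trControl = ctrl, tuneGrid = grid,
                    na.action = na.omit, trace = FALSE)
nnet.model$bestTune                               # the winning size/decay pair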


You can measure performance using the confusionMatrix function from the caret package:

library(neuralnet)
library(caret)
library(dplyr)
train <- read.csv("train.csv")
m <- model.matrix(  ~ Survived + Pclass + Sex + Age + SibSp, data =train )

r <- neuralnet( Survived ~ Pclass + Sexmale + Age + SibSp, 
                data=m, rep=20)

res = neuralnet::compute(r, m[,c("Pclass", "Sexmale", "Age", "SibSp")])
pred_train = round(res$net.result)

# keep only the rows that actually received a prediction: model.matrix
# drops rows with missing values (Age is NA for some passengers), so
# fewer predictions come back than there are rows in train
pred_rowid <- as.numeric(row.names(pred_train))
train_survived <- train %>% filter(row_number() %in% pred_rowid) %>% select(Survived)
confusionMatrix(as.factor(train_survived$Survived), as.factor(pred_train))

Output:

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 308 128
         1 164 114

               Accuracy : 0.5910364             
                 95% CI : (0.5539594, 0.6273581)
    No Information Rate : 0.6610644             
    P-Value [Acc > NIR] : 0.99995895            

                  Kappa : 0.119293              
 Mcnemar's Test P-Value : 0.04053844            

            Sensitivity : 0.6525424             
            Specificity : 0.4710744             
         Pos Pred Value : 0.7064220             
         Neg Pred Value : 0.4100719             
             Prevalence : 0.6610644             
         Detection Rate : 0.4313725             
   Detection Prevalence : 0.6106443             
      Balanced Accuracy : 0.5618084             

       'Positive' Class : 0
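
A final note on the "not all records were predicted" comment above: Age is missing for a fair number of passengers in train.csv, and model.matrix silently drops those rows, so the net never sees or predicts them. A sketch of handling that up front (median imputation is just one simple choice):

vars <- c("Survived", "Pclass", "Sex", "Age", "SibSp")
sum(!complete.cases(train[, vars]))   # how many rows model.matrix would drop

# option 1: keep only complete rows, so predictions line up with the data
train_cc <- train[complete.cases(train[, vars]), ]

# option 2: impute the missing ages with the training median beforehand
train$Age[is.na(train$Age)] <- median(train$Age, na.rm = TRUE)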