Hot questions: using neural networks in Kaggle
Problem summary and question
I'm trying to look at some of the data inside an object that can be enumerated over but not indexed. I'm still fairly new to Python, and I don't understand how this is possible.
If you can enumerate it, why can't you access items by index the same way enumerate does? And if not, is there a way to access the items individually?
The actual example
import tensorflow_datasets as tfds

train_validation_split = tfds.Split.TRAIN.subsplit([6, 4])
(train_data, validation_data), test_data = tfds.load(
    name="imdb_reviews",
    split=(train_validation_split, tfds.Split.TEST),
    as_supervised=True)
Take a small subset of the dataset:
foo = train_data.take(5)
I can iterate over foo with enumerate:
for i, x in enumerate(foo):
    print(i)
which generates the expected output:
0
1
2
3
4
But when I try to index into foo, I get this error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-44-2acbea6d9862> in <module>
----> 1 foo[0]

TypeError: 'TakeDataset' object does not support indexing
Python only allows these operations if the class defines the corresponding special methods: indexing (foo[0]) requires __getitem__, while iteration (which is all enumerate needs) requires __iter__ (or, as a legacy fallback, __getitem__).
Any class can define one without defining the other.
__getitem__ is usually not defined when random access would be inefficient.
__next__ is required on the iterator object returned by __iter__.
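A minimal sketch of a class that is iterable (so enumerate works) but not indexable, analogous to TakeDataset — the class name Bag is made up for illustration:

```python
class Bag:
    """Iterable but not indexable: defines __iter__ but no __getitem__."""
    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        # Using yield makes this method return a generator,
        # which supplies __next__ for the iteration protocol.
        yield from self._items


bag = Bag(["a", "b", "c"])

# enumerate only needs iteration, so this works:
for i, x in enumerate(bag):
    print(i, x)

# Indexing fails because there is no __getitem__:
try:
    bag[0]
except TypeError as e:
    print(e)

# To access items individually, materialize the iterable first:
items = list(bag)
print(items[1])  # b
```

The same trick applies to the TensorFlow dataset: `list(foo)` or `next(iter(foo))` gives you individual elements without indexing.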
I am trying to run this code for the Kaggle Titanic competition as an exercise. It's free and a beginner-friendly case. I am using the neuralnet package in R.
This is the train data from the website:
train <- read.csv("train.csv")
m <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = train)
head(m)
Here I train the neural network, depending on who survived. I want to see if I can predict who survived:
library(neuralnet)
r <- neuralnet(Survived ~ Pclass + Sexmale + Age + SibSp,
               data = m, hidden = 10, threshold = 0.01, rep = 100)
The net is trained. I load the test data and prepare it for testing.
test <- read.csv("test.csv")
m2 <- model.matrix(~ Pclass + Sex + Age + SibSp, data = test)
The final test for prediction:
res <- compute(r, m2)
First, I do not know how many hidden neurons I should use. Sometimes training takes too long, and when it succeeds I cannot run the prediction on the test data because an error occurs saying the two data sets are not compatible:
res <- compute(r, m2)
Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments
What am I doing wrong here?
The whole code:
train <- read.csv("train.csv")
m <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = train)
head(m)

library(neuralnet)
r <- neuralnet(Survived ~ Pclass + Sexmale + Age + SibSp,
               data = m, hidden = 10, threshold = 0.01, rep = 100)

test <- read.csv("test.csv")
m2 <- model.matrix(~ Pclass + Sex + Age + SibSp, data = test)
res <- compute(r, m2)
Try using this to predict instead:
res = compute(r, m2[,c("Pclass", "Sexmale", "Age", "SibSp")])
That worked for me and you should get some output.
What appears to have happened:
model.matrix creates an additional column, (Intercept), which was not part of the data used to build the neural net, so the compute function doesn't know what to do with it. The solution is to explicitly select the needed columns in the compute call. The error arises because neuralnet performs matrix multiplications internally, and the extra column makes the input matrix the wrong size.
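The "non-conformable arguments" error is ordinary matrix-shape mismatch; a small NumPy sketch (the shapes here are hypothetical, chosen only to illustrate why one extra intercept column breaks the multiplication):

```python
import numpy as np

weights = np.ones((4, 1))   # a layer trained on 4 input columns
m2 = np.ones((10, 5))       # design matrix with 4 features plus an intercept column

# 10x5 times 4x1: inner dimensions (5 vs 4) don't conform, so this raises.
try:
    m2 @ weights
except ValueError as e:
    print(e)

# Dropping the extra first column makes the shapes conform again:
result = m2[:, 1:] @ weights   # 10x4 times 4x1 -> 10x1
print(result.shape)
```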
As for how many neurons, or hyper-parameter optimization in general, you could use cross-validation and related methods. If using a different package (nnet) is fine, then the caret package can determine the optimal parameters for you. It would look like this:
library(caret)
nnet.model <- train(Survived ~ Pclass + Sex + Age + SibSp,
                    data = train, method = "nnet")
plot(nnet.model)
res2 <- predict(nnet.model, newdata = test)
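The same cross-validated grid-search idea, sketched in Python with scikit-learn (an analogous toolchain, not the caret call above; the data here is synthetic stand-in data with 4 features, mirroring Pclass/Sex/Age/SibSp):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic binary-classification data standing in for the Titanic features.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Cross-validated search over the number of hidden neurons.
grid = GridSearchCV(
    MLPClassifier(max_iter=2000, random_state=0),
    param_grid={"hidden_layer_sizes": [(2,), (5,), (10,)]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

Each candidate hidden-layer size is scored by 3-fold cross-validation, which is exactly how caret's train picks its winning nnet configuration.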
(The plot(nnet.model) call produces a plot of cross-validated accuracy against the tuned hyper-parameters; the image is not reproduced here.)
You can measure performance using the confusionMatrix function in the caret package:
library(neuralnet)
library(caret)
library(dplyr)

train <- read.csv("train.csv")
m <- model.matrix(~ Survived + Pclass + Sex + Age + SibSp, data = train)
r <- neuralnet(Survived ~ Pclass + Sexmale + Age + SibSp, data = m, rep = 20)
res <- neuralnet::compute(r, m[, c("Pclass", "Sexmale", "Age", "SibSp")])
pred_train <- round(res$net.result)

# filter only the ones with a survival prediction; not all records
# were predicted for some reason
pred_rowid <- as.numeric(row.names(pred_train))
train_survived <- train %>%
  filter(row_number(Survived) %in% pred_rowid) %>%
  select(Survived)

confusionMatrix(as.factor(train_survived$Survived), as.factor(pred_train))
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 308 128
         1 164 114

               Accuracy : 0.5910364
                 95% CI : (0.5539594, 0.6273581)
    No Information Rate : 0.6610644
    P-Value [Acc > NIR] : 0.99995895
                  Kappa : 0.119293
 Mcnemar's Test P-Value : 0.04053844
            Sensitivity : 0.6525424
            Specificity : 0.4710744
         Pos Pred Value : 0.7064220
         Neg Pred Value : 0.4100719
             Prevalence : 0.6610644
         Detection Rate : 0.4313725
   Detection Prevalence : 0.6106443
      Balanced Accuracy : 0.5618084
       'Positive' Class : 0