"Wrong model type for regression" error in 10-fold cross validation for Naive Bayes using R


I am implementing 10-fold cross validation for Naive Bayes on some test data with two classes (0 and 1). I followed the steps below and am getting an error.

data(testdata)

X <- subset(testdata, select = -Class)  # feature columns
Y <- testdata$Class                     # class labels (numeric here)

library(e1071)
naive_bayes <- naiveBayes(X, Y)

library(caret)
library(klaR)

nb_cv <- train(X, Y, method = "nb",
               trControl = trainControl(method = "cv", number = 10))

## Error:
## Error in train.default(X, Y, method = "nb", trControl = trainControl(number = 10)) : 
## wrong model type for regression


dput(testdata)

structure(list(Feature.1 = 6.534088, Feature.2 = -19.050915, 
Feature.3 = 7.599378, Feature.4 = 5.093594, Feature.5 = -22.15166, 
Feature.6 = -7.478444, Feature.7 = -59.534652, Feature.8 = -1.587918, 
Feature.9 = -5.76889, Feature.10 = 95.810563, Feature.11 = 49.124086, 
Feature.12 = -21.101489, Feature.13 = -9.187984, Feature.14 = -10.53006, 
Feature.15 = -3.782506, Feature.16 = -10.805074, Feature.17 = 34.039509, 
Feature.18 = 5.64245, Feature.19 = 19.389724, Feature.20 = 16.450196, 
Class = 1L), .Names = c("Feature.1", "Feature.2", "Feature.3", 
"Feature.4", "Feature.5", "Feature.6", "Feature.7", "Feature.8", 
"Feature.9", "Feature.10", "Feature.11", "Feature.12", "Feature.13", 
"Feature.14", "Feature.15", "Feature.16", "Feature.17", "Feature.18", 
"Feature.19", "Feature.20", "Class"), class = "data.frame", row.names = c(NA, 
-1L))

Also, how can I calculate R-squared or AUC for this model?

Dataset: There are 10,000 records with 20 features and a binary class.
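Since only one record of the real data was posted, here is a hedged sketch that simulates a dataset of the same shape (10,000 rows, 20 numeric features, a binary Class) so the cross-validation call can be reproduced; the feature values and class balance are made up:

```r
# Hypothetical stand-in for the real dataset: 10,000 rows, 20 numeric
# features named Feature.1 ... Feature.20, and a two-level Class factor.
set.seed(42)
n <- 10000
X <- as.data.frame(matrix(rnorm(n * 20), ncol = 20,
                          dimnames = list(NULL, paste0("Feature.", 1:20))))
# Class is created as a factor up front, which is what caret's train()
# needs in order to treat the task as classification.
Y <- factor(sample(c("no", "yes"), n, replace = TRUE))
testdata <- cbind(X, Class = Y)
str(testdata$Class)
```

With `testdata` built this way, `train(X, Y, method = "nb", ...)` runs as a classification problem rather than triggering the "wrong model type" error.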

NaiveBayes is a classifier, so converting Y to a factor (or boolean) is the right way to tackle the problem. Your original call passed a numeric outcome to a classification method, so caret assumed you wanted regression.

As far as R-squared is concerned, that metric is only computed for regression problems, not classification problems. To evaluate classification problems there are other metrics, such as precision and recall.

Please refer to the Wikipedia page for more information on these metrics: http://en.wikipedia.org/wiki/Binary_classification
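For AUC specifically, once you have predicted class probabilities you can compute it in base R via the rank (Mann-Whitney) formulation, without extra packages. A minimal sketch; `scores` and `labels` are hypothetical placeholders for the model's class-1 probabilities and the true 0/1 labels:

```r
# AUC via the Mann-Whitney rank statistic: the probability that a
# randomly chosen positive example is scored above a randomly chosen
# negative one.
auc <- function(scores, labels) {
  n1 <- sum(labels == 1)          # number of positives
  n0 <- sum(labels == 0)          # number of negatives
  r  <- rank(scores)              # average ranks handle tied scores
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc(c(0.9, 0.8, 0.3, 0.2), c(1, 1, 0, 0))  # perfect separation -> 1
auc(c(0.2, 0.9, 0.3, 0.8), c(1, 1, 0, 0))  # mixed ordering    -> 0.5
```

Alternatively, caret can report ROC AUC during cross-validation if you set `classProbs = TRUE` and `summaryFunction = twoClassSummary` in `trainControl` (this requires the class labels to be valid R names, e.g. "yes"/"no" rather than 1/0).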


It works after converting the label vector to a factor: Y <- as.factor(Y)
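The root cause can be shown without caret at all: `train()` dispatches on the type of the outcome vector, so a numeric Y is treated as a regression target while a factor Y is treated as class labels. A minimal base-R illustration (the vectors are made up for demonstration):

```r
# caret decides between regression and classification from the outcome
# vector's type, not from the method name.
Y_num <- c(1, 0, 1, 1, 0)   # numeric -> caret assumes regression,
                            # so method = "nb" raises "wrong model type"
Y_fac <- as.factor(Y_num)   # factor  -> caret runs classification

is.numeric(Y_num)  # TRUE: this is what triggered the error
is.factor(Y_fac)   # TRUE: train(X, Y_fac, method = "nb", ...) now works
```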


Alternatively, when reading the data in, add

colClasses = c("Class" = "character")

so the Class column is never treated as numeric in the first place.
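A sketch of that suggestion using `read.csv`; the temporary file and its two rows are hypothetical stand-ins for the real data file:

```r
# Write a tiny stand-in CSV, then read it back with Class forced to
# character so R does not coerce it to numeric.
f <- tempfile(fileext = ".csv")
write.csv(data.frame(Feature.1 = c(1.2, -0.4), Class = c(1L, 0L)),
          f, row.names = FALSE)

d <- read.csv(f, colClasses = c(Class = "character"))
d$Class <- as.factor(d$Class)   # caret still wants a factor outcome
str(d$Class)                    # Factor w/ 2 levels
```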


Comments
  • Please post dput(testdata) if you want to get help.
  • Thanks David. Added dput(testdata) with 1 record.
  • It works after changing the class labels from (1, 0) to (yes, no).
  • It also works after converting the label vector: Y <- as.factor(Y)