Scikit-learn confusion matrix

I can't figure out whether I've set up my binary classification problem correctly. I labeled the positive class 1 and the negative class 0. However, it was my understanding that by default scikit-learn uses class 0 as the positive class in its confusion matrix (so the inverse of how I set it up), which is confusing. Is the top row, in scikit-learn's default setting, the positive or the negative class? Let's assume the confusion matrix output:

confusion_matrix(y_test, preds)
array([[30,  5],
       [ 2, 42]])

What would it look like as a labeled confusion matrix? Are the actual instances the rows or the columns in scikit-learn?

          prediction                        prediction
           0       1                          1       0
         -----   -----                      -----   -----
      0 | TN   |  FP        (OR)         1 |  TP  |  FP
actual   -----   -----             actual   -----   -----
      1 | FN   |  TP                     0 |  FN  |  TN

Confusion matrix: Example of confusion matrix usage to evaluate the quality of the output of a classifier on the iris data set. The diagonal elements represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier.
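As a quick illustration of that diagonal property, here is a minimal sketch (with made-up labels) showing that the trace of the matrix divided by its sum gives the accuracy:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 0, 1]
y_pred = [0, 2, 2, 2, 0, 0]

cm = confusion_matrix(y_true, y_pred)
# Diagonal entries are the correctly classified counts,
# so trace / total equals the accuracy
print(np.trace(cm) / cm.sum())   # 0.666...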

Following the example from Wikipedia: if a classification system has been trained to distinguish between cats and non-cats, a confusion matrix will summarize the results of testing the algorithm for further inspection. Assuming a sample of 27 animals (8 cats and 19 non-cats), the resulting confusion matrix could look like the table below:

                    Actual class
                    cat   non-cat
Predicted  cat        5         2
           non-cat    3        17

With sklearn

If you want to maintain the structure of the Wikipedia confusion matrix (predictions as rows, actual classes as columns), pass the predicted values first and then the actual values, with the positive label first:

from sklearn.metrics import confusion_matrix

y_true = [0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,1,0,1,0,0,0,0]
y_pred = [0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0]

# Arguments deliberately swapped (y_pred first) so that rows are predictions;
# labels=[1,0] puts the positive class in the first row/column
confusion_matrix(y_pred, y_true, labels=[1,0])

Out[1]: 
array([[ 5,  2],
       [ 3, 17]], dtype=int64)
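If you prefer to keep scikit-learn's documented argument order (y_true first), the same Wikipedia-style layout can be obtained by transposing; a small sketch on the same data:

# Same result without swapping the arguments:
# labels=[1,0] puts the positive class first, .T makes predictions the rows
confusion_matrix(y_true, y_pred, labels=[1,0]).T

# array([[ 5,  2],
#        [ 3, 17]])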

Another way, with pandas crosstab:

import numpy as np
import pandas as pd

# Map 1 -> 'cat' and 0 -> 'non-cat', fixing the category order
true = pd.Categorical(np.where(np.array(y_true) == 1, 'cat', 'non-cat'),
                      categories=['cat', 'non-cat'])
pred = pd.Categorical(np.where(np.array(y_pred) == 1, 'cat', 'non-cat'),
                      categories=['cat', 'non-cat'])

pd.crosstab(pred, true,
            rownames=['pred'],
            colnames=['Actual'])

Out[2]: 
Actual   cat  non-cat
pred                 
cat        5        2
non-cat    3       17
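If you also want the row and column totals from the Wikipedia-style table, pd.crosstab supports margins (margins_name only has an effect when margins=True); a small sketch:

pd.crosstab(pred, true,
            rownames=['pred'],
            colnames=['Actual'],
            margins=True, margins_name="Total")

# Actual   cat  non-cat  Total
# pred
# cat        5        2      7
# non-cat    3       17     20
# Total      8       19     27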

I hope this helps.

sklearn.metrics.confusion_matrix: the normalize argument normalizes the confusion matrix over the true (rows) or predicted (columns) conditions, or over the whole population; if None, the confusion matrix is not normalized. For plotting, display_labels (array-like of shape (n_classes,), default=None) gives the target names used for display.
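A minimal plotting sketch, assuming scikit-learn >= 1.0, where ConfusionMatrixDisplay.from_predictions supersedes the older plot_confusion_matrix:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# normalize='true' normalizes over the rows (actual classes);
# display_labels sets the tick labels on the plot (0 first, then 1)
ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred,
    display_labels=["non-cat", "cat"],
    normalize="true",
)
plt.show()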

Short answer: in binary classification, when using the labels argument,

confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0], labels=[0,1]).ravel()

the class labels 0 and 1 are considered Negative and Positive, respectively. This follows from the order implied by the list, not from alpha-numeric order.
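Unpacking the raveled counts makes the convention explicit; this mirrors the example in scikit-learn's documentation:

tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0], labels=[0, 1]).ravel()
print(tn, fp, fn, tp)   # 0 2 1 1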


Verification: consider imbalanced class labels like this (the imbalance makes the distinction easier to see):

>>> y_true = [0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0]
>>> y_pred = [0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0]
>>> table = confusion_matrix(y_true, y_pred, labels=[0,1]).ravel()

this would give you a confusion table as follows:

>>> table
array([12,  1,  2,  1])

which corresponds to:

                actual
          |   1    |   0    |
      -------------------------
pred   1  |  TP=1  |  FP=1  |
       0  |  FN=2  |  TN=12 |

where FN=2 means that there were 2 cases where the model predicted the sample to be negative (i.e., 0) but the actual label was positive (i.e., 1), hence False Negative equals 2.

Similarly for TN=12, in 12 cases the model correctly predicted the negative class (0), hence True Negative equals 12.

Everything adds up, assuming that sklearn considers the first entry in labels=[0,1] to be the negative class. Here, 0, the first label, therefore represents the negative class.
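You can verify the mapping by counting the (actual, predicted) pairs directly; a minimal check on the same data:

# Count each (actual, predicted) combination by hand
pairs = list(zip(y_true, y_pred))
tn = pairs.count((0, 0))   # 12
fp = pairs.count((0, 1))   # 1
fn = pairs.count((1, 0))   # 2
tp = pairs.count((1, 1))   # 1
assert [tn, fp, fn, tp] == list(table)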

sklearn.metrics.confusion_matrix (docs): a confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. By definition a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i but predicted to be in group j. Thus in binary classification, the count of true negatives is C[0, 0], false negatives is C[1, 0], true positives is C[1, 1], and false positives is C[0, 1].
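The same C[i, j] definition extends beyond two classes; this is the multiclass example from the confusion_matrix docstring:

from sklearn.metrics import confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])

# array([[2, 0, 0],    # actual "ant":  2 predicted "ant"
#        [0, 0, 1],    # actual "bird": 1 predicted "cat"
#        [1, 0, 2]])   # actual "cat":  1 "ant", 2 "cat"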

Supporting Answer:

When reading confusion matrix values from sklearn.metrics, be aware that the order of the values is

[[ True Negative   False Positive ]
 [ False Negative  True Positive  ]]

If you interpret the values wrongly, say TP for TN, your accuracy and AUC-ROC will still more or less match (accuracy only depends on the diagonal sum), but your precision, recall, sensitivity, and F1-score will be computed from the wrong cells, and you will end up with completely different metrics. This will result in a false judgement of your model's performance.

Do make sure you clearly identify what the 1 and 0 in your model represent; this heavily dictates how the confusion matrix should be read.
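To see the damage concretely, take the table above (tn, fp, fn, tp = 12, 1, 2, 1) and compute the metrics under both readings; a sketch:

tn, fp, fn, tp = 12, 1, 2, 1

# Correct reading: sklearn's order [[TN, FP], [FN, TP]]
precision = tp / (tp + fp)              # 1/2  = 0.50
recall    = tp / (tp + fn)              # 1/3  = 0.33

# Wrong reading [[TP, FP], [FN, TN]] mistakes TN=12 for TP
wrong_precision = tn / (tn + fp)        # 12/13 = 0.92
wrong_recall    = tn / (tn + fn)        # 12/14 = 0.86

# Accuracy is unaffected, since TP and TN both sit on the diagonal
accuracy = (tp + tn) / (tn + fp + fn + tp)   # 13/16 = 0.81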

Experience:

I was working on fraud prediction (binary supervised classification), where fraud was denoted by 1 and non-fraud by 0. My model was trained on a scaled-up, perfectly balanced data set, so during in-time testing the confusion matrix values did not seem suspicious even though I was reading them in the order [[TP, FP], [FN, TN]].

Later, when I had to perform an out-of-time test on a new, imbalanced test set, I realized that the above order was wrong and differed from the one on sklearn's documentation page, which gives the order as tn, fp, fn, tp. Plugging in the correct order made me realize the blunder, and what a difference it had made to my judgement of the model's performance.

How to create a confusion matrix in Python using scikit-learn: scikit-learn sorts labels in ascending order, so 0s form the first column/row and 1s the second.
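A quick way to see the default ascending order is to pass string labels and omit the labels argument; a small sketch:

# With no labels argument, labels are sorted ascending:
# 'cat' comes before 'dog' in both rows and columns
confusion_matrix(["dog", "cat", "dog"], ["dog", "dog", "dog"])

# array([[0, 1],      # actual 'cat': predicted 'dog' once
#        [0, 2]])     # actual 'dog': predicted 'dog' twice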

Confusion Matrix in Machine Learning: in a multilabel confusion matrix MCM, the count of true negatives is MCM[:, 0, 0], false negatives is MCM[:, 1, 0], true positives is MCM[:, 1, 1], and false positives is MCM[:, 0, 1]. Multiclass data will be treated as if binarized under a one-vs-rest transformation.
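A sketch mirroring the multilabel_confusion_matrix docstring example; each per-class 2x2 block uses the [[TN, FP], [FN, TP]] layout:

from sklearn.metrics import multilabel_confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
multilabel_confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])

# array([[[3, 1],     # "ant" vs rest:  TN=3, FP=1, FN=0, TP=2
#         [0, 2]],
#        [[5, 0],     # "bird" vs rest: TN=5, FP=0, FN=1, TP=0
#         [1, 0]],
#        [[2, 1],     # "cat" vs rest:  TN=2, FP=1, FN=1, TP=2
#         [1, 2]]])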

Scikit Learn: Confusion Matrix, Accuracy, Precision and Recall: sklearn.metrics.confusion_matrix(y_true, y_pred, labels=None) computes a confusion matrix to evaluate the accuracy of a classification. Given an array or list of expected values and a list of predictions from your model, the confusion_matrix() function calculates the confusion matrix and returns the result as an array.