Precision/recall for multiclass-multilabel classification

I'm wondering how to calculate precision and recall measures for multiclass multilabel classification, i.e. classification where there are more than two labels, and where each instance can have multiple labels?

How to compute precision/recall for multiclass-multilabel, Another popular tool for measuring classifier performance is ROC/AUC; this one too has a multi-class / multi-label extension: see [Hand 2001] (Hand & Till, "A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems", Machine Learning, 2001). Recall measures the proportion of actual positives that were identified correctly. To calculate precision and recall for multiclass-multilabel classification, you can add up the precision and recall separately for each class, then divide each sum by the number of classes; this gives an approximate overall precision and recall.
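
For the ROC/AUC route, scikit-learn's roc_auc_score exposes a pairwise multi-class extension directly (its 'ovo' mode with macro averaging follows the Hand & Till generalisation, per the scikit-learn docs). A minimal sketch, with made-up labels and class probabilities:

import numpy as np
from sklearn.metrics import roc_auc_score

# Toy 3-class problem; rows of y_proba are made-up class probabilities summing to 1
y_true = np.array([0, 1, 2, 2, 1, 0])
y_proba = np.array([[0.7, 0.2, 0.1],
                    [0.3, 0.5, 0.2],
                    [0.1, 0.3, 0.6],
                    [0.2, 0.2, 0.6],
                    [0.4, 0.4, 0.2],
                    [0.6, 0.3, 0.1]])

# multi_class='ovo' averages the AUC over all pairs of classes
print(roc_auc_score(y_true, y_proba, multi_class='ovo', average='macro'))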

The answer is that you have to compute precision and recall for each class and then average them together (this is usually called macro-averaging). E.g. if you have classes A, B, and C, then your precision is:

(precision(A) + precision(B) + precision(C)) / 3

Same for recall.
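
A minimal sketch of that macro-averaging with scikit-learn; the label arrays below are made up purely for illustration, and average='macro' is exactly the unweighted per-class mean described above:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Toy multiclass example with classes A, B and C (made-up data)
y_true = np.array(['A', 'B', 'C', 'A', 'B', 'C', 'A'])
y_pred = np.array(['A', 'C', 'C', 'A', 'B', 'B', 'C'])

# average='macro' computes the metric per class and takes the unweighted mean,
# i.e. (precision(A) + precision(B) + precision(C)) / 3
print(precision_score(y_true, y_pred, average='macro'))
print(recall_score(y_true, y_pred, average='macro'))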

I'm no expert, but this is what I have determined based on the following sources:

https://list.scms.waikato.ac.nz/pipermail/wekalist/2011-March/051575.html
http://stats.stackexchange.com/questions/21551/how-to-compute-precision-recall-for-multiclass-multilabel-classification

Precision/recall for multiclass-multilabel classification, For multi-label classification you have two ways to go. First consider the following: $n$ is the number of examples, $Y_i$ is the ground-truth label set of the $i$-th example, and $h(x_i)$ is the set of predicted labels; example-based precision is then $\frac{1}{n}\sum_{i=1}^{n}\frac{|Y_i \cap h(x_i)|}{|h(x_i)|}$ and example-based recall is $\frac{1}{n}\sum_{i=1}^{n}\frac{|Y_i \cap h(x_i)|}{|Y_i|}$. In a similar way, we can calculate the precision and recall for the other two classes in the worked confusion-matrix example, Fish and Hen: for Fish the numbers are 66.7% and 20.0% respectively, and for Hen both precision and recall are 66.7%. Go ahead and verify these results.
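
A rough sketch of that example-based (per-instance) averaging on a made-up label indicator matrix; scikit-learn's average='samples' computes the same quantity:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Rows = examples, columns = labels; 1 means the label applies (toy data)
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 1]])

# Example-based precision: size of intersection / number of predicted labels, averaged over examples
prec = np.mean(np.sum(Y_true & Y_pred, axis=1) / np.sum(Y_pred, axis=1))
# Example-based recall: size of intersection / number of true labels, averaged over examples
rec = np.mean(np.sum(Y_true & Y_pred, axis=1) / np.sum(Y_true, axis=1))
print(prec, rec)

# sklearn's 'samples' average gives the same numbers
print(precision_score(Y_true, Y_pred, average='samples'),
      recall_score(Y_true, Y_pred, average='samples'))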

Precision-Recall, In order to extend the precision-recall curve and average precision to multi-class or multi-label classification, it is necessary to binarize the output; one curve can then be drawn per label. Computing Precision and Recall for Multi-Class Classification Problems: in evaluating multi-class classification problems, we often think that the only way to evaluate performance is by computing the accuracy, i.e. the proportion of correctly predicted labels over all predictions.
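
A small sketch of that binarization idea with scikit-learn: binarize the labels one-vs-rest, then draw one precision-recall curve per class (the scores below are invented for illustration):

import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import precision_recall_curve

# Made-up 3-class ground truth and per-class scores (e.g. predict_proba output)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_score = np.array([[0.7, 0.2, 0.1],
                    [0.3, 0.5, 0.2],
                    [0.1, 0.3, 0.6],
                    [0.2, 0.2, 0.6],
                    [0.4, 0.4, 0.2],
                    [0.6, 0.3, 0.1]])

# Binarize the labels so each class gets its own one-vs-rest column
Y = label_binarize(y_true, classes=[0, 1, 2])

# One precision-recall curve per class
for k in range(3):
    precision, recall, _ = precision_recall_curve(Y[:, k], y_score[:, k])
    print(f"class {k}: precision={precision}, recall={recall}")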

In Python, using sklearn and NumPy:

from sklearn.metrics import confusion_matrix
import numpy as np

labels = ...       # true class labels
predictions = ...  # predicted class labels

cm = confusion_matrix(labels, predictions)
# Rows of the confusion matrix are true classes, columns are predicted classes,
# so the diagonal holds the true positives for each class.
recall = np.diag(cm) / np.sum(cm, axis=1)     # TP / (TP + FN) per class
precision = np.diag(cm) / np.sum(cm, axis=0)  # TP / (TP + FP) per class

Multi-Class Metrics Made Simple, Part I: Precision and Recall, Given a classifier, I find that the best way to think about classifier performance is by using the so-called "confusion matrix"; the idea applies to binary classification and extends naturally to more classes. Precision and recall can be calculated for multi-class classification by using the confusion matrix, which tabulates true positives, true negatives, false positives, and false negatives when there are more than 2 classes. It's used for computing the precision and recall, and hence the F1-score, for multi-class problems.

Simple averaging will do if the classes are balanced.

Otherwise, recall for each real class needs to be weighted by the prevalence of that class, and precision for each predicted label needs to be weighted by the bias (prediction probability) of that label. Either way you get Rand accuracy.

A more direct way is to make a normalized contingency table (divide by N so the table sums to 1 over all combinations of label and class) and add up the diagonal to get Rand accuracy.
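
For instance (toy labels, purely illustrative), the diagonal of the normalized confusion matrix sums to exactly that accuracy:

import numpy as np
from sklearn.metrics import confusion_matrix

labels      = np.array([0, 0, 1, 1, 2, 2, 2])
predictions = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(labels, predictions)

# Normalize so all cells sum to 1; the diagonal then sums to the accuracy
# (Rand accuracy), i.e. the proportion of instances labelled correctly.
normalized = cm / cm.sum()
print(np.trace(normalized))            # 5/7 for this toy data
print(np.mean(labels == predictions))  # same number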

But if the classes aren't balanced, the bias remains, and a chance-corrected method such as kappa is more appropriate; better still, use ROC analysis or a chance-corrected measure such as informedness (the height above the chance line in ROC space).
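
If you want something off the shelf, scikit-learn ships Cohen's kappa, and its adjusted balanced accuracy is a related chance-corrected average recall (it coincides with informedness in the binary case). A small sketch with the same toy labels as above:

from sklearn.metrics import cohen_kappa_score, balanced_accuracy_score

labels      = [0, 0, 1, 1, 2, 2, 2]
predictions = [0, 1, 1, 1, 2, 0, 2]

# Kappa corrects the raw agreement for agreement expected by chance
print(cohen_kappa_score(labels, predictions))

# Chance-corrected macro recall: 0 = chance level, 1 = perfect
print(balanced_accuracy_score(labels, predictions, adjusted=True))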

Deep dive into multi-label classification..! (With detailed Case Study), For example, multi-class classification makes the assumption that each sample is assigned exactly one label. We just take the average of the precision and recall of the system over the classes. Hi! Keras: 2.0.4. I recently spent some time trying to build metrics for multi-class classification that output a per-class precision, recall and F1 score, and I want a metric that is aggregated correctly.

How to compute precision and recall for a multi-class classification, You can do that for other metrics like recall, and for each label; but there is no further multiclass generalization. Accuracy remains well defined in the same way. For each label the metrics (e.g. precision, recall) are computed, and then these label-wise metrics are aggregated. Hence, in this case you end up computing the precision/recall for each label over the entire dataset, as you do for a binary classification (since each label has a binary assignment), and then aggregate them.
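
A small sketch of that label-wise computation followed by aggregation, using scikit-learn's precision_recall_fscore_support on a made-up multilabel indicator matrix:

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Multilabel indicator matrices: each column is its own binary problem (toy data)
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0],
                   [0, 0, 1]])

# Per-label precision/recall/F1 (average=None), computed label by label
p, r, f, support = precision_recall_fscore_support(Y_true, Y_pred, average=None)
print(p, r, f, support)

# Then aggregated across labels: unweighted ('macro') or pooled counts ('micro')
print(precision_recall_fscore_support(Y_true, Y_pred, average='macro'))
print(precision_recall_fscore_support(Y_true, Y_pred, average='micro'))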

Calculating Precision, Recall and F1 score in case of multi-label classification, I have a tensor containing the ground-truth labels, one-hot encoded, and a predicted tensor with the probabilities for each class. Precision-recall curves are typically used in binary classification to study the output of a classifier; in order to extend the precision-recall curve and average precision to multi-class or multi-label classification, it is necessary to binarize the output.
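
One common way to score that setup is to threshold the per-class probabilities into a multi-hot prediction matrix and then compute the metrics on it; the 0.5 threshold and all arrays below are assumptions for illustration only:

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Multi-hot ground truth and predicted probabilities (made-up values)
Y_true  = np.array([[1, 0, 1],
                    [0, 1, 0],
                    [1, 1, 0]])
Y_proba = np.array([[0.9, 0.2, 0.6],
                    [0.1, 0.8, 0.7],
                    [0.6, 0.4, 0.3]])

# Threshold the probabilities to get hard label assignments (0.5 is an arbitrary choice)
Y_pred = (Y_proba >= 0.5).astype(int)

print(precision_score(Y_true, Y_pred, average='macro'))
print(recall_score(Y_true, Y_pred, average='macro'))
print(f1_score(Y_true, Y_pred, average='macro'))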

Calculate mean Average Precision (mAP) for multi-label classification, After a few epochs of training, we get predicted scores that are still far from correct; let's start from there by first introducing precision and recall. Precision and Recall: A Tug of War. To fully evaluate the effectiveness of a model, you must examine both precision and recall. Unfortunately, precision and recall are often in tension: improving precision typically reduces recall, and vice versa.
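
A small sketch of mAP with scikit-learn: compute average precision per label from the predicted scores, then take the unweighted mean over labels (the arrays below are made up for illustration):

import numpy as np
from sklearn.metrics import average_precision_score

# Multi-hot ground truth and predicted scores (made-up values)
Y_true  = np.array([[1, 0, 1],
                    [0, 1, 0],
                    [1, 1, 0],
                    [0, 0, 1]])
Y_score = np.array([[0.8, 0.1, 0.7],
                    [0.2, 0.9, 0.3],
                    [0.7, 0.6, 0.2],
                    [0.1, 0.2, 0.8]])

# Average precision per label (area under each label's precision-recall curve)
per_label_ap = average_precision_score(Y_true, Y_score, average=None)
print(per_label_ap)

# Mean over labels = mean Average Precision (mAP), i.e. the 'macro' average
print(average_precision_score(Y_true, Y_score, average='macro'))
print(per_label_ap.mean())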

Comments
  • +1 What's up with the downvotes without comments? I had the same question and I'm glad I found this page.
  • @ThomasJungblut I understand how to calculate the precision for a given class, e.g. class A, but how should I calculate the precision for all classes? Is it an arithmetic mean of the precision for each class?
  • I found a similar question, this might be a duplicate: stackoverflow.com/questions/3856013/…
  • This question appears to be off-topic because it asks about the textbook formula and not programming it and so belongs on CrossValidated. In fact, it was already answered well a couple days before this question was asked: stats.stackexchange.com/questions/21551/…
  • Does the recall equal the precision when using the example based approach for non-multilabel, but multiclass classification?
  • If your data has unbalanced number of labels, this averaging may not reflect the real performance.