Hot questions on using neural networks with confusion matrices


I'm having trouble with model evaluation in TensorFlow using the Experimenter API.

I used to work with 2-class networks, but this time I have trained a 4-class one and I need to figure out how to build a confusion matrix for it. I tried the tf.confusion_matrix function, but it doesn't work at all.

This is the fragment of code that I used:

if mode == ModeKeys.EVAL:

    eval_metric_ops = {
        'accuracy': metrics.streaming_accuracy(predictions=predicted_classes, labels=labels),

        # Other metrics...

        'confusion_matrix': tf.confusion_matrix(predictions=predicted_classes, labels=labels, num_classes=4)
    }

    return tf.estimator.EstimatorSpec(

And this is the error that I got:

TypeError: Values of eval_metric_ops must be (metric_value, update_op) tuples, given: (<tf.Operation 'test/group_deps' type=NoOp>, <tf.Tensor 'test/accuracy/value:0' shape=() dtype=float32>, <tf.Variable 'test/confusion:0' shape=(4, 4) dtype=int32_ref>) for key: confusion_matrix

I have read other answers about how to create a confusion matrix in TensorFlow and I understand how to do it, but I think my question is more related to the Estimator/Experimenter API.


Your code doesn't work because the framework expects eval_metric_ops to be a dictionary whose keys are metric names and whose values are tuples of the form (result tensor, update operation for that tensor).

tf.confusion_matrix(labels=labels, predictions=predicted_classes, num_classes=4) returns only the result tensor, with no update operation.

You have to implement your own metric operation like this:

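def eval_confusion_matrix(labels, predictions):
    with tf.variable_scope("eval_confusion_matrix"):
        # Confusion matrix for the current batch only.
        con_matrix = tf.confusion_matrix(labels=labels, predictions=predictions, num_classes=4)

        # Running total across batches. The original call is truncated here; the
        # remaining arguments are an assumed completion that makes the total a
        # non-trainable local variable, so it is not looked up in checkpoints.
        con_matrix_sum = tf.Variable(tf.zeros(shape=(4, 4), dtype=tf.int32),
                                     trainable=False,
                                     collections=[tf.GraphKeys.LOCAL_VARIABLES])

        update_op = tf.assign_add(con_matrix_sum, con_matrix)

        return tf.convert_to_tensor(con_matrix_sum), update_op

# Add evaluation metrics (for EVAL mode)
eval_metric_ops = {
    "accuracy": tf.metrics.accuracy(labels, predicted_classes),
    "conv_matrix": eval_confusion_matrix(labels, predicted_classes)
}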
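The (value, update_op) contract can be illustrated without TensorFlow. The following is only an analogy, assuming NumPy: the class StreamingConfusionMatrix and its method names are hypothetical and not part of any TensorFlow API. The update method plays the role of update_op (add one batch to the running total) and value plays the role of the result tensor.

```python
import numpy as np


class StreamingConfusionMatrix:
    """Accumulates a confusion matrix over batches (rows = labels, columns = predictions)."""

    def __init__(self, num_classes):
        # Plays the role of the con_matrix_sum variable.
        self.total = np.zeros((num_classes, num_classes), dtype=np.int64)

    def update(self, labels, predictions):
        """Counterpart of update_op: add the counts of one batch to the running total."""
        batch = np.zeros_like(self.total)
        # np.add.at handles repeated (label, prediction) pairs correctly.
        np.add.at(batch, (np.asarray(labels), np.asarray(predictions)), 1)
        self.total += batch
        return self.total

    def value(self):
        """Counterpart of the result tensor: the counts accumulated so far."""
        return self.total.copy()


cm = StreamingConfusionMatrix(num_classes=4)
cm.update(labels=[0, 1, 2, 3], predictions=[0, 1, 2, 2])  # first batch
cm.update(labels=[3, 3, 0, 1], predictions=[3, 3, 1, 1])  # second batch
result = cm.value()
print(result)
```

Reading the value without ever calling update would return all zeros, which is why the framework insists on receiving both parts of the tuple.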


I have two questions:

  1. I used the MATLAB Neural Network Toolbox to train a neural network for classification, but each time I close the program and then train and test the network again, I get different results. Do you know what is happening?

  2. Which value in the confusion matrix is the final accuracy of my network?

Thank you in advance.


  1. When you use MATLAB's Neural Network Toolbox, you can choose what percentage of your data goes to training, validation and testing (the default is 70% for training and 15% each for validation and testing). The toolbox divides your data randomly, which is why you get different results. You can fix the training, validation and testing sets by modifying the generated simple script.

  2. You need the test confusion matrix. The overall accuracy it reports (correct classifications divided by the total number of test examples) is the figure to quote.
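Both points can be sketched in a few lines. This is an illustration in Python with NumPy rather than MATLAB, and the function names, the seed and the example matrix are all made up for the sketch: a fixed seed makes the shuffle (and therefore any split built from it) repeatable, and the overall accuracy is the diagonal of the test confusion matrix divided by its total.

```python
import numpy as np


def shuffled_indices(n_samples, seed):
    """Return a random permutation of indices that is reproducible for a given seed."""
    return np.random.default_rng(seed).permutation(n_samples)


def accuracy_from_confusion(cm):
    """Overall accuracy: correctly classified examples (diagonal) over all examples."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()


# The same seed always produces the same shuffle, so a split built from it is fixed.
first = shuffled_indices(20, seed=42)
second = shuffled_indices(20, seed=42)
same_shuffle = np.array_equal(first, second)
print(same_shuffle)

# Hypothetical 2-class test confusion matrix (rows = true class, columns = predicted class).
test_cm = [[12, 1],
           [2, 15]]
test_accuracy = accuracy_from_confusion(test_cm)  # 27 correct out of 30
print(test_accuracy)
```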

I hope it helped!


I am trying to learn the correct procedure for training a neural network for classification. There are many tutorials, but they never explain how to report generalization performance. Can somebody please tell me whether the following method is correct? I take the first 100 examples from the fisheriris data set, which have labels 1 and 2, and call the examples and labels X and Y respectively. I then split X into trainData and Xtest with a 90/10 ratio and train the NN model on trainData. The NN internally splits trainData further into tr, val and test subsets. My confusion is about which of these is usually used to report the model's performance on unseen data in conferences and journals. The dataset can be found in the link:

load iris.mat;

X = [f(1:100,:) l(1:100)];

numExamples = size(X,1);
indx = randperm(numExamples);
X = X(indx,:);
Y = X(:,end);

split1 = cvpartition(Y,'Holdout',0.1,'Stratify',true); %90% trainval 10% test

istrainval = training(split1); % index for fitting
istest = test(split1);      % indices for quality assessment

trainData = X(istrainval,:);

Xtest = X(istest,:);
Ytest = Y(istest);

numExamplesXtrainval = size(trainData,1);

indxXtrainval = randperm(numExamplesXtrainval);
trainData = trainData(indxXtrainval,:);
Ytrain = trainData(:,end);

hiddenLayerSize = 10;

% data format = rows = number of dim, column = number of examples
net  = patternnet(hiddenLayerSize);
net  = init(net);
net.performFcn = 'crossentropy';
net.trainFcn = 'trainscg';

[net tr]= train(net,trainData', Ytrain');
Trained = sim(net, trainData');  %outputs predicted labels

train_predict = net(trainData');

performanceTrain = perform(net,Ytrain',train_predict)
Yhat_train = (train_predict >= 0.5);
Lbl_Yhat_Train = grp2idx(Yhat_train);   
[cmMatrixTrain]=  confusionmat(lbl_train,Lbl_Yhat_Train )

accTrain=sum(lbl_train ==Lbl_Yhat_Train)/size(lbl_train,1);
disp(['Training Set:    Total Train Acccuracy by MLP = ',num2str(100*accTrain ), '%'])

[confTest] =  confusionmat(lbl_train(tr.testInd),Lbl_Yhat_Train(tr.testInd) )

%unknown test
test_predict = net(Xtest');

performanceTest = perform(net,Ytest',test_predict);
Yhat_test = (test_predict >= 0.5);
Lbl_Yhat_Test = grp2idx(Yhat_test);

[cmMatrix_Test]=  confusionmat(test_lbl,Lbl_Yhat_Test )

This is the output.

Problem1: There seems to be no prediction for the other class. Why?

Problem2: Do I need a separate dataset like the one I created as Xtest for reporting generalization error or is it the practice to use the data trainData(tr.testInd,:) as the generalization test set? Did I create an unnecessary subset?

performanceTrain =


cmMatrixTrain =

    45     0
    45     0

Training Set:    Total Train Acccuracy by MLP = 50%

confTest =

     9     0
     5     0

cmMatrix_Test =

     5     0
     5     0


There are a few issues with the code. Let's deal with them before answering your question. First, you set a threshold of 0.5 for making decisions (Yhat_train = (train_predict >= 0.5);), but all of your network's predictions are above 0.5. Every example is therefore assigned to the same class, which is why one column of each confusion matrix is all zeros. You can plot the scores to choose a better threshold:

plot(train_predict(Ytrain == 1),'.b')
hold on
plot(train_predict(Ytrain == 2),'.r')
legend('label 1','label 2')
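To see the effect of the threshold in isolation, here is a small sketch in Python with NumPy. The scores are invented for illustration and are not the outputs of the network above; the midpoint rule is just one simple way to pick a threshold and should be applied to training or validation scores, not to the test set.

```python
import numpy as np

# Hypothetical network outputs: every score is above 0.5,
# yet the two classes are still clearly separable.
scores = np.array([0.62, 0.66, 0.70, 0.91, 0.95, 0.97])
labels = np.array([1, 1, 1, 2, 2, 2])

# Fixed threshold of 0.5: every example lands in the same class.
pred_fixed = np.where(scores >= 0.5, 2, 1)

# Data-driven threshold: midpoint between the mean score of each class.
threshold = (scores[labels == 1].mean() + scores[labels == 2].mean()) / 2
pred_tuned = np.where(scores >= threshold, 2, 1)

print(pred_fixed)
print(threshold, pred_tuned)
```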

Second, cvpartition gave me an error; it ran successfully as split1 = cvpartition(Y,'Holdout',0.1);. In any case, artificial neural networks usually manage the partitioning within the training process, so you feed in X and Y together with some parameters that control the split. For example, you can set:

net.divideParam.trainRatio = .4;
net.divideParam.valRatio = .3;
net.divideParam.testRatio = .3;

So how should you report the results? Only on the test data. Results on the training data suffer from overfitting and look misleadingly good. If you use validation data (you haven't), you cannot report results on it either, because it also suffers from overfitting. If you let the training procedure handle validation for you, your test results will be safe from overfitting.
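The bookkeeping behind "report only on the test data" might look like the sketch below. It is in Python with NumPy, uses the 0.4/0.3/0.3 ratios from the snippet above, and stands in made-up labels and predictions for a real trained model; three_way_split and confusion are hypothetical helper names.

```python
import numpy as np


def three_way_split(n_samples, train_ratio=0.4, val_ratio=0.3, seed=0):
    """Return disjoint train/val/test index arrays; the remainder goes to test."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(train_ratio * n_samples)
    n_val = int(val_ratio * n_samples)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


def confusion(y_true, y_pred, num_classes):
    """Confusion matrix with rows = true class and columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Hypothetical labels and predictions for 100 examples (classes 0 and 1).
y_true = np.arange(100) % 2
y_pred = y_true.copy()
y_pred[::10] = 1 - y_pred[::10]  # flip every tenth prediction to simulate errors

train_idx, val_idx, test_idx = three_way_split(len(y_true))

# Generalization performance is computed on the test indices only.
test_cm = confusion(y_true[test_idx], y_pred[test_idx], num_classes=2)
test_acc = np.trace(test_cm) / test_cm.sum()
print(test_cm)
print(test_acc)
```

The training and validation indices are deliberately left out of the reported numbers, since the model (or its early stopping) has already been tuned on them.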


I am currently trying to build a neural network to predict the rank in which each person in the data will place.

The Rank system is: A,B,C,D,E

Everything runs very smoothly until I get to my confusion matrix. I get the error "Error: data and reference should be factors with the same levels.". I have tried many different methods from other posts, but none seem to work.

The levels are the same in both NNPredictions and test$Rank. I checked them both with table().


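library(readxl)  # read_excel
library(caret)   # createDataPartition, trainControl, train, confusionMatrix

Indirect <- read_excel("C:/Users/Abdulazizs/Desktop/Projects/Indirect/FIltered Indirect.xlsx",
    n_max = 500)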

Indirect$Direct_or_Indirect <- NULL

Indirect$parentaccount <- NULL


counts <- table(Indirect$Rank)



part2 <- createDataPartition(Indirect$Rank, times = 1, p = .8, list = FALSE, groups = min(5, length(Indirect$Rank)))

train <- Indirect[part2, ]
test <- Indirect[-part2, ]


TrainingParameters <- trainControl(method = "repeatedcv", number = 10, repeats=10)

NNModel <- train(train[,-7], train$Rank,
                  method = "nnet",
                  trControl= TrainingParameters,
                  na.action = na.omit)

NNPredictions <-predict(NNModel, test, type = "raw")


confusionMatrix(NNPredictions, test$Rank)

length(NNPredictions)
[1] 98

length(test$Rank)
[1] 98

table(NNPredictions, test$Rank, useNA="ifany")

NNPredictions  A  B  C  D  E
            A  1  0  0  0  0
            B  0  6  0  0  0
            C  0  0 11  0  0
            D  0  0  0 18  0
            E  0  0  0  0 62


Change type = "prob" to type = "raw" in the predict() call, then build a table from the predictions and pass it to confusionMatrix():

Table1 <- table(NNPredictions, test$Rank, useNA = "ifany")

cnf1 <- confusionMatrix(Table1)

Answer provided by dclarson
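The idea behind the error message is that predictions and reference must be counted over one shared, ordered set of labels. As a language-neutral illustration (Python with NumPy, not caret; confusion_table and the sample data are invented for the sketch), fixing the label set up front guarantees a square table even when some rank is never predicted:

```python
import numpy as np

LEVELS = ["A", "B", "C", "D", "E"]  # the shared set of ranks, in a fixed order


def confusion_table(predicted, reference, levels=LEVELS):
    """Count (predicted, reference) pairs over a fixed label set (rows = predicted)."""
    index = {level: i for i, level in enumerate(levels)}
    table = np.zeros((len(levels), len(levels)), dtype=int)
    for p, r in zip(predicted, reference):
        table[index[p], index[r]] += 1
    return table


# Hypothetical data: no prediction is "A", yet the table still has a row for it.
reference = ["A", "B", "C", "D", "E", "E"]
predicted = ["B", "B", "C", "D", "E", "E"]
table = confusion_table(predicted, reference)
print(table)
```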


The code, including example output, can be viewed on GitHub as a Jupyter notebook (it renders correctly for me).

I'm building a neural network for binary classification with Python, Keras and scikit-learn.

My neural network seems to compile, train and validate rather nicely.

The problem is at the end of my code, where the confusion matrix is printed: the results don't look reasonable. The issue is probably somewhere near here:

# Print total accuracy and confusion matrix
val_predicts = model.predict(df_norm)
y_pred = argmax(val_predicts, axis = 1)
cm = confusion_matrix(groundTruth, y_pred)

From the graph, and from the output if you set verbose=1 in the validation phase, you can see that the accuracy settles at about 80%, with some overfitting visible in the graph.

But when I print y_pred, all the values are zeros.

I'm not sure what causes this. How can y_pred be all zeros if the accuracy is about 80%?

I did remember to use a sigmoid on the output layer, but I have a nagging feeling that the sigmoid outputs still have to be rounded (i.e. if a sigmoid output is greater than or equal to 0.5, convert it to 1.0).


y_pred = argmax(val_predicts, axis = 1) makes every prediction zero: each row of val_predicts is a one-element array, so the maximum is always the item at index 0. Instead, you should do something like the following:

y_pred = [1 * (x[0]>=0.5) for x in val_predicts]
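A quick way to convince yourself is to run both versions on a tiny array. The values below are made up and only assume that the predictions have shape (n_samples, 1), as they do for a single sigmoid output unit:

```python
import numpy as np

# Hypothetical sigmoid outputs, one column per sample row.
val_predicts = np.array([[0.10], [0.80], [0.50], [0.30]])

# argmax over an axis of length 1 can only ever return index 0.
y_pred_argmax = np.argmax(val_predicts, axis=1)

# Thresholding the probability gives the actual class labels.
y_pred_threshold = (val_predicts[:, 0] >= 0.5).astype(int)

print(y_pred_argmax)
print(y_pred_threshold)
```

argmax is the right tool only when the output layer has one unit per class (for example a softmax over two units); with a single sigmoid unit, thresholding is what converts probabilities to labels.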