Hot questions for using neural networks in Weka


I'm trying to use MultilayerPerceptron in the Weka Knowledge Flow. In the attachment you can see the settings for the block. As written in the help: "hiddenLayers -- This defines the hidden layers of the neural network. This is a list of positive whole numbers. 1 for each hidden layer. Comma seperated. To have no hidden layers put a single 0 here. This will only be used if autobuild is set. There are also wildcard values 'a' = (attribs + classes) / 2, 'i' = attribs, 'o' = classes , 't' = attribs + classes."

However, it's still a little confusing to me. How can I build a neural network with 3 hidden layers, each with a different number of units (say 2, 5, 6)? And can you help explain the wildcard values? I think they only apply to the number of hidden layers, not to the number of hidden units in each layer.

Thank you.


The GUI option will help you work through this. If you supply '2,5,6' as the hidden layers, it will create 3 layers with 2, 5, and 6 units respectively. The wildcard values seem to be shortcuts for the numbers they represent (a = (# of attributes + # of classes) / 2, etc.), so each one gives the number of units in a layer. Here are a couple of visual steps representing what I mean.

You can see the 'a, 2, 5, 6' in hiddenLayers.

a = (# of attributes + # of classes) / 2

There are 6 attributes and 1 class, so a = 7 / 2 = 3 (the result is rounded down to a whole number).

This means that we expect to see 3 units in the first hidden layer, then 2, 5, and 6 units in the following hidden layers, followed by 1 unit in the output layer.
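The expansion described above can be sketched in a few lines of plain Java. This is only an illustration of the rule quoted from the help text, not WEKA's own code: the class name `HiddenLayerSpec` and the method `resolve` are hypothetical, and the rounding down for 'a' is assumed from the 7 / 2 = 3 example.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenLayerSpec {
    /**
     * Hypothetical helper (not part of WEKA): turns a hiddenLayers string
     * such as "a, 2, 5, 6" into a list of layer sizes.
     */
    public static List<Integer> resolve(String spec, int numAttributes, int numClasses) {
        List<Integer> sizes = new ArrayList<>();
        for (String token : spec.split(",")) {
            String t = token.trim();
            int size;
            switch (t) {
                case "a": size = (numAttributes + numClasses) / 2; break; // integer division
                case "i": size = numAttributes; break;
                case "o": size = numClasses; break;
                case "t": size = numAttributes + numClasses; break;
                default:  size = Integer.parseInt(t);
            }
            // A single 0 means "no hidden layers", so zero-sized layers are skipped.
            if (size > 0) {
                sizes.add(size);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 6 attributes and 1 class, as in the example above.
        System.out.println(resolve("a, 2, 5, 6", 6, 1)); // [3, 2, 5, 6]
        System.out.println(resolve("2,5,6", 6, 1));      // [2, 5, 6]
    }
}
```

Each entry in the list is one hidden layer, so a wildcard only changes how many units a layer has, not how many layers there are.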


I'm running some experiments on various classification datasets using WEKA's MultilayerPerceptron implementation. I was expecting to be able to observe overfitting as the number of training iterations (epochs) increased. However, despite letting the number of epochs grow fairly large (15k), I haven't seen it yet. How should I interpret this? Note that I'm not achieving 100% accuracy on the train or test sets, so it's not that the problem is too simplistic.

Some ideas I came up with are:

  • I simply haven't waited long enough
  • My network isn't complex enough to overfit
  • My data doesn't really contain any noise but isn't descriptive enough for the target function
  • I'm not using the Evaluation class in WEKA correctly
  • My test data set has leaked into my train set (I'm 99% sure it hasn't, though)

I'm running the following after each epoch (I modified MultilayerPerceptron to have an "EpochListener", but made no other changes):

    Evaluation eval = new Evaluation(train);
    eval.evaluateModel(ann, train);
    eval.evaluateModel(ann, test);

The train accuracy seems to plateau and I never see the test accuracy start to decrease substantially.


Can you describe your network and data a little bit? How many dimensions are your data? How many hidden layers, with how many nodes in your network?

My initial thought is that if you have a fairly simple data set, with a good amount of data, and a fairly simple network, your network just won't have enough alternative hypotheses to overfit.
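One more thing worth ruling out is the reuse of a single Evaluation object for both data sets in the snippet from the question. As far as I know, WEKA's Evaluation accumulates statistics across evaluateModel calls, so after the second call it would report pooled train-plus-test figures rather than test accuracy alone. The following is a minimal, WEKA-free sketch of that arithmetic; the `Tally` class and the counts are hypothetical and only illustrate how pooling can hide a drop in test accuracy.

```java
public class PooledAccuracy {
    /** Hypothetical stand-in for an accumulating evaluator (not WEKA's class). */
    static class Tally {
        private int correct;
        private int total;

        // Adds one batch of results to the running totals.
        void add(int batchCorrect, int batchTotal) {
            correct += batchCorrect;
            total += batchTotal;
        }

        double pctCorrect() {
            return 100.0 * correct / total;
        }
    }

    public static void main(String[] args) {
        // Made-up counts: 950 of 1000 train instances and 60 of 100 test instances correct.
        Tally shared = new Tally();
        shared.add(950, 1000); // "evaluate on train"
        shared.add(60, 100);   // "evaluate on test" with the same object

        Tally testOnly = new Tally();
        testOnly.add(60, 100);

        System.out.println("pooled accuracy:    " + shared.pctCorrect());
        System.out.println("test-only accuracy: " + testOnly.pctCorrect());
    }
}
```

If this applies, constructing one Evaluation per data set (one for train, one for test) after each epoch should make any gap between the two curves visible.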


Hi, I have trained a multilayer perceptron on the iris data set in the Weka tool. It gives me the following model as a result.

    === Run information ===

    Scheme:weka.classifiers.functions.MultilayerPerceptron -L 0.3 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H a -G -R
    Relation:     iris
    Instances:    150
    Attributes:   5
    Test mode:split 66.0% train, remainder test

    === Classifier model (full training set) ===

    Sigmoid Node 0
        Inputs    Weights
        Threshold    -3.5015971588434014
        Node 3    -1.0058110853859945
        Node 4    9.07503844669134
        Node 5    -4.107780453339234
    Sigmoid Node 1
        Inputs    Weights
        Threshold    1.0692845992273177
        Node 3    3.8988736877894024
        Node 4    -9.768910360340264
        Node 5    -8.599134493151348
    Sigmoid Node 2
        Inputs    Weights
        Threshold    -1.007176238343649
        Node 3    -4.2184061338270356
        Node 4    -3.626059686321118
        Node 5    8.805122981737854
    Sigmoid Node 3
        Inputs    Weights
        Threshold    3.382485556685675
        Attrib sepallength    0.9099827458022276
        Attrib sepalwidth    1.5675138827531276
        Attrib petallength    -5.037338107319895
        Attrib petalwidth    -4.915469682506087
    Sigmoid Node 4
        Inputs    Weights
        Threshold    -3.330573592291832
        Attrib sepallength    -1.1116750023770083
        Attrib sepalwidth    3.125009686667653
        Attrib petallength    -4.133137022912305
        Attrib petalwidth    -4.079589727871456
    Sigmoid Node 5
        Inputs    Weights
        Threshold    -7.496091023618089
        Attrib sepallength    -1.2158878822058787
        Attrib sepalwidth    -3.5332821317534897
        Attrib petallength    8.401834252274096
        Attrib petalwidth    9.460215580472827
    Class Iris-setosa
        Node 0
    Class Iris-versicolor
        Node 1
    Class Iris-virginica
        Node 2

    Time taken to build model: 34.13 seconds

I'm new to Weka and I don't understand how the nodes are numbered here. Why is a threshold needed when we are using a sigmoid? And can there be multiple attributes in the output?


There are 3 output nodes (0, 1, 2) and 3 hidden units (3, 4, 5). You can tell them apart by looking at what they are connected to. For example,

    Sigmoid Node 3
        Inputs    Weights
        Threshold    3.382485556685675
        Attrib sepallength    0.9099827458022276
        Attrib sepalwidth    1.5675138827531276
        Attrib petallength    -5.037338107319895
        Attrib petalwidth    -4.915469682506087

is clearly a hidden node, as it is connected to the input attributes. The nodes that take it as input (0, 1, 2) are therefore in the next layer.

In general, WEKA numbers your nodes from the output layer back towards the input layer: you first get the output nodes, then the nodes connected to them, then the layer before that, and so on, ending with the first hidden layer.

Why is there threshold? Because sigmoid is defined as

sigmoid(w,x,b) = 1/(1+exp(-(<w,x>-b)))

and b is the threshold. Without it, every node would produce exactly the same output for x = 0, no matter what the weights are.
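The effect of the threshold can be checked with a short sketch. It follows the formula given above, including its sign convention (the threshold is subtracted); whether WEKA internally adds or subtracts the printed Threshold value is not verified here. The class name `SigmoidUnit` and the weights are made up for illustration.

```java
public class SigmoidUnit {
    /** Output of one unit using the formula above: 1 / (1 + exp(-(<w,x> - b))). */
    public static double output(double[] w, double[] x, double b) {
        double dot = 0.0;
        for (int i = 0; i < w.length; i++) {
            dot += w[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-(dot - b)));
    }

    public static void main(String[] args) {
        double[] zeroInput = {0.0, 0.0};
        // With b = 0 the weights cannot move the output at x = 0: both lines print 0.5.
        System.out.println(output(new double[] {1.5, -2.0}, zeroInput, 0.0));
        System.out.println(output(new double[] {-7.0, 3.0}, zeroInput, 0.0));
        // A non-zero threshold shifts the output even at x = 0.
        System.out.println(output(new double[] {1.5, -2.0}, zeroInput, 1.0));
    }
}
```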


I have a textual data set which has been already classified. I have 7 available classes.

I used WEKA (Waikato Environment for Knowledge Analysis) to build the model. I trained and tested 3 different algorithms to determine which one works best for my data set.

I tried Naive Bayes, J48 and Neural Networks (SMO) which are all available in WEKA's machine learning environment.

During training and testing, I found the following ranking of the three algorithms in terms of accuracy:

  1. Neural Networks - 98%
  2. Naive Bayes - 90%
  3. J48 - 85%

With the results above, I decided to use the Neural Networks and build the model. I created an application in Java and loaded the Neural Networks model built in WEKA.

However, my problem is that the model cannot predict new data correctly. I am a bit confused, because during training and testing I obtained high accuracy, but during deployment the accuracy is only around 40%.

I tried to do this in C# and obtained the same results. Below is a sample code I used.

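    // Requires weka.jar on the classpath.
    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.core.Instances;
    import weka.core.SerializationHelper;
    import weka.core.converters.ConverterUtils.DataSource;

    DataSource source = new DataSource("C:\\Users\\Ian\\Desktop\\FINAL\\testdataset.arff");
    Instances test = source.getDataSet();
    // Assumption: the class is the last attribute; loading an ARFF file does not set it.
    if (test.classIndex() == -1) test.setClassIndex(test.numAttributes() - 1);

    // Deserialize the model that was saved from WEKA.
    FilteredClassifier cl1 = (FilteredClassifier) SerializationHelper.read("C:\\FINAL\\NeuralNetworks.model");
    Evaluation evaluation = new Evaluation(test);
    evaluation.evaluateModel(cl1, test); // without this call the summary has no predictions to report
    System.out.println("Results:" + evaluation.toSummaryString());

    // Print the true and predicted label for every test instance.
    for (int i = 0; i < test.numInstances(); i++) {
        String trueClassLabel = test.instance(i).toString(test.classIndex());
        double predictionIndex = cl1.classifyInstance(test.instance(i));
        String predictedClassLabel = test.classAttribute().value((int) predictionIndex);
        System.out.println((i + 1) + "\t" + trueClassLabel + "\t" + predictedClassLabel);
    }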

Any advice on where I went wrong?


After our short chat in the comments, it seems clear to me that you are overfitting your training data. This is most likely caused by a neural network architecture that is too powerful for the problem you are trying to solve. A neural network with enough degrees of freedom can approximate almost any function, including one that simply reproduces the training examples. Instead of finding a solution that generalizes well, the NN memorized the training data during training, leading to nearly perfect accuracy. But as soon as it has to deal with new data, it performs poorly, because it never found a proper generalization rule.

To solve this problem, you have to reduce the degrees of freedom of your NN. This can be achieved by reducing the number of layers and the number of nodes in each layer. Start as simple as 1 or 2 hidden layers with very few nodes. Then keep increasing both nodes and layers until you reach your best performance.

Important: Always measure performance with an independent test set, rather than with the same data you trained the model with.
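As a rough illustration of keeping the test data independent, the sketch below splits instance indices into two disjoint sets before any training happens. It is plain Java rather than WEKA code; the class name `HoldoutSplit`, the 66% fraction, and the seed are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class HoldoutSplit {
    /**
     * Shuffles the indices 0..n-1 with a fixed seed and returns two disjoint
     * lists: element 0 holds the training indices, element 1 the test indices.
     */
    public static List<List<Integer>> split(int n, double trainFraction, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        int trainSize = (int) Math.round(n * trainFraction);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, trainSize)));
        parts.add(new ArrayList<>(indices.subList(trainSize, n)));
        return parts;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(150, 0.66, 42L);
        System.out.println("train size: " + parts.get(0).size());
        System.out.println("test size:  " + parts.get(1).size());
    }
}
```

The model would then be trained only on the first list and scored only on the second, so a network that merely memorized its training instances shows up as a low test score.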
