Hot questions on using neural networks in OpenCV 3.1

Question:

I'm trying to implement a Multi-Layer Perceptron (MLP) neural network using EmguCV 3.1 (a .NET wrapper for the OpenCV library) in C# (Windows Forms). To practice with this library, I decided to implement the OR operation using an MLP.

I create the MLP using an "Initialize" method and train it using a "Train" method, as below:

private void Initialize()
{
    NETWORK.SetActivationFunction(
        ANN_MLP.AnnMlpActivationFunction.SigmoidSym);

    NETWORK.SetTrainMethod(ANN_MLP.AnnMlpTrainMethod.Backprop);

    Matrix<double> layers = new Matrix<double>(new Size(4, 1));
    layers[0, 0] = 2;
    layers[0, 1] = 2;
    layers[0, 2] = 2;
    layers[0, 3] = 1;
    NETWORK.SetLayerSizes(layers);
}

private void Train()
{
    // providing data for input

    Matrix<float> input = new Matrix<float>(4, 2);
    input[0, 0] = MIN_ACTIVATION_FUNCTION; input[0, 1] = MIN_ACTIVATION_FUNCTION;
    input[1, 0] = MIN_ACTIVATION_FUNCTION; input[1, 1] = MAX_ACTIVATION_FUNCTION;
    input[2, 0] = MAX_ACTIVATION_FUNCTION; input[2, 1] = MIN_ACTIVATION_FUNCTION;
    input[3, 0] = MAX_ACTIVATION_FUNCTION; input[3, 1] = MAX_ACTIVATION_FUNCTION;

    //providing data for output
    Matrix<float> output = new Matrix<float>(4, 1);
    output[0, 0] = MIN_ACTIVATION_FUNCTION;
    output[1, 0] = MAX_ACTIVATION_FUNCTION;
    output[2, 0] = MAX_ACTIVATION_FUNCTION;
    output[3, 0] = MAX_ACTIVATION_FUNCTION;


    // mixing input and output for training
    TrainData mixedData = new TrainData(
        input,
        Emgu.CV.ML.MlEnum.DataLayoutType.RowSample,
        output);

    // stop condition = 1 million iterations
    NETWORK.TermCriteria = new MCvTermCriteria(1000000);

    // training
    NETWORK.Train(mixedData);
}

Where MIN_ACTIVATION_FUNCTION and MAX_ACTIVATION_FUNCTION are equal to -1.7159 and 1.7159, respectively (according to the OpenCV documentation). After the 1,000,000 iterations (the stop condition you can see in my code), I test my network's predictions using the Predict method, as below:

private void Predict()
{
    Matrix<float> input = new Matrix<float>(1, 2);
    input[0, 0] = MIN_ACTIVATION_FUNCTION;
    input[0, 1] = MIN_ACTIVATION_FUNCTION;

    Matrix<float> output = new Matrix<float>(1, 1);

    NETWORK.Predict(input, output);
    MessageBox.Show(output[0, 0].ToString());

    //////////////////////////////////////////////

    input[0, 0] = MIN_ACTIVATION_FUNCTION;
    input[0, 1] = MAX_ACTIVATION_FUNCTION;

    NETWORK.Predict(input, output);
    MessageBox.Show(output[0, 0].ToString());

    //////////////////////////////////////////////

    input[0, 0] = MAX_ACTIVATION_FUNCTION;
    input[0, 1] = MIN_ACTIVATION_FUNCTION;

    NETWORK.Predict(input, output);
    MessageBox.Show(output[0, 0].ToString());

    ////////////////////////////////////////////////

    input[0, 0] = MAX_ACTIVATION_FUNCTION;
    input[0, 1] = MAX_ACTIVATION_FUNCTION;

    NETWORK.Predict(input, output);
    MessageBox.Show(output[0, 0].ToString());
}

Here is a sample of what NETWORK predicts: -0.00734469, -0.03184918, 0.02080269, -0.006674092

I expected something like this: -1.7, +1.7, +1.7, +1.7. What is wrong in my code?

Note that I also tried 0 and 1 for the MIN_ACTIVATION_FUNCTION and MAX_ACTIVATION_FUNCTION values, but I still did not get any good results.

Update 1: I edited my code as the first answer suggested (I even tested my code with the idea referenced in the comments). Now I get NaN when calling the Predict method.


Answer:

It seems that you have an error in the code providing the data for the output: use the output array instead of the input array.

I think your output responses should be a 2D matrix (with 2 columns). The last layer should then have 2 output neurons, because you have 2 classes: for example, (1, 0) is the class "True" and (0, 1) is the class "False". Also try changing the architecture of your network: the logical OR operator is linearly separable, i.e. it can be computed by a single perceptron.
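
As a minimal sketch of that suggestion, here is the OR problem in the native OpenCV 3.x C++ ml API (which EmguCV wraps), with one output column per class and a deliberately small network. The layer sizes, learning rate, and termination criteria here are illustrative assumptions, not values from the question:

#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>

#include <iostream>

using namespace cv;
using namespace cv::ml;

int main()
{
    // The OR truth table: 4 samples, 2 features each.
    float inputArray[4][2] = { { 0, 0 }, { 0, 1 }, { 1, 0 }, { 1, 1 } };
    // One-hot targets: column 0 = class "False", column 1 = class "True".
    float outputArray[4][2] = { { 1, 0 }, { 0, 1 }, { 0, 1 }, { 0, 1 } };
    Mat input(4, 2, CV_32F, inputArray);
    Mat output(4, 2, CV_32F, outputArray);

    Ptr<ANN_MLP> mlp = ANN_MLP::create();
    // 2 inputs, 2 hidden neurons, 2 outputs; OR is linearly separable,
    // so a very small network is enough.
    Mat layers = (Mat_<int>(3, 1) << 2, 2, 2);
    mlp->setLayerSizes(layers);
    mlp->setActivationFunction(ANN_MLP::SIGMOID_SYM, 1.0, 1.0);
    mlp->setTrainMethod(ANN_MLP::BACKPROP, 0.1);
    mlp->setTermCriteria(TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 10000, 1e-6));

    mlp->train(TrainData::create(input, ROW_SAMPLE, output));

    // For each sample, the larger of the two outputs marks the predicted class.
    for (int i = 0; i < input.rows; i++) {
        Mat result;
        mlp->predict(input.row(i), result);
        std::cout << input.row(i) << " -> " << result << std::endl;
    }
    return 0;
}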

Question:

Briefly: I want to use Caffe for my project. My OS is Ubuntu 14.04, with OpenCV 3.1 + Python 3.5 + Anaconda + GPU. I have already passed all of:

make all
make test
make runtest

However, when I try to make pycaffe, it cannot pass:

Python.h: No such file or directory

Here is my Makefile.config. I am sure Python.h is already on the path, which makes me quite confused.

USE_CUDNN := 1
OPENCV_VERSION := 3
ANACONDA_HOME := $(HOME)/anaconda3
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
         $(ANACONDA_HOME)/include/python3.5m \
         $(ANACONDA_HOME)/lib/python3.5/site-packages/numpy/core/include \
PYTHON_LIB := $(ANACONDA_HOME)/lib
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
USE_PKG_CONFIG := 1
PYTHON_LIBRARIES := boost_python3 python3.5m
PYTHON_INCLUDE := /usr/include/python3.5m \
                 /usr/lib/python3.5/dist-packages/numpy/core/include

Since I use Python 3.5, I commented out the following:

PYTHON_INCLUDE := /usr/include/python2.7 \
        /usr/lib/python2.7/dist-packages/numpy/core/include
PYTHON_LIB := /usr/lib

I would really appreciate it if someone could help.


Answer:

You have two definitions for PYTHON_INCLUDE: you need to decide whether you go for the "python3" flavor or the "anaconda" flavor...

Where is your Python.h file, anyway? Try this in a shell:

find / -name "Python.h" -type f

and see where it actually is. Then pick the correct settings for PYTHON_INCLUDE in your Makefile.config.
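
For example, if find reports Python.h under $(HOME)/anaconda3/include/python3.5m, a cleaned-up Makefile.config keeping only the Anaconda flavor could look like the sketch below (the paths are the ones from the question's own config; adjust them to whatever find reports):

USE_CUDNN := 1
OPENCV_VERSION := 3
ANACONDA_HOME := $(HOME)/anaconda3
# Exactly one PYTHON_INCLUDE definition -- the Anaconda flavor.
# Note: no trailing backslash after the last path, otherwise the next
# variable assignment is swallowed into the include list.
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
         $(ANACONDA_HOME)/include/python3.5m \
         $(ANACONDA_HOME)/lib/python3.5/site-packages/numpy/core/include
PYTHON_LIB := $(ANACONDA_HOME)/lib
PYTHON_LIBRARIES := boost_python3 python3.5m
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib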

Question:

I am new to the OpenCV world and neural networks, but I have some coding experience in C++/Java.


I created my first ANN MLP and taught it XOR:

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/ml/ml.hpp>

#include <iostream>
#include <iomanip>

using namespace cv;
using namespace ml;
using namespace std;

void print(Mat& mat, int prec)
{
    for (int i = 0; i<mat.size().height; i++)
    {
        cout << "[";
        for (int j = 0; j<mat.size().width; j++)
        {
            cout << fixed << setw(2) << setprecision(prec) << mat.at<float>(i, j);
            if (j != mat.size().width - 1)
                cout << ", ";
            else
                cout << "]" << endl;
        }
    }
}

int main()
{
    const int hiddenLayerSize = 4;
    float inputTrainingDataArray[4][2] = {
        { 0.0, 0.0 },
        { 0.0, 1.0 },
        { 1.0, 0.0 },
        { 1.0, 1.0 }
    };
    Mat inputTrainingData = Mat(4, 2, CV_32F, inputTrainingDataArray);

    float outputTrainingDataArray[4][1] = {
        { 0.0 },
        { 1.0 },
        { 1.0 },
        { 0.0 }
    };
    Mat outputTrainingData = Mat(4, 1, CV_32F, outputTrainingDataArray);

    Ptr<ANN_MLP> mlp = ANN_MLP::create();

    Mat layersSize = Mat(3, 1, CV_16U);
    layersSize.row(0) = Scalar(inputTrainingData.cols);
    layersSize.row(1) = Scalar(hiddenLayerSize);
    layersSize.row(2) = Scalar(outputTrainingData.cols);
    mlp->setLayerSizes(layersSize);

    mlp->setActivationFunction(ANN_MLP::ActivationFunctions::SIGMOID_SYM);

    TermCriteria termCrit = TermCriteria(
        TermCriteria::Type::COUNT + TermCriteria::Type::EPS,
        100000000,
        0.000000000000000001
    );
    mlp->setTermCriteria(termCrit);

    mlp->setTrainMethod(ANN_MLP::TrainingMethods::BACKPROP);

    Ptr<TrainData> trainingData = TrainData::create(
        inputTrainingData,
        SampleTypes::ROW_SAMPLE,
        outputTrainingData
    );

    mlp->train(trainingData
        /*, ANN_MLP::TrainFlags::UPDATE_WEIGHTS
        + ANN_MLP::TrainFlags::NO_INPUT_SCALE
        + ANN_MLP::TrainFlags::NO_OUTPUT_SCALE*/
    );

    for (int i = 0; i < inputTrainingData.rows; i++) {
        Mat sample = Mat(1, inputTrainingData.cols, CV_32F, inputTrainingDataArray[i]);
        Mat result;
        mlp->predict(sample, result);
        cout << sample << " -> ";// << result << endl;
        print(result, 0);
        cout << endl;
    }

    return 0;
}

It works very well for this simple problem; I also taught this network the 1-10 decimal-to-binary conversion.


But I need to use an MLP for simple image classification of road signs. I wrote the code for loading the training images and preparing the matrices for learning, but I'm not able to train the network: it "learns" in one second, even with 1,000,000 iterations! And it produces garbage results, the same for all inputs!


Here are my test images and the source code:

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/ml/ml.hpp>

#include <iostream>
#include <chrono>
#include <memory>
#include <iomanip>
#include <climits>

#include <Windows.h>

using namespace cv;
using namespace ml;
using namespace std;
using namespace chrono;

const int WIDTH_SIZE = 50;
const int HEIGHT_SIZE = (int)(WIDTH_SIZE * sqrt(3)) / 2;
const int IMAGE_DATA_SIZE = WIDTH_SIZE * HEIGHT_SIZE;

void print(Mat& mat, int prec)
{
    for (int i = 0; i<mat.size().height; i++)
    {
        cout << "[ ";
        for (int j = 0; j<mat.size().width; j++)
        {
            cout << fixed << setw(2) << setprecision(prec) << mat.at<float>(i, j);
            if (j != mat.size().width - 1)
                cout << ", ";
            else
                cout << " ]" << endl;
        }
    }
}

bool loadImage(string imagePath, Mat& outputImage)
{
    // load image in grayscale
    Mat image = imread(imagePath, IMREAD_GRAYSCALE);
    Mat temp;

    // check for invalid input
    if (image.empty()) {
        cout << "Could not open or find the image" << std::endl;
        return false;
    }

    // resize the image
    Size size(WIDTH_SIZE, HEIGHT_SIZE);
    // INTER_AREA is the OpenCV 3 C++ constant (CV_INTER_AREA comes from the legacy C API)
    resize(image, temp, size, 0, 0, INTER_AREA);

    // convert to float 1-channel
    temp.convertTo(outputImage, CV_32FC1, 1.0/255.0);

    return true;
}

vector<string> getFilesNamesInFolder(string folder)
{
    vector<string> names;
    char search_path[200];
    sprintf(search_path, "%s/*.*", folder.c_str());
    WIN32_FIND_DATA fd;
    HANDLE hFind = ::FindFirstFile(search_path, &fd);
    if (hFind != INVALID_HANDLE_VALUE) {
        do {
            // read all (real) files in current folder
            // , delete '!' read other 2 default folder . and ..
            if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
                names.push_back(fd.cFileName);
            }
        } while (::FindNextFile(hFind, &fd));
        ::FindClose(hFind);
    }
    return names;
}

class Sign {
public:
    enum class Category { A = 'A', B = 'B', C = 'C', D = 'D' };

    Mat image;
    Category category;
    int number;

    Sign(Mat& image, string name) :image(image) {
        category = static_cast<Category>(name.at(0));
        number = stoi(name.substr(2, name.length()));
    };
};

vector<Sign> loadSignsFromFolder(String folderName) {
    vector<Sign> roadSigns;

    for (string fileName : getFilesNamesInFolder(folderName)) {
        Mat image;
        loadImage(folderName + fileName, image);
        roadSigns.emplace_back(image, fileName.substr(0, (fileName.length() - 4))); //cut .png
    }

    return roadSigns;
}

void showSignsInWindows(vector<Sign> roadSigns) {
    for (Sign sign : roadSigns) {
        String windowName = "Sign " + to_string(sign.number);
        namedWindow(windowName, WINDOW_AUTOSIZE);
        imshow(windowName, sign.image);
    }
    waitKey(0);
}

Mat getInputDataFromSignsVector(vector<Sign> roadSigns) {
    Mat roadSignsImageData;

    for (Sign sign : roadSigns) {
        Mat signImageDataInOneRow = sign.image.reshape(0, 1);
        roadSignsImageData.push_back(signImageDataInOneRow);
    }

    return roadSignsImageData;
}

Mat getOutputDataFromSignsVector(vector<Sign> roadSigns) {
    int signsCount = (int) roadSigns.size();
    int signsVectorSize = signsCount + 1;

    Mat roadSignsData(0, signsVectorSize, CV_32FC1);

    int i = 1;
    for (Sign sign : roadSigns) {
        vector<float> outputTraningVector(signsVectorSize);
        fill(outputTraningVector.begin(), outputTraningVector.end(), -1.0);
        outputTraningVector[i++] = 1.0;

        Mat tempMatrix(outputTraningVector, false);
        roadSignsData.push_back(tempMatrix.reshape(0, 1));
    }

    return roadSignsData;
}

int main(int argc, char* argv[])
{
    if (argc != 2) {
        cout << " Usage: display_image ImageToLoadAndDisplay" << endl;
        return -1;
    }

    const int hiddenLayerSize = 500;

    vector<Sign> roadSigns = loadSignsFromFolder("../../../Znaki/A/");
    Mat inputTrainingData = getInputDataFromSignsVector(roadSigns);
    Mat outputTrainingData = getOutputDataFromSignsVector(roadSigns);

    Ptr<ANN_MLP> mlp = ANN_MLP::create();

    Mat layersSize = Mat(3, 1, CV_16U);
    layersSize.row(0) = Scalar(inputTrainingData.cols);
    layersSize.row(1) = Scalar(hiddenLayerSize);
    layersSize.row(2) = Scalar(outputTrainingData.cols);
    mlp->setLayerSizes(layersSize);

    mlp->setActivationFunction(ANN_MLP::ActivationFunctions::SIGMOID_SYM, 1.0, 1.0);

    mlp->setTrainMethod(ANN_MLP::TrainingMethods::BACKPROP, 0.05, 0.05);
    //mlp->setTrainMethod(ANN_MLP::TrainingMethods::RPROP);

    TermCriteria termCrit = TermCriteria(
        TermCriteria::Type::MAX_ITER //| TermCriteria::Type::EPS,
        ,100 //(int) INT_MAX
        ,0.000001
    );
    mlp->setTermCriteria(termCrit);

    Ptr<TrainData> trainingData = TrainData::create(
        inputTrainingData,
        SampleTypes::ROW_SAMPLE,
        outputTrainingData
    );

    auto start = system_clock::now();
    mlp->train(trainingData
        //, //ANN_MLP::TrainFlags::UPDATE_WEIGHTS
        , ANN_MLP::TrainFlags::NO_INPUT_SCALE
        + ANN_MLP::TrainFlags::NO_OUTPUT_SCALE
    );
    auto duration = duration_cast<milliseconds> (system_clock::now() - start);
    cout << "Training time: " << duration.count() << "ms" << endl;

    for (int i = 0; i < inputTrainingData.rows; i++) {
        Mat result;
        //mlp->predict(inputTrainingData.row(i), result);
        mlp->predict(roadSigns[i].image.reshape(0, 1), result);
        //cout << result << endl;
        print(result, 2);
    }


    //showSignsInWindows(roadSigns);
    return 0;
}

What is wrong in this code, such that XOR works but images do not? I checked the input and output matrices and they're correct... Could somebody also explain when/whether I should use ANN_MLP::TrainFlags::NO_INPUT_SCALE and ANN_MLP::TrainFlags::NO_OUTPUT_SCALE, and what values of the setActivationFunction and setTrainMethod parameters I should use?


Thanks!


Answer:

There was a problem with the backprop weight scale parameter (bp_dw_scale, the second argument of setTrainMethod, which scales the weight-gradient updates): it was too big, and the ANN couldn't learn more difficult things.


I changed the line to mlp->setTrainMethod(ANN_MLP::TrainingMethods::BACKPROP, 0.0001); and the hidden layer size to 100 (to speed up the learning), and now it's working!
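
For clarity, these are the only two lines of the question's code that the fix touches (a sketch of the deltas; everything else stays as posted):

// In main(): a smaller hidden layer speeds up learning.
const int hiddenLayerSize = 100;  // was 500

// Was (BACKPROP, 0.05, 0.05): a much smaller bp_dw_scale means
// backprop takes much smaller weight-update steps.
mlp->setTrainMethod(ANN_MLP::TrainingMethods::BACKPROP, 0.0001);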