Hot questions on using neural networks in OCR

Question:

Sorry, this is probably a dumb question, but I am fairly new to machine learning and Tesseract OCR. I have heard that Tesseract OCR can be trained.

What I need to know is: does Tesseract OCR use neural networks as its default training mechanism, or do we have to program it explicitly to use neural networks?

Sorry if I'm thinking about this "training" concept the wrong way, but what I need to know exactly is whether Tesseract is already using NNs and, if not, how I can approach using NNs with Tesseract OCR to improve recognition accuracy.

It would also be a great help if someone could suggest some good resources or approaches to get started with.

What I currently know: basic supervised machine-learning concepts and how to perform basic image OCR operations in Tesseract OCR.


Answer:

It appears that Tesseract uses an Adaptive Classifier by default. Check this out for a good read:

https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf

There appears to be an option called "Cube mode" where it will switch to using NNs for the learning system instead of the adaptive classifier (https://code.google.com/p/tesseract-ocr-extradocs/wiki/Cube). More info about adaptive classifiers:

http://www.cs.indiana.edu/~rawlins/website/adaptivity/information-helper.html
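
As an aside (an assumption about newer releases, not covered by the links above): since Tesseract 4.0 the default engine is an LSTM neural network, and the `--oem` flag selects the engine explicitly. A hedged command-line sketch, assuming a `tesseract` 4.x binary and an input file `image.png`:

```shell
# --oem selects the OCR engine mode in Tesseract 4.x:
#   0 = legacy (non-NN) engine, 1 = neural-net LSTM engine,
#   2 = legacy + LSTM, 3 = default (whichever is available)
tesseract image.png output --oem 1
```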

Also, related very closely is a Learning Classifier System:

http://en.wikipedia.org/wiki/Learning_classifier_system

Also, your terminology of "training" is very close. Training is how you teach a pattern-recognition or learning system what responses it should give to certain input sets. It then uses similarities when it encounters unknown data to classify the new data. Machine learning is one of the coolest fields in existence in my (probably biased) opinion, so keep up the learning! You are the meta-learner: learning how to teach a machine to learn. Cool stuff!

Question:

I'm working my way toward understanding how to use an NN to perform OCR; my goal is a bit different from the usual OCR algorithms.

My objective is to be able to determine if a specific input is a specific letter, for example I'm expecting to get the letter 'A' from the user, and I need to make sure I didn't get a different shape.

I need to be able to decide if a given input is the proper shape or not.

From what I've been reading, there are a few options here: an MLP, a SOM network, or a back-propagation network.

From what I understand, since I'm planning to create samples for each shape (letter) in order to train the network, I should define a SOM network. Is that correct?

I'm not sure which direction is preferred, if you could point me in the right direction that would be great.

I'm planning to use the Encog framework, not sure if that matters.


Answer:

From what you have described, SOM is not the best choice since it is an unsupervised classifier. You are specifying the class (letter) for each training example; therefore, a supervised classifier such as a multi-layer perceptron (MLP) is more appropriate.

With regard to MLP vs. back propagation network, that is a somewhat erroneous distinction. MLP is a type of artificial neural network (ANN), whereas back-propagation is a learning method. An MLP can be trained using back-propagation or via other methods (e.g., a genetic algorithm).
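
To make the supervised/unsupervised distinction concrete, here is a minimal sketch in plain Python (not Encog, and with made-up 3x3 "letter" bitmaps) of a supervised MLP trained with back-propagation. Note that every training example carries its class label, which is exactly what a SOM would not use.

```python
import math, random

random.seed(0)

# Toy training data: flattened 3x3 bitmaps with one-hot class labels.
# Supervised learning: each example says WHICH letter it is.
X = [
    [1,0,1, 1,1,1, 1,0,1],  # crude 'A'-like shape
    [1,1,1, 1,0,0, 1,1,1],  # crude 'C'-like shape
]
Y = [[1, 0], [0, 1]]        # one-hot labels: A vs C

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

n_in, n_hid, n_out = 9, 4, 2
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    o = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in W2]
    return h, o

lr = 0.5
for _ in range(2000):
    for x, y in zip(X, Y):
        h, o = forward(x)
        # output-layer deltas (squared error, sigmoid derivative o*(1-o))
        do = [(o[k] - y[k]) * o[k] * (1 - o[k]) for k in range(n_out)]
        # hidden-layer deltas, back-propagated through W2
        dh = [h[j] * (1 - h[j]) * sum(do[k] * W2[k][j] for k in range(n_out))
              for j in range(n_hid)]
        for k in range(n_out):
            for j in range(n_hid):
                W2[k][j] -= lr * do[k] * h[j]
        for j in range(n_hid):
            for i in range(n_in):
                W1[j][i] -= lr * dh[j] * x[i]

_, out_a = forward(X[0])
_, out_c = forward(X[1])
print(out_a.index(max(out_a)), out_c.index(max(out_c)))
```

In Encog the same idea maps to a `BasicNetwork` trained with one of its propagation trainers; the point is only that the training set pairs each input with its target letter.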

Question:

Say I want to recognize characters using neural network(s). Let's cut it down to 5 letters, a binary image of 16x16, and an input layer plus a 2-layer network with a unipolar function in both layers. Momentum back-propagation is used in the learning process.

Which of the following approaches should give the best results (where x is the number of neurons in the first layer)? By best I mean highest % of correct recognitions. Speed isn't a factor in this question.

  1. A single network, 256;x;5 - the highest-value output neuron wins.
  2. 5 networks, 256;x;1 - each network has its own letter, and each output is tested against a threshold; it could happen that 2 or more networks recognize an image as "their own".
  3. Same as above, but now the output from each network is normalized (if a, b, c, d, e are the raw outputs, then a' = a / (a+b+c+d+e), b' = b / (a+b+c+d+e), and so on).

Answer:

Option 2 is the worst, because "2 or more networks recognize an image as 'their own'" will definitely happen many times, and how do you discriminate between them after that? Option 1 will work reasonably well. Option 3 is the basic idea behind the softmax output function, and softmax usually works best for classification tasks, especially when combined with a cross-entropy error function.
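
A sketch of the normalization idea written as a proper softmax (the raw scores below are made up): exponentiating before dividing keeps every normalized output positive, and they sum to 1, so they can be read as class probabilities.

```python
import math

def softmax(scores):
    # subtract the max for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw = [2.1, 0.3, -1.0, 0.8, 0.2]   # hypothetical raw outputs for letters a..e
probs = softmax(raw)
best = probs.index(max(probs))     # index of the winning letter
```

Note that the plain division in option 3 applies the original sums, not softmax's exponentials, but the intent (turn scores into a distribution and pick the largest) is the same.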

Question:

I could not find a better title, so this is what I want to do:

I have a PictureBox that is 28x28 pixels. I am working on an AI project, and my idea is to make an OCR for numbers.

I found training data, but all the images are 28x28 pixels, so my idea is to make a PictureBox of that size, draw on it, and feed that info to the neural network.

My problem is graphical for the moment: how do I make a 28x28 PictureBox and somehow enlarge it while maintaining the pixel count? I put the PictureBox into a panel, and I want the PictureBox to fill the panel.

My idea would be to somehow scale it up and, after drawing on it, scale it back down, but how can I accomplish this? Mathematically, how can this be done?

And what would be the best way to draw on that PictureBox (line, FillEllipse, etc.) so the data could be fed into the NN (after normalizing it, of course)?


Answer:

I figured out a solution; maybe it will help someone in the future:

        List<KeyValuePair<int, int>> coordonateList = new List<KeyValuePair<int, int>>();
        // drawPictureBox size is 280 x 280  (28 * 10, 28 * 10)

        private void drawPictureBox_MouseDown(object sender, MouseEventArgs e)
        {
            mouseDown = true;
        }

        private void drawPictureBox_MouseMove(object sender, MouseEventArgs e)
        {
            if (mouseDown)
            {
                Point point = drawPictureBox.PointToClient(Cursor.Position);
                DrawPoint(point.X, point.Y);
            }
        }

        private void drawPictureBox_MouseUp(object sender, MouseEventArgs e)
        {
            mouseDown = false;
        }

        public void DrawPoint(int x, int y)
        {
            using (Graphics g = Graphics.FromImage(bitmap))
            using (SolidBrush brush = new SolidBrush(Color.White))
            {
                // paint a 10x10 block on the enlarged canvas and remember
                // which 28x28 cell it corresponds to
                g.FillRectangle(brush, x, y, 10, 10);
                coordonateList.Add(new KeyValuePair<int, int>(x / 10, y / 10));
            }
            drawPictureBox.Image = bitmap;
        }

        private void zoomImage(Bitmap bitmap)
        {
            var result = new Bitmap(28, 28);
            using (Graphics g1 = Graphics.FromImage(result))
            using (SolidBrush brush = new SolidBrush(Color.White))
            {
                // replay each recorded cell as a single pixel in the 28x28 image
                foreach (var item in coordonateList)
                {
                    g1.FillRectangle(brush, item.Key, item.Value, 1, 1);
                }
            }
            pictureBox1.Image = result;
        }
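
The scaling math the question asks about can be sketched independently of WinForms: with a scale factor of 10 (280 / 28), integer division maps a point on the enlarged canvas to its pixel in the 28x28 grid, and multiplication maps it back. The function names here are illustrative, not part of any library.

```python
SCALE = 10  # 280 / 28

def to_small(x, y, scale=SCALE):
    """Map a coordinate on the big (280x280) canvas to the 28x28 grid."""
    return x // scale, y // scale

def to_big(px, py, scale=SCALE):
    """Map a 28x28 pixel back to the top-left corner of its 10x10 block."""
    return px * scale, py * scale

print(to_small(137, 42))  # (13, 4)
```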

Question:

I'm trying to implement a Farsi OCR using neural networks. I am using 5000 training examples, each a 70 x 79 matrix; concretely, I have a 5530-unit input layer, one hidden layer (4000 units), and a 38-unit output layer.

Which training algorithm should I use for faster and better results (backprop, PSO, genetic, ...)? I ran the implementation using back-propagation, but it took a very long time and I had to cancel the process. Should I use another algorithm, or should I reduce my dimensions, or ... ?

Thanks


Answer:

Regular back-propagation is generally very slow. Try a faster variant, such as adding momentum or using rprop.

Still expect it to be very slow, as you have over 22 million connections in just the first layer.
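
The weight count follows directly from the layer sizes in the question, and the momentum idea mentioned above is a one-line change to the update rule (the learning rate and momentum values below are illustrative, not recommendations):

```python
# Architecture from the question: 5530 inputs -> 4000 hidden -> 38 outputs
n_in, n_hid, n_out = 5530, 4000, 38
first_layer = n_in * n_hid      # weights between input and hidden layer
second_layer = n_hid * n_out    # weights between hidden and output layer
print(first_layer, second_layer)

def momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    """One momentum update: the velocity accumulates past gradients,
    which smooths the descent compared to plain back-propagation."""
    v = mu * velocity - lr * grad
    return w + v, v

w, v = 0.5, 0.0
w, v = momentum_step(w, grad=0.2, velocity=v)
```

At over 22 million weights in the first layer alone, every full pass is expensive regardless of optimizer, which is why reducing the input dimensionality is also worth considering.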

Question:

Least-square-error recognition from JavaOCR finds the least mean squared error against a training image, pixel by pixel.

Logistic regression from this site, on the other hand, seeks to minimize an error function.

I also found a neural network here that seems to do the same.

Are the algorithms for the three just the same? Please point out the specific differences, if there are any. I've been searching the net for quite some time now.


Answer:

Yes, neural networks can be viewed as non-linear logistic regression. In fact, constructing models that minimize the error between predicted and actual labels is the high-level goal of all supervised learning. This is one of the reasons SVMs are so popular: that goal is made explicit in their theoretical formulation.

JavaOCR does not seem to construct a model; instead, it constructs n templates, where n is the size of the alphabet. It then assigns labels to inputs by selecting the label whose template minimizes the squared error.
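
The template idea can be sketched in a few lines of Python (the tiny 2x2 "templates" below are invented for illustration and are not JavaOCR's actual data): store one bitmap per letter, then classify an input by picking the template with the smallest squared error.

```python
# One stored template per letter (flattened 2x2 bitmaps, made up here)
templates = {
    "A": [1, 0, 1, 1],
    "B": [1, 1, 0, 0],
}

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(pixels):
    # pick the label whose template minimizes the squared error
    return min(templates, key=lambda lbl: squared_error(templates[lbl], pixels))

print(classify([1, 0, 1, 0]))  # closer to "A" (error 1 vs 2)
```

Unlike logistic regression or a neural network, nothing here is fitted by gradient descent; the "model" is just the stored templates, which is why it behaves like nearest-neighbor matching rather than a learned decision boundary.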