Hot questions for using neural networks in FANN

Question:

I've been working on a Q-learning implementation, where Q(π, a) is approximated with a neural network. During trouble-shooting, I reduced the problem down to a very simple first step: train a NN to calculate atan2(y, x).

I'm using FANN for this problem, but the library is largely irrelevant as this question is more about the appropriate technique to use.

I have been struggling to teach the NN, given input = {x, y}, to calculate output = atan2(y, x).

Here is the naïve approach I have been using. It's extremely simplistic, but I'm trying to start simple and work up from there.

#include "fann.h"
#include <cstdio>
#include <random>
#include <cmath>

int main()
{
    // creates a 3 layered, densely connected neural network, 2-3-1
    fann *ann = fann_create_standard(3, 2, 3, 1);

    // set the activation functions for the layers
    fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
    fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);

    fann_type input[2];
    fann_type expOut[1];
    fann_type *calcOut;

    std::default_random_engine rng;
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    for (int i = 0; i < 100000000; ++i) {
        input[0] = unif(rng);
        input[1] = unif(rng);

        expOut[0] = atan2(input[1], input[0]);

        // does a single incremental training round
        fann_train(ann, input, expOut);
    }


    input[0] = unif(rng);
    input[1] = unif(rng);

    expOut[0] = atan2(input[1], input[0]);
    calcOut = fann_run(ann, input);

    printf("Testing atan2(%f, %f) = %f -> %f\n", input[1], input[0], expOut[0], calcOut[0]);

    fann_destroy(ann);
    return 0;
}

Super simple, right? However, even after 100,000,000 iterations this neural network fails:

Testing atan2(0.949040, 0.756997) = 0.897493 -> 0.987712

I also tried using a linear activation function on the output layer (FANN_LINEAR). No luck. In fact, the results are much worse. After 100,000,000 iterations, we get:

Testing atan2(0.949040, 0.756997) = 0.897493 -> 7.648625

Which is even worse than when the weights were randomly initialized. How could a NN get worse after training?

I found this issue with FANN_LINEAR to be consistent with other tests. When linear output is needed (e.g. in the calculation of the Q value, which corresponds to arbitrarily large or small rewards), this approach fails miserably and error actually appears to increase with training.

So what is going on? Is using a fully-connected 2-3-1 NN inappropriate for this situation? Is a symmetric sigmoid activation function in the hidden layer inappropriate? I fail to see what else could possibly account for this error.


Answer:

The problem you are facing is normal, and the quality of your predictor won't improve by increasing the number of iterations. Instead, you should increase the size of your NN, either by adding layers or by enlarging the hidden layer. Instead of 2-3-1 you can try 2-256-128-1, for example. Normally that will work better. Note also that the targets should be scaled to fit the output activation's range: the code below divides atan2 by π/2 so the targets fall inside tanh's [-1, 1] range. If you want, have a look at this simple code I wrote in Python to do the same task; it is working well.

import numpy as np
from numpy import arctan2

from keras.models import Sequential
from keras.layers import Dense

# 2-256-128-1 network; tanh on the output keeps predictions in [-1, 1]
nn_atan2 = Sequential()
nn_atan2.add(Dense(256, activation="sigmoid", input_shape=(2,)))
nn_atan2.add(Dense(128, activation="sigmoid"))
nn_atan2.add(Dense(1, activation="tanh"))

nn_atan2.compile(optimizer="adam", loss="mse")
nn_atan2.summary()

# targets are scaled by pi/2 so they fit the tanh output range
N = 100000
X = np.random.uniform(size=(N, 2))
y = arctan2(X[:, 0], X[:, 1]) / (np.pi * 0.5)

nn_atan2.fit(X, y, epochs=10, batch_size=128)

def predict(x, y):
    # undo the pi/2 scaling to recover the angle
    return float(nn_atan2.predict(np.array([[x, y]])) * (np.pi * 0.5))

Running this code will give:

Epoch 1/10
100000/100000 [==============================] - 3s 26us/step - loss: 0.0289
Epoch 2/10
100000/100000 [==============================] - 2s 24us/step - loss: 0.0104
Epoch 3/10
100000/100000 [==============================] - 2s 24us/step - loss: 0.0102
Epoch 4/10
100000/100000 [==============================] - 2s 24us/step - loss: 0.0096
Epoch 5/10
100000/100000 [==============================] - 2s 24us/step - loss: 0.0082
Epoch 6/10
100000/100000 [==============================] - 2s 23us/step - loss: 0.0051
Epoch 7/10
100000/100000 [==============================] - 2s 23us/step - loss: 0.0027
Epoch 8/10
100000/100000 [==============================] - 2s 23us/step - loss: 0.0019
Epoch 9/10
100000/100000 [==============================] - 2s 23us/step - loss: 0.0014
Epoch 10/10
100000/100000 [==============================] - 2s 23us/step - loss: 0.0010
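
As a quick sanity check (my addition, not part of the original output), predict(1.0, 1.0) should return approximately π/4 ≈ 0.785, since the network was trained to approximate arctan2 of its two inputs and the predict helper undoes the π/2 scaling.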

Question:

So I started playing with FANN (http://leenissen.dk/) in order to create a simple recommendation engine.

For example,

User X has relations to records with ids [1, 2, 3]

Other users have relations to following ids:

  • User A: [1, 2, 3, 4]
  • User B: [1, 2, 3, 4]

It would be natural, then, that there's some chance user X would be interested in record with id 4 as well and that it should be the desired output of the recommendation engine.

It feels like this would be something a neural network could accomplish. However, from trying out FANN and googling around, it seems there needs to be some mathematical relationship between the data and the results. Here, with ids, there is none; the ids could just as well be any symbols.

Question: Is it possible to solve this kind of problem with a neural network and where should I begin to search for a solution?


Answer:

What you are looking for is some kind of recurrent neural network; a network that stores 'context' in some way or another. Examples of such networks would be LSTM and GRU. So basically, you have to input your data sequentially. Based on the context and the current input, the network will predict which label is most likely.

it seems there needs to be some mathematical relationship between the data and the results. Here, with ids, there is none; the ids could just as well be any symbols.

There definitely is a relation between the data and the results, and it can be expressed through weights and biases.

So how would it work? First, you one-hot encode your inputs and outputs. Basically, you want to predict which label is most likely after the set of labels that a user has already interacted with.

If you have 5 labels (A, B, C, D, E), that means you will have 5 inputs and 5 outputs: [0, 0, 0, 0, 0].

If your label is A, the array will be [1, 0, 0, 0, 0], if it's D, it will be [0, 0, 0, 1, 0].

The key to LSTMs and GRUs is that the data is sequential. So basically, you input all the labels watched, one by one. If a user has watched A, B and C:

activate: [1,0,0,0,0] 
activate: [0,1,0,0,0]

// the output of this activation will be the next predicted label
activate: [0,0,1,0,0]
// output: [0.1, 0.3, 0.2, 0.7, 0.5], so the next label is D

And you should always train the network so that the expected output for input IN(t) is IN(t+1).
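
To make the encoding step concrete, here is a minimal C sketch (my own illustration, not from the original answer), assuming the five labels A–E map to indices 0–4; the recurrent network itself is not shown:

#include <stdio.h>
#include <string.h>

#define NUM_LABELS 5

/* Fill vec with the one-hot encoding of the label at index `label` */
static void one_hot(double vec[NUM_LABELS], int label)
{
    memset(vec, 0, NUM_LABELS * sizeof(vec[0]));
    vec[label] = 1.0;
}

int main(void)
{
    /* a user's history: A, B, C -> indices 0, 1, 2 */
    int history[] = { 0, 1, 2 };
    double vec[NUM_LABELS];
    size_t i;
    int j;

    for (i = 0; i < sizeof(history) / sizeof(history[0]); ++i) {
        one_hot(vec, history[i]);
        printf("activate: [");
        for (j = 0; j < NUM_LABELS; ++j)
            printf("%s%.0f", j ? "," : "", vec[j]);
        printf("]\n");   /* each vector is fed to the network in order */
    }
    return 0;
}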

Question:

I'm using FANN to work with neural networks. (Link to FANN)

I need to get the matrix of weights after training the network, but I didn't find anything in the documentation. (Link to documentation)

Do you know how to get that matrix?

Thank you!


Answer:

You need to use the fann_get_connection_array() function. It gives you an array of struct fann_connection, and struct fann_connection has a weight field, which is what you want.

You can do something like this to print your weight matrix:

#include <stdio.h>
#include <stdlib.h>
#include "fann.h"

int main(void)
{
    struct fann *net;              /* your trained neural network */
    struct fann_connection *con;   /* weight matrix */
    unsigned int connum;           /* number of connections */
    size_t i;

    /* Insert your net allocation and training code here */
    ...

    connum = fann_get_total_connections(net);
    if (connum == 0) {
        fprintf(stderr, "Error: connections count is 0\n");
        return EXIT_FAILURE;
    }

    con = calloc(connum, sizeof(*con));
    if (con == NULL) {
        fprintf(stderr, "Error: unable to allocate memory\n");
        return EXIT_FAILURE;
    }

    /* Get weight matrix */
    fann_get_connection_array(net, con);

    /* Print weight matrix */
    for (i = 0; i < connum; ++i) {
        printf("weight from %u to %u: %f\n", con[i].from_neuron,
               con[i].to_neuron, con[i].weight);
    }

    free(con);

    return EXIT_SUCCESS;
}
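
To compile this (an assumption about your setup, not from the original answer), link against the FANN library, e.g. gcc weights.c -o weights -lfann for the standard version, or -ldoublefann if you use the double-precision variant.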

Details:

[1] fann_get_connection_array()

[2] struct fann_connection

[3] fann_type (type for weight)

Question:

As some posts suggest, I started using FANN (http://leenissen.dk/fann/index.php) to do neural network work. It is clean and easy to understand.

However, to avoid the over-fitting problem, I need to employ an algorithm that uses a validation dataset as an auxiliary. (What is the difference between train, validation and test sets in neural networks?) Interestingly, the FANN documentation recommends that the developer watch out for the over-fitting problem (http://leenissen.dk/fann/wp/help/advanced-usage/).

Now the thing is, as far as I can see, FANN does not have any function to support this feature. The training functions in FANN do not provide any argument for passing in a validation dataset either. Am I correct? How do FANN users train their neural networks with a validation dataset? Thanks for any help.


Answer:

You can implement this approach, i.e. a dataset split, with FANN yourself, but you need to train each epoch separately, using the function fann_train_epoch.

You start with a big dataset, which you then split for the different steps. The tricky thing is: you split the dataset only once, and use only the first part to adjust the weights (the training as such).

Say you already have your 2 datasets, Train and Validation (like in the example you posted). You first need to store them in different files or arrays. Then, you can do the following:

struct fann *ann;
struct fann_train_data *dataTrain;
struct fann_train_data *dataVal;

Assuming that you have both datasets in files:

dataTrain = fann_read_train_from_file("./train.data");
dataVal = fann_read_train_from_file("./val.data");

Then, after setting all network parameters, you train and check the error on the second dataset, one epoch at a time. This is something like:

int i;
float train_error, val_error;
float last_val_error = 1e30f;   /* larger than any plausible first error */

for (i = 1; i <= max_epochs; i++) {
    fann_train_epoch(ann, dataTrain);
    train_error = fann_test_data(ann, dataTrain);
    val_error = fann_test_data(ann, dataVal);
    if (val_error > last_val_error)
        break;   /* validation error started rising: stop */
    last_val_error = val_error;
}

Of course, this stopping condition is too simple and may end your training loop too early if the error fluctuates (as it commonly does), but you get the general idea of how to use different datasets during training.
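
One common refinement (a sketch of my own, not from the original answer, reusing the variables above) is to tolerate a few non-improving epochs before stopping, and to keep a snapshot of the best network seen so far with fann_copy (available in recent FANN versions):

struct fann *best = NULL;
float best_val_error = 1e30f;
int bad_epochs = 0;
const int patience = 10;   /* tolerated epochs without improvement */

for (i = 1; i <= max_epochs; i++) {
    fann_train_epoch(ann, dataTrain);
    val_error = fann_test_data(ann, dataVal);
    if (val_error < best_val_error) {
        best_val_error = val_error;
        bad_epochs = 0;
        if (best)
            fann_destroy(best);
        best = fann_copy(ann);   /* snapshot of the best weights */
    } else if (++bad_epochs >= patience) {
        break;                   /* no improvement for `patience` epochs */
    }
}
/* `best` now holds the network with the lowest validation error */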

By the way, you may want to save these errors so you can plot them against the training epoch and have a look after training has ended.
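
For example (a minimal sketch of my own, assuming the same variables as above; error handling omitted for brevity), the errors can be written to a CSV file for later plotting:

/* log per-epoch errors so they can be plotted afterwards */
FILE *log = fopen("training_log.csv", "w");
fprintf(log, "epoch,train_error,val_error\n");
for (i = 1; i <= max_epochs; i++) {
    fann_train_epoch(ann, dataTrain);
    train_error = fann_test_data(ann, dataTrain);
    val_error = fann_test_data(ann, dataVal);
    fprintf(log, "%d,%f,%f\n", i, train_error, val_error);
}
fclose(log);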

Question:

I'm using the FANN library with the code given below.

#include <stdio.h>
#include "doublefann.h"
int main()
{
    const int NUM_ITERATIONS = 10000;
    struct fann *ann;
    int topology[] = { 1, 4, 1 };
    fann_type d1[1] = { 0.5 };
    fann_type d2[1] = { 0.0 };
    fann_type *pres;
    int i;

    /* Create network */
    ann = fann_create_standard_array(3, topology);

    /* 
     * Train network 
     * input: 0.0 => output: 0.5
     * input: 0.5 => output: 0.0
     */
    i = NUM_ITERATIONS;
    while (--i)
    {
        fann_train(ann, d1, d2);
        fann_train(ann, d2, d1);
    }

    /* Should return 0.5 */
    pres = fann_run(ann, d2);
    printf("%f\n", pres[0]);

    /* Should return 0.0 */
    pres = fann_run(ann, d1);
    printf("%f\n", pres[0]);

    /* Destroy network */
    fann_destroy(ann);

    return 0;
}

I expected the result of the first run to be 0.5, since according to the training an input of 0.0 should produce an output of 0.5. Accordingly, I expected the output of the second run to be 0.0.

But the result is a constant 0.0 for both runs.

What am I missing here?


Answer:

From this site: try replacing doublefann.h with fann.h. The likely explanation is a mismatch between the header and the linked library: doublefann.h defines fann_type as double, so the program must be linked against the double-precision version of the library (-ldoublefann); if it is instead linked against the default version (-lfann), the data layouts no longer match and the network produces garbage. Including the header that matches the library you link against fixes the problem.