Hot questions for using neural networks in Ruby


I want to train a neural network with the sine() function.

Currently I use this code (and the cerebrum gem):

require 'cerebrum'

input = []
300.times do |i|
  inputH = {}
  inputH[:input]  = [i]          # cerebrum expects an :input key as well
  inputH[:output] = [Math::sin(i)]
  input << inputH
end

network = Cerebrum::Network.new

network.train(input, {
  error_threshold: 0.00005,
  iterations:      40000,
  log:             true,
  log_period:      1000,
  learning_rate:   0.3
})

res = []
300.times do |i|
  res <<[i])
end

puts res.inspect

But it does not work: if I run the trained network I get weird output values instead of a part of the sine curve.

So, what am I doing wrong?


Cerebrum is a very basic and slow NN implementation. There are better options in Ruby, such as the ruby-fann gem.

Most likely your problem is that the network is too simple. You have not specified any hidden layers, so it looks like the code assigns a default hidden layer with 3 neurons for your case.

Try something like:

network = Cerebrum::Network.new({
  learning_rate:  0.01,
  momentum:       0.9,
  hidden_layers:  [100]
})

and expect it to take forever to train, plus still not be very good.

Also, your choice of sample points is too sparse: taking sin(i) at 300 integer values of i spaces the points roughly a sixth of a period apart, so to the network the data will look mostly like noise and it won't interpolate well between points. A neural network does not somehow figure out "oh, that must be a sine wave" and match to it. Instead it interpolates between the points; the clever bit happens when it does so in multiple dimensions at once, perhaps finding structure that you could not spot so easily with a manual inspection. To give the network a reasonable chance of learning something, give it much denser points, e.g. where you currently have sinus = Math::sin(i), instead use:

sinus = Math::sin(i.to_f/10)

That's still almost 5 full periods of the sine wave, which should hopefully be enough to prove that the network can learn an arbitrary function.
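The denser sampling above can be sketched like this (a minimal sketch; the hash keys follow the :input/:output convention used in the question's cerebrum code):

```ruby
# Build 300 training points spaced 0.1 radians apart (~4.8 periods),
# instead of 1 radian apart as in the original code.
input = []
300.times do |i|
  x = i.to_f / 10
  input << { input: [x], output: [Math.sin(x)] }
end
```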


After my previous attempt, I managed to train a neural network to express the sine function. I used the ai4r Ruby gem:

require 'ai4r'
srand 1
net = Ai4r::NeuralNetwork::[1, 60, 1])
net.learning_rate = 0.01
#net.propagation_function = lambda { |x| 1.0 / ( 1.0 + Math::exp( -x ) ) }

def normalise(x, xmin, xmax, ymin, ymax)
  xrange = xmax - xmin
  yrange = ymax - ymin
  ymin + (x - xmin) * (yrange.to_f / xrange)
end

training_data = []
test = []
i2 = 0.0
320.times do |i|
  i2 += 0.1
  input  = i2
  output = Math.sin(i2)
  test << [input, output]
  training_data << {
    input:           [normalise(input,  0.0, 32.0, 0.0, 1.0)],
    expected_result: [normalise(output, -1.0, 1.0, 0.0, 1.0)]
  }
end
puts test.inspect
puts training_data.inspect

time =
999999.times do |i|
  error = 0.0
  training_data.each do |d|
    error += net.train(d[:input], d[:expected_result])
  end
  break if error < 0.26
  print "Times: #{i}, error: #{error} \r"
end
time2 =
puts "#{time2} - #{time} = #{time2 - time} seconds needed."

serialized = Marshal.dump(net)"net.saved", "w+") { |file| file.write(serialized) }

Everything worked out fine. The network was trained in 4703.664857 seconds.

The network will be trained much faster when I normalise the input/output to numbers between 0 and 1. ai4r uses a sigmoid function, so it is clear that it cannot output negative values. But why do I have to normalise the input values? Does this kind of neural network only accept input values < 1?

In the sine example, is it possible to input any number as in:

Input: -10.0 -> Output: 0.5440211108893699
Input: 87654.322 -> Output: -0.6782453567239783
Input: -9878.923 -> Output: -0.9829544956991526

or do I have to define the range?


In your structure you have 60 hidden nodes after a single input. This means that each hidden node has only 1 learned weight for a total of 60 values learned. The connection from the hidden layer to the single output node likewise has 60 weights, or learned values. This gives a total of 120 possible learnable dimensions.

Imagine what each node in the hidden layer is capable of learning: there is a single scaling factor, then a non-linearity. Let's assume that your weights end up looking like:

[1e-10, 1e-9, 1e-8, ..., .1]

with each entry being the weight of a node in the hidden layer. Now if you pass the number 1 into your network, your hidden layer will output something to this effect:

[0, 0, 0, 0, ..., .1, .25, .5, .75, 1] (roughly speaking, not actually calculated)

Likewise if you give it something large, like: 1e10 then the first layer would give:

[0, .25, .5, .75, 1, 1, 1, ..., 1].

The weights of your hidden layer are going to learn to separate in this fashion to be able to handle a large range of inputs by scaling them to a smaller range. The more hidden nodes you have (in that first layer), the less far each node has to separate. In my example they are spaced out by a factor of ten; if you had thousands, they would be spaced out by a factor of maybe 2.

By normalizing the input range to be between [0,1], you are restricting how far those hidden nodes need to separate before they can start giving meaningful information to the final layer. This allows for faster training (assuming your stopping condition is based on change in loss).

So to directly answer your questions: No, you do not need to normalize, but it certainly helps speed up training by reducing the variability and size of the input space.


This is a slightly modified sample program I took from the FANN website.

The equation I created is c = pow(a,2) + b.


#include "fann.h"

int main()
{
    const unsigned int num_input = 2;
    const unsigned int num_output = 1;
    const unsigned int num_layers = 4;
    const unsigned int num_neurons_hidden = 3;
    const float desired_error = (const float) 0.001;
    const unsigned int max_epochs = 500000;
    const unsigned int epochs_between_reports = 1000;

    struct fann *ann = fann_create_standard(num_layers, num_input,
        num_neurons_hidden, num_output);

    fann_set_activation_function_hidden(ann, FANN_SIGMOID_SYMMETRIC);
    fann_set_activation_function_output(ann, FANN_SIGMOID_SYMMETRIC);

    fann_train_on_file(ann, "", max_epochs,
        epochs_between_reports, desired_error);

    fann_save(ann, "");

    fann_destroy(ann);

    return 0;
}

#include <stdio.h>
#include "floatfann.h"

int main()
{
    fann_type *calc_out;
    fann_type input[2];

    struct fann *ann = fann_create_from_file("");

    input[0] = 1;
    input[1] = 1;
    calc_out = fann_run(ann, input);

    printf("sample test (%f,%f) -> %f\n", input[0], input[1], calc_out[0]);

    fann_destroy(ann);

    return 0;
}
I created my own dataset:



i = 0
f ="", "w")
f.write("100 2 1\n")
while i < 100 do
  first  = rand(0..100)
  second = rand(0..100)
  third  = first ** 2 + second
  string1 = "#{first} #{second}\n"
  string2 = "#{third}\n"
  f.write(string1)
  f.write(string2)
  i += 1
end
f.close

100 2 1
95 27
9052
63 9
3978
38 53
1497
31 84
1045
28 56
840
95 80
9105
10 19
119
In the sample data, the first line gives the number of samples, the number of inputs, and the number of outputs; after that, input and output lines alternate.
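That layout can be sketched with a quick parse (the excerpt values below are hypothetical; 9052 is 95² + 27 from the generator's formula):

```ruby
# Parse the FANN training-file header and the first sample.
# Format: "num_samples num_inputs num_outputs", then alternating
# input and output lines.
lines = ["100 2 1", "95 27", "9052"]          # hypothetical file excerpt
num_samples, num_inputs, num_outputs = lines.first.split.map(&:to_i)
inputs  = lines[1].split.map(&:to_i)          # one line of inputs...
outputs = lines[2].split.map(&:to_i)          # ...then one line of outputs
```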

But I am getting an error: FANN Error 20: The number of output neurons in the ann (4196752) and data (1) don't match

What's the issue here? How does it calculate 4196752 neurons?


Here, using fann_create_standard, the function signature is fann_create_standard(num_layers, layer1_size, layer2_size, layer3_size...), whilst you are trying to use it differently:

struct fann *ann = fann_create_standard(num_layers, num_input,
        num_neurons_hidden, num_output);

You construct a network with 4 layers but only provide sizes for 3, so the fourth size is read from garbage in the variadic argument list. The 4196752 neurons in the output layer are likely coming from that undefined value. Either pass num_layers = 3 (input, hidden, output), or supply a size for each of the 4 layers.