Hot questions for using neural networks in Accord.NET
I am trying to wrap my head around Accord.Neuro. I need a NN library to be used for a reinforcement learning problem. Following one of the examples, I have written this small piece of code in F#:
let inputs = [| [|0.0; 1.0|]; [|1.0; 1.0|] |]
let inputdimension = inputs |> Array.length
let outputs = [| [|1.0|]; [|0.0|] |]
let outputdimension = outputs |> Array.length

let network =
    Accord.Neuro.ActivationNetwork (
        SigmoidFunction (2.0),  // transfer function
        inputdimension,
        2,                      // two neurons in the first layer
        outputdimension )       // one neuron in the second layer

let teacher = network |> LevenbergMarquardtLearning
teacher.RunEpoch(inputs, outputs)
How can I obtain the weights from the trained network object? The network does not have any weights property, as far as I can tell. Also, in order to make predictions there is a Compute method, so - after learning - a prediction is made by running:
network.Compute( [|1.0;1.0|] )
for example, for a given input. I have noticed that, after several epochs, the network adapts incrementally to the desired targets (as it should), but - for the training - one just runs teacher.RunEpoch(inputs, outputs) several times. Apparently this affects the network instance: how is that possible?
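The training loop I run is simply the following (RunEpoch appears to return the epoch's error, and it clearly mutates the network it was built around):

```fsharp
// reusing `teacher`, `inputs`, and `outputs` from the snippet above;
// each call updates the network's weights in place
for epoch in 1 .. 20 do
    let error = teacher.RunEpoch(inputs, outputs)
    printfn "epoch %d, error %f" epoch error
```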
Weights are accessible through the network's Layers property, and then through each layer's Neurons.
So, for the given example, network.Layers provides an array of Layer objects (Accord.Neuro.Layer), where each element gives information (and data) about one layer. In the example we have input, hidden, and output, so we identify two sets of internal connections: input to hidden and hidden to output.
Suppose we want to know the weights from the input to the hidden layer: network.Layers.[0] would return a Layer object (Accord.Neuro.Layer), which has a Neurons field. It is worthwhile to look into the interfaces of these two objects because they represent the neural computations in Accord.Neuro. Each ActivationNeuron reports the weights of that specific computational unit (and its threshold).
So, a possible helper function to traverse the network and get both weights and thresholds would be:
let getWeights (n: ActivationNeuron) =
    (n.Weights, n.Threshold)

let getNetworkParameters (network: ActivationNetwork) =
    network.Layers
    |> Array.map (fun layer ->
        layer.Neurons
        |> Array.map (fun neuron ->
            neuron :?> ActivationNeuron |> getWeights))
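For instance, calling the helper on the trained network from the question prints one (weights, threshold) pair per neuron, per layer (a small sketch; the actual values depend on the random initialization):

```fsharp
// one inner array per layer, one (weights, threshold) pair per neuron
getNetworkParameters network
|> Array.iteri (fun i layer ->
    layer
    |> Array.iteri (fun j (weights, threshold) ->
        printfn "layer %d, neuron %d: weights = %A, threshold = %f"
            i j weights threshold))
```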
I may add some other notes as I go through the Accord.Neuro API.
I am training a ResilientBackpropagation Neural Network with Accord.Net to get a scoring for a set of features.
The network is very simple and has:
- 1 hidden layer with 3 nodes
I am training with:
- Random Initialization
- train-set 3000 examples
- validation-set 1000 examples
The learning curve looks slightly different on every run, but this is the average case:
If I run the training 5 times with the same parameters and validate the network on my cross-validation set, I get 5 different F1 scores, between 88% and 91%. So it is very difficult to decide when to stop training and take the final model. Is this normal? If I want to deploy, do I have to run the training X times and stop once I think I have reached the best result?
The neural network initializes the weights randomly, so training will generate a different network on each run and therefore give you different performance. While the training process itself is deterministic, the initial values are not! As a result you may end up in different local minima, or stop in different places.
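Since only the initialization differs between runs, one common workaround is random restarts: train several networks from scratch and keep the one that scores best on the validation set. A minimal sketch with the Accord.Neuro types discussed above (the toy data, epoch count, and scoring function are placeholders; you would score with your F1 metric and real validation data instead):

```fsharp
open Accord.Neuro
open Accord.Neuro.Learning

// placeholder data; substitute your train/validation sets
let trainX = [| [|0.0; 1.0|]; [|1.0; 1.0|] |]
let trainY = [| [|1.0|]; [|0.0|] |]

// mean absolute error on a held-out set (lower is better)
let score (net: ActivationNetwork) (xs: float[][]) (ys: float[][]) =
    Array.map2 (fun (x: float[]) (y: float[]) ->
        abs (net.Compute(x).[0] - y.[0])) xs ys
    |> Array.average

let trainOnce () =
    // a fresh network per call means a fresh random initialization
    let net = ActivationNetwork(SigmoidFunction(2.0), 2, 3, 1)
    let teacher = ResilientBackpropagationLearning(net)
    for _ in 1 .. 100 do teacher.RunEpoch(trainX, trainY) |> ignore
    net

// train 5 networks, keep the best one on the validation data
let best =
    List.init 5 (fun _ -> trainOnce ())
    |> List.minBy (fun n -> score n trainX trainY)
```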
I am a programming enthusiast, so please excuse me and help fill any gaps. From what I understand, good results from a neural network require the sigmoid and either the learning rate or step rate (depending on the training method) to be set correctly, along with the number of learning iterations.
While there is a lot of material about these values and the principle of generalization and avoiding overfitting, there doesn't seem to be much focus on their relationship with the data and the network.
I've noticed that the number of samples, neurons, and inputs seems to influence where these settings best land (more or fewer inputs may change the iterations required, for example).
Is there a mathematical way to find a good (approximate) starting point for the sigmoid, learning rate, steps, iterations, and the like, based on known values such as samples, inputs, outputs, layers, etc.?
Before the deep learning explosion, one common way to determine the best number of parameters in your network was to use Bayesian regularization. Bayesian regularization is a method to avoid overfitting even if your network is larger than necessary.
Regarding the learning/step rate, the problem is that choosing a small step rate can make learning notoriously slow, while a large step rate may make your network diverge. Thus, a common technique was to use a learning method that could automatically adjust the learning rate in order to accelerate when necessary and decelerate in certain regions of the gradient.
As such, a common way to learn neural networks while taking care of both problems was to use the Levenberg-Marquardt learning algorithm with Bayesian Regularization. The Levenberg-Marquardt algorithm is an adaptive algorithm in the sense that it can adjust the learning rate after every iteration, being able to switch from Gauss-Newton updates (using second order information) back to a Gradient Descent algorithm (using only first order information) as needed.
It can also give you an estimate of the number of parameters that you really need in your network. The number of parameters is the total number of weights considering all neurons in the network. You can then use this estimate to decide how many neurons you should be using in the first place.
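For a fully connected feed-forward network, the count itself is easy to compute by hand: each neuron contributes one weight per input plus one threshold. A small F# helper illustrating the arithmetic (not part of Accord, just a sketch):

```fsharp
// parameters of a fully connected feed-forward net:
// each neuron contributes (fan-in weights + 1 threshold)
// layerSizes = input dimension followed by each layer's neuron count
let parameterCount (layerSizes: int list) =
    layerSizes
    |> List.pairwise
    |> List.sumBy (fun (fanIn, neurons) -> neurons * (fanIn + 1))

// e.g. a 2-input, 2-hidden, 1-output network: 2*(2+1) + 1*(2+1) = 9
parameterCount [2; 2; 1]
```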
This method is implemented by the MATLAB function trainbr. However, since you also included the accord-net tag, I should also say that it is implemented by the LevenbergMarquardtLearning class (you might want to use the latest alpha version in NuGet in case you are dealing with multiple output problems).
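Put together, a rough Accord.NET sketch would look like the following; I am assuming the `UseRegularization` flag on `LevenbergMarquardtLearning` enables the Bayesian regularization mentioned above, so check the property against your version's API docs:

```fsharp
open Accord.Neuro
open Accord.Neuro.Learning

// toy data standing in for a real training set
let inputs = [| [|0.0; 1.0|]; [|1.0; 1.0|] |]
let outputs = [| [|1.0|]; [|0.0|] |]

let network = ActivationNetwork(SigmoidFunction(2.0), 2, 3, 1)
let teacher = LevenbergMarquardtLearning(network)
teacher.UseRegularization <- true  // Bayesian regularization (assumed flag)

for _ in 1 .. 50 do
    teacher.RunEpoch(inputs, outputs) |> ignore
```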