Hot questions for using neural networks in self-organizing maps
I have a question about self-organizing maps.
But first, here is my approach to implementing one:
The SOM neurons are stored in a basic array. Each neuron consists of a vector (another array, whose length equals the number of input neurons) of double values, which are initialized to random values.
As far as I understand the algorithm, this is actually all I need to implement it.
For training, I choose a sample of the training data at random and calculate the BMU (best-matching unit) using the Euclidean distance between the sample's values and the neuron weights.
Afterwards, I update its weights and those of all other neurons in its range, depending on the neighborhood function and the learning rate.
Then I shrink the neighborhood radius and decrease the learning rate.
This is repeated for a fixed number of iterations.
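The training procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the question: neurons sit on a rows x cols grid, the neighborhood function is a Gaussian over grid distance, and both the learning rate and the radius decay exponentially. The name `train_som` and all parameter defaults are hypothetical.

```python
import math
import random


def train_som(data, rows, cols, iterations=1000, lr0=0.5, radius0=None, seed=0):
    """Online SOM training on a rows x cols grid (illustrative sketch).

    data: list of equal-length lists of floats.
    Returns rows * cols weight vectors; neuron k sits at grid cell (k // cols, k % cols).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # Each neuron is a vector of doubles initialized to random values in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    grid = [divmod(k, cols) for k in range(rows * cols)]

    for t in range(iterations):
        # Assumed schedule: learning rate and radius both decay exponentially.
        decay = math.exp(-3.0 * t / iterations)
        lr = lr0 * decay
        radius = radius0 * decay

        # Pick a random sample; the BMU minimizes squared Euclidean distance
        # (same winner as plain Euclidean distance).
        x = rng.choice(data)
        bmu = min(range(len(weights)),
                  key=lambda k: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[k])))

        # Pull every neuron towards x, scaled by a Gaussian of its grid distance
        # to the BMU, so the BMU moves most and distant neurons barely move.
        br, bc = grid[bmu]
        for k, (r, c) in enumerate(grid):
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
            w = weights[k]
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights
```

For example, `train_som(data, 2, 2)` on two-dimensional data returns four 2-element prototypes. The Gaussian is only one common choice of neighborhood function; a hard cut-off at the current radius would match "all other neurons in its range" more literally.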
My question is now: how do I determine the clusters after training? My approach so far is to present a new input vector and find its BMU, i.e. the neuron with the minimum Euclidean distance to it. But this seems a little naive to me; I'm sure that I've missed something.
There is no single correct way of doing that. As you noted, finding the BMU is one of them, and it is the only one that makes sense if you just want to find the most similar cluster.
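As a minimal Python sketch (the helper name `bmu` is hypothetical), labelling a new input with its cluster is just an arg-min of the Euclidean distance over the trained prototypes:

```python
import math


def bmu(weights, x):
    """Return (index, distance) of the best-matching unit for input x.

    weights: list of prototype vectors, one per neuron, e.g. from a trained SOM.
    The returned index serves as the cluster label of x.
    """
    best_k, best_d = -1, math.inf
    for k, w in enumerate(weights):
        d = math.dist(x, w)  # Euclidean distance (Python 3.8+)
        if d < best_d:
            best_k, best_d = k, d
    return best_k, best_d
```

Labelling a whole data set is then `[bmu(weights, x)[0] for x in inputs]`; inputs that share a BMU fall into the same cluster.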
If you want to reconstruct your input vector, returning the BMU prototype works too, but may not be very precise (it is equivalent to the nearest-neighbor rule, or 1NN). In that case you need to interpolate between neurons to get a better reconstruction. This can be done by weighting each neuron in inverse proportion to its distance to the input vector and then computing the weighted average (this is equivalent to weighted KNN). You can also restrict this interpolation to the BMU and its neighbors, which is faster and may give better results (this would be weighted 5NN). This technique was used in The Continuous Interpolating Self-organizing Map.
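The three reconstruction options can be sketched in Python as follows. Assumptions: the prototypes are stored row-major on a rectangular rows x cols grid, and "neighbors" means the 4-connected grid neighbors, so the BMU plus its neighbors gives at most five prototypes (the weighted 5NN case); on a hexagonal grid the neighborhood would be larger. The function names are hypothetical, and this is not the implementation from the cited paper.

```python
import math


def _inv_dist_average(weights, idxs, x, eps=1e-12):
    """Average of weights[idxs], each weighted by 1 / (distance to x)."""
    dists = [math.dist(x, weights[k]) for k in idxs]
    # If x coincides with a prototype, 1/d is undefined: return that prototype.
    for k, d in zip(idxs, dists):
        if d < eps:
            return list(weights[k])
    coeffs = [1.0 / d for d in dists]
    total = sum(coeffs)
    return [sum(c * weights[k][i] for c, k in zip(coeffs, idxs)) / total
            for i in range(len(x))]


def reconstruct(weights, rows, cols, x, mode="bmu"):
    """Reconstruct x from SOM prototypes stored row-major on a rows x cols grid.

    mode="bmu":       the BMU prototype itself (1NN).
    mode="all":       inverse-distance weighted average of all prototypes (weighted KNN).
    mode="neighbors": same, restricted to the BMU and its 4 grid neighbors (weighted 5NN).
    """
    b = min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))
    if mode == "bmu":
        return list(weights[b])
    if mode == "all":
        return _inv_dist_average(weights, list(range(len(weights))), x)
    if mode == "neighbors":
        r, c = divmod(b, cols)
        idxs = [b]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                idxs.append(rr * cols + cc)
        return _inv_dist_average(weights, idxs, x)
    raise ValueError(f"unknown mode: {mode}")
```

With prototypes 0, 1 and 2 on a 1x3 grid and input 0.25, the BMU option returns 0.0, the all-neuron average is pulled towards the farther prototypes (13/31, about 0.419), and the neighbor-restricted average gives 0.25, which illustrates why restricting the interpolation can help.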
You can see and experiment with those different options here: http://www.inf.ufrgs.br/~rcpinto/itm/ (not a SOM, but a close cousin). Click "Apply" to do regression on a curve using the reconstructed vectors, then check "Draw Regression" and try the different options.
BTW, the description of your implementation is correct.
How do I get an n-by-2 vector that contains the connections of the neurons in an SOM? For example, if I have a simple 2x2 hextop SOM, the connections vector should look like:
This vector indicates that neuron 1 is connected to neuron 2, neuron 1 is connected to neuron 3, and so on.
How can this connections vector be retrieved from any given SOM?
Assuming the SOM is defined with neighbourhood distance 1 (i.e., for each neuron, edges to all neurons within a Euclidean distance of 1), which is the default option for MATLAB's hextop(...) command, you can create your connections vector as follows:
pos = hextop(2,2);

% Find neurons within a Euclidean distance of 1, for each neuron.

% option A: count edges only once
distMat = triu(dist(pos));
[I, J] = find(distMat > 0 & distMat <= 1);
connectionsVectorA = [I J]

% option B: count edges in both directions
distMat = dist(pos);
[I, J] = find(distMat > 0 & distMat <= 1);
connectionsVectorB = sortrows([I J])

% verify graphically
plotsom(pos)
The output of the above is:
connectionsVectorA =
     1     2
     1     3
     2     3
     2     4
     3     4

connectionsVectorB =
     1     2
     1     3
     2     1
     2     3
     2     4
     3     1
     3     2
     3     4
     4     2
     4     3
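For readers without the Neural Network Toolbox, here is a rough Python equivalent of options A and B. It is a sketch that assumes the neuron positions hextop(2,2) produces, namely (0, 0), (1, 0), (0.5, sqrt(3)/2) and (1.5, sqrt(3)/2); the names `POS` and `connections` are hypothetical. A small tolerance guards against floating-point error in distances that should be exactly 1.

```python
import math
from itertools import combinations

# Assumed neuron positions of hextop(2,2), one (x, y) pair per neuron.
# Neurons are numbered from 1 in the output, as in MATLAB.
POS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2), (1.5, math.sqrt(3) / 2)]


def connections(pos, both_directions=False, tol=1e-9):
    """1-based (i, j) pairs of neurons whose Euclidean distance is in (0, 1]."""
    pairs = []
    for i, j in combinations(range(len(pos)), 2):  # i < j: each edge once (option A)
        d = math.dist(pos[i], pos[j])
        if 0 < d <= 1.0 + tol:
            pairs.append((i + 1, j + 1))
    if both_directions:  # option B: add the reversed edges, then sort the rows
        pairs = sorted(pairs + [(j, i) for i, j in pairs])
    return pairs
```

`connections(POS)` yields the five pairs of connectionsVectorA, and `connections(POS, both_directions=True)` the ten rows of connectionsVectorB. For larger grids, option A's rows come out sorted by the first neuron, which may differ in order from MATLAB's find, since find walks the matrix column by column.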
If you have a SOM with a non-default neighbourhood distance (!= 1), say nDist, just replace the find(..) commands above with:
... find(distMat > 0 & distMat <= nDist);
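The nDist variant can be sketched in Python for grids of any size. The helper `hextop_positions` is a hypothetical stand-in that assumes hextop's layout: neurons are numbered along the first dimension first, every second row is shifted right by 0.5, and rows are sqrt(3)/2 apart (for hextop(2,3) this gives x = 0, 1, 0.5, 1.5, 0, 1 and y = 0, 0, 0.866, 0.866, 1.732, 1.732). With nDist = sqrt(3), the 2x2 grid also connects the diagonal pair 1-4.

```python
import math
from itertools import combinations


def hextop_positions(dim1, dim2):
    """Hexagonal-grid positions, assumed to mirror MATLAB's hextop(dim1, dim2).

    Neuron k (0-based) sits in column k % dim1 of row k // dim1; odd rows are
    shifted right by 0.5 and consecutive rows are sqrt(3)/2 apart.
    """
    return [((k % dim1) + 0.5 * ((k // dim1) % 2), (k // dim1) * math.sqrt(3) / 2)
            for k in range(dim1 * dim2)]


def connections(pos, n_dist, tol=1e-9):
    """1-based (i, j) pairs, i < j, whose Euclidean distance is in (0, n_dist]."""
    return [(i + 1, j + 1)
            for i, j in combinations(range(len(pos)), 2)
            if 0 < math.dist(pos[i], pos[j]) <= n_dist + tol]
```

For example, `connections(hextop_positions(2, 3), 1.0)` returns nine edges, including (3, 5) and (3, 6) between the middle and top rows.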