Hot questions for Using Neural networks in unsupervised learning


Question:

In scenario 1, I had a multi-layer sparse autoencoder that tries to reproduce my input, so all layers are trained together with randomly initialized weights. Without a supervised layer, it didn't learn any relevant information on my data (the code itself works fine; I've verified it by using it on many other deep neural network problems).

In scenario 2, I simply train multiple autoencoders in a greedy layer-wise fashion similar to that of deep learning (but without a supervised step at the end), each layer on the output of the previous autoencoder's hidden layer. They now learn some patterns separately (as I can see from the visualized weights), but nothing spectacular, as I'd expect from single-layer AEs.

So I decided to test whether the pretrained layers, connected into one multi-layer AE, could perform better than the randomly initialized version. As you can see, this is the same idea as the fine-tuning step in deep neural networks.

But during fine-tuning, instead of improving, the neurons of all layers quickly converge towards one and the same pattern and end up learning nothing.

Question: What's the best configuration to train a fully unsupervised multi-layer reconstructive neural network? Layer-wise first and then some sort of fine tuning? Why is my configuration not working?


Answer:

After some tests I've come up with a method that seems to give very good results, and, as you'd expect from a 'fine-tuning', it improves the performance of all the layers:

Just as usual, during the greedy layer-wise learning phase each new autoencoder tries to reconstruct the activations of the previous autoencoder's hidden layer. However, the last autoencoder (which will be the last layer of our multi-layer autoencoder during fine-tuning) is different: it takes the activations of the previous layer as input but tries to reconstruct the 'global' input (i.e. the original input that was fed to the first layer).

This way when I connect all the layers and train them together, the multi-layer autoencoder will really reconstruct the original image in the final output. I found a huge improvement in the features learned, even without a supervised step.

I don't know whether this corresponds to some standard implementation, but I haven't found this trick anywhere before.
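
Here is a minimal sketch of the idea, assuming PyTorch rather than my original code; two dense layers, no sparsity penalty, and purely illustrative sizes and hyperparameters. The key part is step 2, where the last autoencoder reads the previous hidden activations but reconstructs the global input:

import torch
import torch.nn as nn

x = torch.randn(256, 784)                              # toy stand-in for the 'global' input

enc1, dec1 = nn.Linear(784, 256), nn.Linear(256, 784)  # first AE reconstructs its own input
enc2, dec2 = nn.Linear(256, 64), nn.Linear(64, 784)    # last AE decodes back to the ORIGINAL 784 dims

mse = nn.MSELoss()

def train(params, step, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = step()
        loss.backward()
        opt.step()

# 1) greedy pretraining: the first autoencoder reconstructs the raw input
train(list(enc1.parameters()) + list(dec1.parameters()),
      lambda: mse(dec1(torch.relu(enc1(x))), x))

# 2) the last autoencoder takes the previous hidden activations as input
#    but reconstructs the global input, not those activations
h1 = torch.relu(enc1(x)).detach()
train(list(enc2.parameters()) + list(dec2.parameters()),
      lambda: mse(dec2(torch.relu(enc2(h1))), x))

# 3) fine-tuning: connect everything and train it together, so the
#    multi-layer autoencoder reconstructs the original input at its output
train(list(enc1.parameters()) + list(enc2.parameters()) + list(dec2.parameters()),
      lambda: mse(dec2(torch.relu(enc2(torch.relu(enc1(x))))), x))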

Question:

From a script on preparing data for a Caffe network, the following piece of code turns an image (a numpy array representing an image) into a Datum object:

datum = caffe_pb2.Datum(
        channels=3,
        width=224,
        height=224,
        label=label,
        data=np.rollaxis(img, 2).tostring())

If the network were unsupervised, would you just create the object the same way but leave the label parameter unset, as shown below?

datum = caffe_pb2.Datum(
            channels=3,
            width=224,
            height=224,
            data=np.rollaxis(img, 2).tostring())

Answer:

The label of Datum is optional:

optional int32 label = 5;

Meaning you do not have to provide it.
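
For example, a quick sanity check, assuming pycaffe's caffe_pb2 bindings and a dummy image (.tobytes() is the modern spelling of .tostring()):

import numpy as np
from caffe.proto import caffe_pb2

img = np.zeros((224, 224, 3), dtype=np.uint8)   # dummy HxWxC image

datum = caffe_pb2.Datum(
    channels=3, width=224, height=224,
    data=np.rollaxis(img, 2).tobytes())         # no label given

print(datum.HasField('label'))                  # False: the optional field was never set
print(datum.label)                              # 0: protobuf's default for an unset int32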

Side note: Datum is a data structure used mainly for the "Data" input layer and, strictly speaking, it is not part of the trained net. Caffe uses N-D tensor Blobs to store both the data and the parameters of the net.

Question:

I built and trained an unsupervised deep artificial neural network to detect high-order features from a large data set.

The data consists of daily weather measurements, and the output of the last layer of my deep net is 4 neurons wide, which hopefully represent high-order features. Now I would like to estimate the probability of a very rare event (e.g. a tornado). I singled out the data points that resulted in a tornado, but there are very few of them: about 10,000 out of 5,000,000 data points.

What's the best design for my tornado classifier?
  • create a training set made of only the 10,000 tornado data points, with a desired output of 1 each time?
  • create a training set made of all 5,000,000 data points, with desired output 0 when there is no tornado, and 1 when there is one? but that will likely never predict a tornado.
  • other solutions?

Answer:

I don't see why you are using unsupervised learning. It sounds like a purely supervised learning task.

You shouldn't throw away data when predicting rare events. If an event is very rare, then of course the network will predict that it has a very low probability, because it does. This is called "bias". However, the rest of the network should still try its hardest to learn to distinguish positive and negative examples.

If you don't like that, you can try a different loss function, perhaps one that penalizes missed positive examples more heavily than negative ones. Or you can shift the network's bias simply by adding more copies of the positive examples to the dataset.
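
For instance, a minimal sketch of the weighted-loss idea, assuming PyTorch; the 500:1 weight just mirrors the 10,000-in-5,000,000 ratio from your data:

import torch
import torch.nn as nn

# roughly 500 negatives per positive, so up-weight the positive class
pos_weight = torch.tensor([5_000_000 / 10_000])        # = 500.0
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                             # raw model outputs for a small batch
targets = torch.zeros(8, 1)
targets[-1] = 1.0                                      # one tornado in the batch
loss = criterion(logits, targets)                      # missing that positive costs ~500x more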

By the way, you would be better off asking this on Data Science Stack Exchange: https://datascience.stackexchange.com/

Question:

I am following this paper to implement Oja's Learning rule in python

Oja's Learning Rule

u = 0.01
V = np.dot(self.weight , input_data.T)
print(V.shape , self.weight.shape , input_data.shape) #(625, 2) (625, 625) (2, 625)

So far I am able to follow the paper; however, on arriving at the final equation from the link, I run into numpy array dimension mismatch errors, which seem to be expected. This is the code for the final equation:

self.weight += u * V * (input_data.T - (V * self.weight))

If I break it down like so:

u = 0.01
V = np.dot(self.weight , input_data.T)
temp = u * V  #(625, 2)
x = input_data - np.dot(V.T , self.weight)   #(2, 625)
k = np.dot(temp , x)   #(625, 625)
self.weight = np.add(self.weight , k , casting = 'same_kind')

This clears up the dimension constraints, but the resulting pattern is wrong by a long stretch (I was just fixing the dimension order, knowing full well that the result would be incorrect). I want to know whether my interpretation of the equation in the first approach, which seemed like the logical way to do it, is correct. Any suggestions on implementing the equation properly?


Answer:

I have implemented the rule based on this link: Oja Rule. The results I get are similar to those of the Hebbian learning rule, so I am not entirely sure about the correctness of the implementation. However, I'm posting it so anyone looking for an implementation can get a few ideas and correct the code if it's wrong.

u = 0.01                                    # learning rate
V = np.dot(self.weight, input_data.T)       # activations for the whole batch, computed once with the pre-update weights

for i, inp in enumerate(input_data):
    v = V[:, i].reshape((n_features, 1))    # n_features is the number of columns
    # Oja's rule: delta_w = u * v * (x - v * w)
    self.weight += u * ((inp * v) - np.square(v) * self.weight)
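
For comparison, here is the textbook per-sample form of the update for a single output neuron on toy data; unlike the loop above, it recomputes the activation with the current weights at every step:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))       # 500 toy samples, 5 features
w = rng.standard_normal(5)
w /= np.linalg.norm(w)                  # start from a unit-norm weight vector
eta = 0.01

for epoch in range(50):
    for x in X:
        y = w @ x                       # activation with the current weights
        w += eta * y * (x - y * w)      # Oja's rule: delta_w = eta * y * (x - y * w)

# w now approximates the first principal component of X (up to sign)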