Hot questions for using neural networks in batch processing

Question:

I am currently trying to implement a model using Batch Normalization in Keras. I have successfully implemented it for the training phase.

However, at test time Batch Normalization should use statistics (mean and variance) computed over the entire population rather than over the current batch: the BN mean and variance are pre-calculated and then kept static, in contrast to the training phase, where they are computed from each batch.

My question regarding Keras is:

Assume (X, y) is the entire population, and (X_batch, y_batch) is a batch (a subset of that population).

If I use

model.test_on_batch(X_batch, y_batch)

how can I pass the mean and variance of the entire population (X, y) on to the batch-normalization layer? Is there any way I can let Keras handle this automatically?


Answer:

how can I pass the mean and variance of the entire population (X, y) on to the batch-normalization layer? Is there any way I can let Keras handle this automatically?

Keras should just do it (in sufficiently recent versions):

https://github.com/fchollet/keras/issues/81

To double-check, you may want to try batch_size=1 at test/prediction time: if Keras were normalizing with the batch statistics instead of the stored global statistics, a single-sample batch would have zero variance and you would see very bad results.
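For reference, here is a minimal sketch (Keras 2-style API; the layer sizes and toy data are assumptions, not part of the question) of the intended behaviour: BatchNormalization accumulates moving averages of the mean and variance during training and applies those frozen statistics automatically at test time, so test_on_batch does not need the population statistics passed in explicitly.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation

# Hypothetical "population" of 1000 samples with 20 features each.
X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype('float32')

model = Sequential([
    Dense(64, input_shape=(20,)),
    BatchNormalization(),        # tracks moving mean/variance during training
    Activation('relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

# Training phase: each batch is normalized with its own statistics.
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Test phase: the stored moving averages are used, regardless of the batch.
print(model.test_on_batch(X[:32], y[:32]))
print(model.evaluate(X, y, batch_size=1, verbose=0))   # even batch_size=1 is fine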

Question:

I'm trying to implement a simple NN in Torch to learn more about it. I created a very simple dataset: binary numbers from 0 to 15, and my goal is to classify the numbers into two classes - class 1 are numbers 0-3 and 12-15, class 2 are the remaining ones. The following code is what I have now (I have only removed the data-loading routine):

require 'torch'
require 'nn'

data = torch.Tensor( 16, 4 )
class = torch.Tensor( 16, 1 )

network = nn.Sequential()

network:add( nn.Linear( 4, 8 ) )
network:add( nn.ReLU() )
network:add( nn.Linear( 8, 2 ) )
network:add( nn.LogSoftMax() )

criterion = nn.ClassNLLCriterion()

for i = 1, 300 do
        prediction = network:forward( data )

        --print( "prediction: " .. tostring( prediction ) )
        --print( "class: " .. tostring( class ) )

        loss = criterion:forward( prediction, class )

        network:zeroGradParameters()

        grad = criterion:backward( prediction, class )
        network:backward( data, grad )

        network:updateParameters( 0.1 )
end

This is what the data and class Tensors look like:

 0  0  0  0
 0  0  0  1
 0  0  1  0
 0  0  1  1
 0  1  0  0
 0  1  0  1
 0  1  1  0
 0  1  1  1
 1  0  0  0
 1  0  0  1
 1  0  1  0
 1  0  1  1
 1  1  0  0
 1  1  0  1
 1  1  1  0
 1  1  1  1
[torch.DoubleTensor of size 16x4]

 2
 2
 2
 2
 1
 1
 1
 1
 1
 1
 1
 1
 2
 2
 2
 2
[torch.DoubleTensor of size 16x1]

This is exactly what I expect. However, when running this code, I get the following error on the line loss = criterion:forward( prediction, class ):

torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:69: attempt to perform arithmetic on a nil value

When I modify the training routine like this (processing a single data point at a time instead of all 16 in a batch), it works and the network successfully learns to recognize the two classes:

for k = 1, 300 do
    for i = 1, 16 do
        prediction = network:forward( data[i] )

        --print( "prediction: " .. tostring( prediction ) )
        --print( "class: " .. tostring( class ) )

        loss = criterion:forward( prediction, class[i] )

        network:zeroGradParameters()

        grad = criterion:backward( prediction, class[i] )
        network:backward( data[i], grad )

        network:updateParameters( 0.1 )
    end
end

I'm not sure what might be wrong with the "batch processing" I'm trying to do. A brief look at the ClassNLLCriterion source didn't help; it seems I'm giving it the expected input (see below), but it still fails. The input it receives (the prediction and class Tensors) looks like this:

-0.9008 -0.5213
-0.8591 -0.5508
-0.9107 -0.5146
-0.8002 -0.5965
-0.9244 -0.5055
-0.8581 -0.5516
-0.9174 -0.5101
-0.8040 -0.5934
-0.9509 -0.4884
-0.8409 -0.5644
-0.8922 -0.5272
-0.7737 -0.6186
-0.9422 -0.4939
-0.8405 -0.5648
-0.9012 -0.5210
-0.7820 -0.6116
[torch.DoubleTensor of size 16x2]

 2
 2
 2
 2
 1
 1
 1
 1
 1
 1
 1
 1
 2
 2
 2
 2
[torch.DoubleTensor of size 16x1]

Can someone help me out here? Thanks.


Answer:

nn.ClassNLLCriterion expects its target to be a 1D tensor of size batch_size or a scalar. Your class is a 2D tensor (batch_size x 1), but class[i] is 1D, which is why your non-batch version works.

So, this will solve your problem:

class = class:view(-1)

Alternatively, you can replace

network:add( nn.LogSoftMax() )
criterion = nn.ClassNLLCriterion()

with the equivalent:

criterion = nn.CrossEntropyCriterion()

Interestingly, nn.CrossEntropyCriterion is also able to take a 2D target tensor; why nn.ClassNLLCriterion cannot is not clear.

Question:

I'd like to use the Caffe library to extract image features, but I'm having performance issues. I can only use CPU mode. I was told Caffe supports a batch processing mode in which the average time required to process one image is much lower.

I'm calling the following method:

const vector<Blob<Dtype>*>& 
Net::Forward(const vector<Blob<Dtype>* > & bottom, Dtype* loss = NULL);

and I'm passing in a vector of size 1, containing a single blob with the following dimensions: (num: 10, channels: 3, width: 227, height: 227). It represents a single image oversampled in the same way as in the official Python wrapper.

This works and gives correct results. It is, however, too slow.

Whenever I try to send in a vector containing more than one blob (of the same dimensions), I get the following error:

F0910 16:10:14.848492 15615 blob.cpp:355] Trying to copy blobs of different sizes. Check failure stack trace:

How do I make Caffe process my images in a batch?


Answer:

If you want to feed larger batches, you need the first (and only) blob in bottom to have num > 10. Feeding a blob with num = 20 is equivalent to feeding two images oversampled with 10 crops each. You will, of course, have to perform the averaging over the crops manually, according to the oversampling you are using.

Furthermore, you might want to change the first input dimension in your deploy.prototxt file from 10 to some larger value (depending on your machine's memory capacity).
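To make the manual averaging concrete, here is a rough numpy-only sketch (the array names, the 1000-dimensional output and the factor of 10 crops are assumptions for illustration; the actual Net::Forward call is omitted). After feeding a blob of shape (n_images * 10, 3, 227, 227), the output rows are grouped back per image and averaged over the crops, which is roughly what the official Python wrapper does for you in oversample mode.

import numpy as np

n_images, n_crops, n_features = 2, 10, 1000     # hypothetical sizes
# Stand-in for the output blob returned by Caffe for the batched input.
outputs = np.random.rand(n_images * n_crops, n_features)

# Regroup rows into (image, crop, feature) and average over the 10 crops.
features = outputs.reshape(n_images, n_crops, n_features).mean(axis=1)
print(features.shape)    # (2, 1000) -> one averaged feature vector per image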

Question:

I was reading through the Keras documentation (https://keras.io/getting-started/faq/), and I noticed that in their definition of a batch they say the samples within a batch are run in parallel. For almost any type of neural network this would be completely acceptable, but if I'm running an RNN with stateful left at the default of False, does this imply that the hidden state is reset for each and every one of my samples?

I was under the impression that the samples in each batch were run sequentially before an update to the weights was made, and therefore the only loss of hidden state happened when the batch changed (since I have stateful set to False).

Am I wrong in my understanding?


Answer:

Every sample in a batch is an individual sequence, and a state (the condition a sequence is in at the current timestep) only makes sense for each sequence individually.

One sequence cannot affect the state of another sequence.

So, there is a parallel state for each sequence in the batch.

In a stateful layer, these parallel states are kept between batches (the sequences have not ended until you say so by calling reset_states()).
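To make the difference concrete, here is a minimal sketch (the shapes, layer sizes and toy data are made up for illustration) contrasting the two modes:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, features = 4, 10, 8
X = np.random.rand(batch_size, timesteps, features)
y = np.random.rand(batch_size, 1)

# stateful=False (the default): each of the 4 sequences gets its own state,
# the states are computed in parallel, and all of them are discarded once
# the batch has been processed.
stateless = Sequential([
    LSTM(16, input_shape=(timesteps, features)),
    Dense(1),
])
stateless.compile(optimizer='adam', loss='mse')
stateless.train_on_batch(X, y)

# stateful=True: the 4 per-sequence states are kept across batches, so the
# next batch is treated as the continuation of the same 4 sequences.
# You decide when the sequences end by calling reset_states().
stateful = Sequential([
    LSTM(16, stateful=True, batch_input_shape=(batch_size, timesteps, features)),
    Dense(1),
])
stateful.compile(optimizer='adam', loss='mse')
stateful.train_on_batch(X, y)    # states carried over to the next batch
stateful.reset_states()          # explicitly end the sequences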

Here is another related question: When does keras reset an LSTM state?