Hot questions on using neural networks with max pooling

Question:

gru_out = Bidirectional(GRU(hiddenlayer_num, return_sequences=True))(embedded)
#Tensor("concat_v2_8:0", shape=(?, ?, 256), dtype=float32)

I use Keras to create a GRU model. I want to gather information from all the node vectors of the GRU model, instead of only the last node vector. Specifically, I need to get the maximum value of each feature across all timesteps (as in the picture attached to the question), but I have no idea how to do this.


Answer:

One may use the GlobalMaxPooling1D layer from Keras:

gru_out = Bidirectional(GRU(hiddenlayer_num, return_sequences=True))(embedded)
max_pooled = GlobalMaxPooling1D()(gru_out)  # note: instantiate the layer, then call it on the tensor
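
For context, a minimal end-to-end sketch (the vocabulary size, embedding size and hiddenlayer_num below are assumed toy values, not taken from the question):

from keras.layers import Input, Embedding, GRU, Bidirectional, GlobalMaxPooling1D
from keras.models import Model

hiddenlayer_num = 128                    # assumed; yields the (?, ?, 256) shape shown above
inputs = Input(shape=(None,))            # variable-length sequences of token ids
embedded = Embedding(input_dim=10000, output_dim=64)(inputs)
gru_out = Bidirectional(GRU(hiddenlayer_num, return_sequences=True))(embedded)
max_pooled = GlobalMaxPooling1D()(gru_out)   # (batch, 256): per-feature max over all timesteps
model = Model(inputs, max_pooled)
model.summary()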

Question:

I'm trying to fit data of the following shape to a pretrained Keras VGG19 model.

The image input shape is (32383, 96, 96, 3), the label shape is (32383, 17), and I got this error

expected block5_pool to have 4 dimensions, but got array with shape (32383, 17)

at this line

model.fit(x = X_train, y= Y_train, validation_data=(X_valid, Y_valid),
              batch_size=64,verbose=2, epochs=epochs,callbacks=callbacks,shuffle=True)

Here's how I define my model

model = VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(96,96,3),classes=17)

Why does block5_pool expect a 4D tensor when my labels are 2D? I'm using the original model from keras.applications.vgg16. How can I fix this error?


Answer:

Your problem comes from VGG16(include_top=False, ...), which loads only the convolutional part of VGG. This is why Keras is complaining that it got a 2-dimensional array instead of a 4-dimensional one (the 4 dimensions come from the fact that the convolutional output has shape (nb_of_examples, width, height, channels)). To overcome this issue you need to either set include_top=True, or add layers that squash the convolutional part down to 2-D: e.g. Flatten, GlobalMaxPooling2D or GlobalAveragePooling2D, followed by a set of Dense layers, the final one being a Dense of size 17 with a softmax activation function. A sketch of the second option follows.
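
A minimal sketch of that second option (assuming the 17 labels are mutually exclusive classes; the optimizer and loss are placeholders, not taken from the question):

from keras.applications.vgg16 import VGG16
from keras.layers import GlobalMaxPooling2D, Dense
from keras.models import Model

base = VGG16(include_top=False, weights='imagenet', input_shape=(96, 96, 3))
x = GlobalMaxPooling2D()(base.output)             # squashes (batch, 3, 3, 512) down to (batch, 512)
predictions = Dense(17, activation='softmax')(x)  # one probability per class
model = Model(base.input, predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])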

Question:

This question is a tough one: how can I feed a neural network a dynamic input?

Answering this question would certainly help advance modern AI, using deep learning for applications other than computer vision and speech recognition. I will explain this problem further for those who are new to neural networks.

Let's take this simple example for instance:

Say you need to know the probability of winning, losing or drawing in a game of "tic-tac-toe".

So my input could be a [3,3] matrix representing the state (1-You, 2-Enemy, 0-Empty):

[2. 1. 0.]  
[0. 1. 0.] 
[2. 2. 1.]

Let's assume we already have a previously trained hidden layer, a [3,1] matrix of weights:

[1.5]  
[0.5]  
[2.5]

So if we do a simple forward pass, basically a matrix multiplication between the two, y(x) = x*W, we get this [3,1] matrix as the output:

[2. 1. 0.]     [1.5]     [3.5]
[0. 1. 0.]  *  [0.5]  =  [0.5]
[2. 2. 1.]     [2.5]     [6.5]

Even without a softmax function you can tell that the highest probability is of having a draw.

But what if I want this same neural network to work for a 5x5 game of tic-tac-toe?

It has the same logic as the 3x3 game; it's just bigger. The neural network should be able to handle it.

We would have something like:

[2. 1. 0. 2. 0.]
[0. 2. 0. 1. 1.]     [1.5]     [?]
[2. 1. 0. 0. 1.]  *  [0.5]  =  [?]                           IMPOSSIBLE
[0. 0. 2. 2. 1.]     [2.5]     [?]
[2. 1. 0. 2. 0.]

But this multiplication would be impossible to compute. We would have to add more layers and/or change our previously trained one and RETRAIN it, because the untrained weights (initialized with 0 in this case) would cause the neural network to fail, like so:

     input            1st Layer        output1
[2. 1. 0. 2. 0.]     [0.  0. 0.]     [6.5 0. 0.]
[0. 2. 0. 1. 1.]     [1.5 0. 0.]     [5.5 0. 0.]
[2. 1. 0. 0. 1.]  *  [0.5 0. 0.]  =  [1.5 0. 0.]
[0. 0. 2. 2. 1.]     [2.5 0. 0.]     [6.  0. 0.]
[2. 1. 0. 2. 0.]     [0.  0. 0.]     [6.5 0. 0.]

   2nd Layer           output1      final output
                     [6.5 0. 0.]
                     [5.5 0. 0.]
[0. 0. 0. 0. 0.]  *  [1.5 0. 0.]  =  [0. 0. 0.]                POSSIBLE
                     [6.  0. 0.]
                     [6.5 0. 0.]

Because we expanded the first layer and added a new layer of zero weights, our result is obviously inconclusive. If we apply a softmax function we will realize that the neural network is returning 33.3% chance for every possible outcome. We would need to train it again.

Obviously we want to create generic neural networks that can adapt to different input sizes, but I haven't thought of a solution for this problem yet! So I thought maybe Stack Overflow can help. Thousands of heads think better than one. Any ideas?


Answer:

There are solutions for Convolutional Neural Networks apart from just resizing the input to a fixed size.

Spatial Pyramid Pooling allows you to train and test CNNs on variable-sized images. It does this by introducing a dynamic pooling layer, where the input can be of any size and the output is of a fixed size, which can then be fed to the fully connected layers.

The pooling is very simple: you define a number of regions in each dimension (say 7x7), and the layer splits each feature map into a non-overlapping 7x7 grid of regions and max-pools each region, outputting a 49-element vector per feature map. This can also be applied at multiple scales; a single-level sketch follows.
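
A minimal numpy sketch of one pyramid level (using a 2x2 grid for brevity, and assuming the feature map is at least as large as the grid in each dimension):

import numpy as np

def spp_level(feature_map, bins):
    # Max-pool an (H, W) feature map into a fixed (bins x bins) grid,
    # whatever H and W are, and return a flat vector of bins*bins values.
    H, W = feature_map.shape
    row_edges = np.linspace(0, H, bins + 1).astype(int)
    col_edges = np.linspace(0, W, bins + 1).astype(int)
    out = np.empty((bins, bins))
    for i in range(bins):
        for j in range(bins):
            out[i, j] = feature_map[row_edges[i]:row_edges[i + 1],
                                    col_edges[j]:col_edges[j + 1]].max()
    return out.ravel()

# feature maps of different sizes yield the same fixed-length descriptor
print(spp_level(np.random.rand(13, 9), 2).shape)   # (4,)
print(spp_level(np.random.rand(32, 32), 2).shape)  # (4,)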

Question:

I'm building a convolutional neural network with numpy, and I'm not sure that my pooling treatment of the 3D (HxWxD) input image is correct.

As an example, I have an image shaped (12x12x3), which I convolve to (6x6x3), and I then want to perform max pooling to obtain a (3x3x3) image. To do this, I choose a filter size of (2x2) and a stride of 2.

import numpy as np

F = 2  # filter size
S = 2  # stride
output_size = int((conv.shape[0]-F)/S + 1)
pool = np.zeros((output_size, output_size, 3)) # pool array
for k in range(conv.shape[-1]): # loop over conv depth
    i_stride = 0 
    for i in range(output_size): 
        j_stride = 0
        for j in range(output_size):
            pool[i,j,k] = np.amax(conv[i_stride:i_stride+F,
                                                j_stride:j_stride+F,k],0)
            j_stride+=S 
        i_stride+=S

For the first channel of my convolution array, conv[:,:,0], and the corresponding first channel of the max-pooling array, pool[:,:,0] (both shown as images in the original post), I can tell at a glance that the pooling operation is not correct: conv[0:2,0:2,0] (mostly gray) is most definitely not pool[0,0,0] (black); you'd expect it to be one of the shades of gray. So I'm convinced that something is definitely wrong here: either my for loop or the two comparisons I'm making are off.

If anyone can help me better understand the pooling operation over the array with 3 dimensions, that will definitely help.


Answer:

Maximum pooling produces output with the same depth as its input. With that in mind, we can focus on a single slice (along depth) of the input conv. For a single slice at an arbitrary index, you have a simple image of NxN dimensions. You defined a filter size of 2 and a stride of 2. Max pooling does nothing more than iterate over the input image and take the maximum over the current "subimage".

import numpy as np

F = 2
S = 2
conv = np.array(
    [
        [
            [[.5, .1], [.1, .0], [.2, .7], [.1, .3], [.0, .1], [.3, .8]],
            [[.0, .9], [.5, .7], [.3, .1], [.9, .2], [.8, .7], [.1, .9]],
            [[.1, .8], [.1, .2], [.6, .2], [.0, .3], [.1, .3], [.0, .8]],
            [[.0, .6], [.6, .4], [.2, .8], [.6, .8], [.9, .1], [.3, .1]],
            [[.3, .9], [.7, .6], [.7, .6], [.5, .4], [.7, .2], [.8, .1]],
            [[.1, .8], [.9, .3], [.2, .7], [.8, .4], [.0, .5], [.8, .0]]
        ],
        [
            [[.1, .2], [.1, .0], [.5, .3], [.0, .4], [.0, .5], [.0, .6]],
            [[.3, .6], [.6, .4], [.1, .2], [.6, .2], [.2, .3], [.2, .4]],
            [[.2, .1], [.4, .2], [.0, .4], [.5, .6], [.7, .6], [.7, .2]],
            [[.0, .7], [.5, .3], [.4, .0], [.4, .6], [.2, .2], [.2, .7]],
            [[.0, .5], [.3, .0], [.3, .8], [.3, .2], [.6, .3], [.5, .2]],
            [[.6, .2], [.2, .5], [.5, .4], [.1, .0], [.2, .6], [.1, .8]]
        ]
    ])

number_of_images, image_height, image_width, image_depth = conv.shape
output_height = (image_height - F) // S + 1
output_width = (image_width - F) // S + 1

pool = np.zeros((number_of_images, output_height, output_width, image_depth))
for k in range(number_of_images):
    for i in range(output_height):
        for j in range(output_width):
            pool[k, i, j, :] = np.max(conv[k, i*S:i*S+F, j*S:j*S+F, :], axis=(0, 1))  # max per channel

print(pool[0, :, :, 0])
[[0.5 0.9 0.8]
 [0.6 0.6 0.9]
 [0.9 0.8 0.8]]
print(pool[0, :, :, 1])
[[0.9 0.7 0.9]
 [0.8 0.8 0.8]
 [0.9 0.7 0.5]]
print(pool[1, :, :, 0])
[[0.6 0.6 0.2]
 [0.5 0.5 0.7]
 [0.6 0.5 0.6]]
print(pool[1, :, :, 1])
[[0.6 0.4 0.6]
 [0.7 0.6 0.7]
 [0.5 0.8 0.8]]

It's not clear to me why, in your own code, you take np.amax(..., 0), a column-wise maximum that returns a whole row of values, for what should be a single scalar element of the pool.

Question:

I'm using Theano 0.7 to create a convolutional neural net which uses max-pooling (i.e. shrinking a matrix down by keeping only the local maxima).

In order to "undo" or "reverse" the max-pooling step, one method is to store the locations of the maxima as auxiliary data, then simply recreate the un-pooled data by making a big array of zeros and using those auxiliary locations to place the maxima in their appropriate locations.

Here's how I'm currently doing it:

import numpy as np
import theano
import theano.tensor as T

minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5

# HERE is the function that I hope could be improved
def upsamplecode(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
    for whichitem in range(minibatchsize):
        for whichfilt in range(numfilters):
            upsampled = T.set_subtensor(upsampled[whichitem, whichfilt, auxpos[whichitem, whichfilt, :]], encoded[whichitem, whichfilt, :])
    return upsampled


totalitems = minibatchsize * numfilters * numsamples

code = theano.shared(np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)))

auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor  # arbitrary positions within a bin
auxpos += (np.arange(4) * 5).reshape((1,1,-1)) # shifted to the actual temporal bin location
auxpos = theano.shared(auxpos.astype(np.int))

print "code:"
print code.get_value()
print "locations:"
print auxpos.get_value()
get_upsampled = theano.function([], upsamplecode(code, auxpos))
print "the un-pooled data:"
print get_upsampled()

(By the way, in this case I have a 3D tensor, and it's only the third axis that gets max-pooled. People who work with image data might expect to see two dimensions getting max-pooled.)

The output is:

code:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
locations:
[[[ 0  6 12 18]
  [ 4  5 11 17]
  [ 3  9 10 16]]

 [[ 2  8 14 15]
  [ 1  7 13 19]
  [ 0  6 12 18]]]
the un-pooled data:
[[[  0.   0.   0.   0.   0.   0.   1.   0.   0.   0.   0.   0.   2.   0.
     0.   0.   0.   0.   3.   0.]
  [  0.   0.   0.   0.   4.   5.   0.   0.   0.   0.   0.   6.   0.   0.
     0.   0.   0.   7.   0.   0.]
  [  0.   0.   0.   8.   0.   0.   0.   0.   0.   9.  10.   0.   0.   0.
     0.   0.  11.   0.   0.   0.]]

 [[  0.   0.  12.   0.   0.   0.   0.   0.  13.   0.   0.   0.   0.   0.
    14.  15.   0.   0.   0.   0.]
  [  0.  16.   0.   0.   0.   0.   0.  17.   0.   0.   0.   0.   0.  18.
     0.   0.   0.   0.   0.  19.]
  [ 20.   0.   0.   0.   0.   0.  21.   0.   0.   0.   0.   0.  22.   0.
     0.   0.   0.   0.  23.   0.]]]

This method works but it's a bottleneck, taking most of my computer's time (I think the set_subtensor calls might imply cpu<->gpu data copying). So: can this be implemented more efficiently?

I suspect there's a way to express this as a single set_subtensor() call which may be faster, but I don't see how to get the tensor indexing to broadcast properly.


UPDATE: I thought of a way of doing it in one call, by working on the flattened tensors:

def upsamplecode2(encoded, auxpos):
    shp = encoded.shape
    upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))

    add_to_flattened_indices = theano.shared(np.array([ [[(y + z * numfilters) * numsamples * upsampfactor for x in range(numsamples)] for y in range(numfilters)] for z in range(minibatchsize)], dtype=theano.config.floatX).flatten(), name="add_to_flattened_indices")

    upsampled = T.set_subtensor(upsampled.flatten()[T.cast(auxpos.flatten() + add_to_flattened_indices, 'int32')], encoded.flatten()).reshape(upsampled.shape)

    return upsampled


get_upsampled2 = theano.function([], upsamplecode2(code, auxpos))
print "the un-pooled data v2:"
ups2 = get_upsampled2()
print ups2

However, this is still not good efficiency-wise, because when I run it (appended to the end of the above script) I find that the CUDA libraries can't currently do the integer index manipulation efficiently:

ERROR (theano.gof.opt): Optimization failure due to: local_gpu_advanced_incsubtensor1
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 1493, in process_node
    replacements = lopt.transform(node)
  File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/opt.py", line 952, in local_gpu_advanced_incsubtensor1
    gpu_y = gpu_from_host(y)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/basic_ops.py", line 133, in make_node
    dtype=x.dtype)()])
  File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/type.py", line 69, in __init__
    (self.__class__.__name__, dtype, name))
TypeError: CudaNdarrayType only supports dtype float32 for now. Tried using dtype int64 for variable None

Answer:

I don't know whether this is faster, but it may be a little more concise. See if it is useful for your case.

import numpy as np
import theano
import theano.tensor as T

minibatchsize = 2
numfilters = 3
numsamples = 4
upsampfactor = 5

totalitems = minibatchsize * numfilters * numsamples

code = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples))

auxpos = np.arange(totalitems).reshape((minibatchsize, numfilters, numsamples)) % upsampfactor 
auxpos += (np.arange(4) * 5).reshape((1,1,-1))

# first in numpy
shp = code.shape
upsampled_np = np.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled_np[np.arange(shp[0]).reshape(-1, 1, 1), np.arange(shp[1]).reshape(1, -1, 1), auxpos] = code

print "numpy output:"
print upsampled_np

# now the same idea in theano
encoded = T.tensor3()
positions = T.tensor3(dtype='int64')
shp = encoded.shape
upsampled = T.zeros((shp[0], shp[1], shp[2] * upsampfactor))
upsampled = T.set_subtensor(upsampled[T.arange(shp[0]).reshape((-1, 1, 1)), T.arange(shp[1]).reshape((1, -1, 1)), positions], encoded)

print "theano output:"
print upsampled.eval({encoded: code, positions: auxpos})

Question:

Very similar to this question but for average pooling.

The accepted answer says that "same" pooling uses -inf as the padding value for max pooling. But what is used for average pooling? Do they just use 0?


Answer:

OK, I just tested it out myself.

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import AveragePooling2D

np.set_printoptions(threshold=np.nan)
x = np.array([[[3.0, 3.0, 3.0], [3.0, 3.0, 3.0]]])
x = x.reshape(1, 2, 3, 1)  # (batch, height, width, channels)
sess = tf.Session()
K.set_session(sess)
b = K.constant(x)
b = AveragePooling2D(pool_size=(2, 2), padding="same")(b)
b = tf.Print(b, [b])
sess.run(b)

This returns the tensor [[[[3][3]]]]. The second pooling window overlaps the padding, yet its average is still 3 rather than (3 + 3 + 0 + 0)/4 = 1.5, so the padded cells cannot be contributing to the average: TensorFlow divides only by the number of valid, un-padded cells in each window.
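
A quick numpy sketch contrasting the two possible interpretations for that second window (two valid cells of value 3.0 plus two padded cells, assuming zero padding):

import numpy as np

window_valid = np.full(2, 3.0)                    # the two real cells under the window
include_pad = (window_valid.sum() + 0.0 * 2) / 4  # if padded zeros counted in the divisor: 1.5
exclude_pad = window_valid.mean()                 # if padded cells ignored: 3.0, which is what TF returns
print(include_pad, exclude_pad)                   # 1.5 3.0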