## Hot questions on using neural networks in Keras 2

Question:

I am training a Keras sequential model, and I want the learning rate to be reduced when training stops progressing.

I use the `ReduceLROnPlateau` callback.

After the first 2 epochs without progress, the learning rate is reduced as expected. But then it is reduced every 2 epochs, which causes training to stop progressing.

Is this a Keras bug, or am I using the callback the wrong way?

The code:

```python
earlystopper = EarlyStopping(patience=8, verbose=1)
checkpointer = ModelCheckpoint(filepath='model_zero7.{epoch:02d}-{val_loss:.6f}.hdf5',
                               verbose=1, save_best_only=True, save_weights_only=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2,
                              min_lr=0.000001, verbose=1)
history_zero7 = model_zero.fit_generator(bach_gen_only1,
                                         validation_data=(v_im, v_lb),
                                         steps_per_epoch=25, epochs=100,
                                         callbacks=[earlystopper, checkpointer, reduce_lr])
```

The output:

```
Epoch 00006: val_loss did not improve from 0.68605
Epoch 7/100
25/25 [==============================] - 213s 9s/step - loss: 0.6873 - binary_crossentropy: 0.0797 - dice_coef_loss: -0.8224 - jaccard_distance_loss_flat: 0.2998 - val_loss: 0.6865 - val_binary_crossentropy: 0.0668 - val_dice_coef_loss: -0.8513 - val_jaccard_distance_loss_flat: 0.2578
Epoch 00007: val_loss did not improve from 0.68605
Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.000200000009499.
Epoch 8/100
25/25 [==============================] - 214s 9s/step - loss: 0.6865 - binary_crossentropy: 0.0648 - dice_coef_loss: -0.8547 - jaccard_distance_loss_flat: 0.2528 - val_loss: 0.6860 - val_binary_crossentropy: 0.0694 - val_dice_coef_loss: -0.8575 - val_jaccard_distance_loss_flat: 0.2485
Epoch 00008: val_loss improved from 0.68605 to 0.68598, saving model to model_zero7.08-0.685983.hdf5
Epoch 9/100
25/25 [==============================] - 208s 8s/step - loss: 0.6868 - binary_crossentropy: 0.0624 - dice_coef_loss: -0.8554 - jaccard_distance_loss_flat: 0.2518 - val_loss: 0.6860 - val_binary_crossentropy: 0.0746 - val_dice_coef_loss: -0.8527 - val_jaccard_distance_loss_flat: 0.2557
Epoch 00009: val_loss improved from 0.68598 to 0.68598, saving model to model_zero7.09-0.685982.hdf5
Epoch 00009: ReduceLROnPlateau reducing learning rate to 4.00000018999e-05.
Epoch 10/100
25/25 [==============================] - 211s 8s/step - loss: 0.6865 - binary_crossentropy: 0.0640 - dice_coef_loss: -0.8570 - jaccard_distance_loss_flat: 0.2493 - val_loss: 0.6859 - val_binary_crossentropy: 0.0630 - val_dice_coef_loss: -0.8688 - val_jaccard_distance_loss_flat: 0.2311
Epoch 00010: val_loss improved from 0.68598 to 0.68589, saving model to model_zero7.10-0.685890.hdf5
Epoch 11/100
25/25 [==============================] - 211s 8s/step - loss: 0.6869 - binary_crossentropy: 0.0610 - dice_coef_loss: -0.8580 - jaccard_distance_loss_flat: 0.2480 - val_loss: 0.6859 - val_binary_crossentropy: 0.0681 - val_dice_coef_loss: -0.8616 - val_jaccard_distance_loss_flat: 0.2422
Epoch 00011: val_loss improved from 0.68589 to 0.68589, saving model to model_zero7.11-0.685885.hdf5
Epoch 12/100
25/25 [==============================] - 210s 8s/step - loss: 0.6866 - binary_crossentropy: 0.0575 - dice_coef_loss: -0.8612 - jaccard_distance_loss_flat: 0.2426 - val_loss: 0.6858 - val_binary_crossentropy: 0.0636 - val_dice_coef_loss: -0.8679 - val_jaccard_distance_loss_flat: 0.2325
Epoch 00012: val_loss improved from 0.68589 to 0.68585, saving model to model_zero7.12-0.685847.hdf5
Epoch 00012: ReduceLROnPlateau reducing learning rate to 8.0000005255e-06.
```

The first 6 epochs:

```
Epoch 1/100
25/25 [==============================] - 254s 10s/step - loss: 0.6886 - binary_crossentropy: 0.1356 - dice_coef_loss: -0.7302 - jaccard_distance_loss_flat: 0.4151 - val_loss: 0.6867 - val_binary_crossentropy: 0.1013 - val_dice_coef_loss: -0.8161 - val_jaccard_distance_loss_flat: 0.3096
Epoch 00001: val_loss improved from inf to 0.68673, saving model to model_zero7.01-0.686732.hdf5
Epoch 2/100
25/25 [==============================] - 211s 8s/step - loss: 0.6871 - binary_crossentropy: 0.0805 - dice_coef_loss: -0.8274 - jaccard_distance_loss_flat: 0.2932 - val_loss: 0.6865 - val_binary_crossentropy: 0.1005 - val_dice_coef_loss: -0.8100 - val_jaccard_distance_loss_flat: 0.3183
Epoch 00002: val_loss improved from 0.68673 to 0.68653, saving model to model_zero7.02-0.686533.hdf5
Epoch 3/100
25/25 [==============================] - 214s 9s/step - loss: 0.6871 - binary_crossentropy: 0.0778 - dice_coef_loss: -0.8268 - jaccard_distance_loss_flat: 0.2934 - val_loss: 0.6863 - val_binary_crossentropy: 0.0811 - val_dice_coef_loss: -0.8402 - val_jaccard_distance_loss_flat: 0.2743
Epoch 00003: val_loss improved from 0.68653 to 0.68635, saving model to model_zero7.03-0.686345.hdf5
Epoch 4/100
25/25 [==============================] - 210s 8s/step - loss: 0.6869 - binary_crossentropy: 0.0692 - dice_coef_loss: -0.8397 - jaccard_distance_loss_flat: 0.2749 - val_loss: 0.6862 - val_binary_crossentropy: 0.0820 - val_dice_coef_loss: -0.8445 - val_jaccard_distance_loss_flat: 0.2682
Epoch 00004: val_loss improved from 0.68635 to 0.68621, saving model to model_zero7.04-0.686206.hdf5
Epoch 5/100
25/25 [==============================] - 208s 8s/step - loss: 0.6868 - binary_crossentropy: 0.0693 - dice_coef_loss: -0.8446 - jaccard_distance_loss_flat: 0.2676 - val_loss: 0.6861 - val_binary_crossentropy: 0.0761 - val_dice_coef_loss: -0.8495 - val_jaccard_distance_loss_flat: 0.2606
Epoch 00005: val_loss improved from 0.68621 to 0.68605, saving model to model_zero7.05-0.686055.hdf5
Epoch 6/100
25/25 [==============================] - 203s 8s/step - loss: 0.6874 - binary_crossentropy: 0.0792 - dice_coef_loss: -0.8200 - jaccard_distance_loss_flat: 0.3024 - val_loss: 0.6865 - val_binary_crossentropy: 0.0559 - val_dice_coef_loss: -0.8716 - val_jaccard_distance_loss_flat: 0.2269
Epoch 00006: val_loss did not improve from 0.68605
```

Answer:

Well, it is a bug in Keras: https://github.com/keras-team/keras/issues/3991

To work around it, use `cooldown=1`.
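To see why `cooldown` helps, here is a simplified pure-Python sketch of the patience/cooldown bookkeeping. This is not Keras's actual implementation (the function name and the exact counter logic are made up for illustration); it only shows that a cooldown period spaces out consecutive reductions on a plateau:

```python
def simulate_reduce_lr(val_losses, patience=2, cooldown=0, factor=0.2, lr=1e-3):
    """Illustrative sketch of ReduceLROnPlateau-style counters (not Keras code)."""
    best = float('inf')
    wait = 0              # epochs since the last improvement
    cooldown_counter = 0  # epochs left before plateau monitoring resumes
    lr_per_epoch = []
    for loss in val_losses:
        if cooldown_counter > 0:
            cooldown_counter -= 1
            wait = 0      # during cooldown the plateau counter is frozen
        if loss < best:
            best = loss
            wait = 0
        elif cooldown_counter == 0:
            wait += 1
            if wait > patience:
                lr *= factor                  # reduce the learning rate
                cooldown_counter = cooldown   # start the cooldown period
                wait = 0
        lr_per_epoch.append(lr)
    return lr_per_epoch

# On a flat plateau, reductions fire more often without a cooldown.
flat = [1.0] * 10
no_cd = simulate_reduce_lr(flat, cooldown=0)
with_cd = simulate_reduce_lr(flat, cooldown=2)
```

With no cooldown, the learning rate here ends up lower (more reductions fired) than with `cooldown=2` over the same 10 plateau epochs.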

Question:

Here is part of the `get_updates` code from `SGD` in `keras` (source):

```python
moments = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + moments
for p, g, m in zip(params, grads, moments):
    v = self.momentum * m - lr * g  # velocity
    self.updates.append(K.update(m, v))
```

##### Observation:

Since the `moments` variable is a list of zero tensors, each `m` in the `for` loop is a zero tensor with the shape of `p`. Then `self.momentum * m`, in the first line of the loop, is just a scalar multiplied by a zero tensor, which results in a zero tensor.

##### Question

What am I missing here? Thanks!

Answer:

Yes - during the first iteration of this loop, `m` is equal to 0. But then it's updated with the current `v` value in this line:

```python
self.updates.append(K.update(m, v))
```

So in the next iteration you'll have:

```python
v = self.momentum * old_velocity - lr * g  # velocity
```

where `old_velocity` is the previous value of `v`.
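To make the accumulation concrete, here is a small NumPy sketch of the same update rule with a constant gradient (toy numbers for illustration; this is not the Keras code itself):

```python
import numpy as np

momentum, lr = 0.9, 0.1
g = np.array([1.0])   # pretend the gradient is constant across steps
m = np.zeros_like(g)  # moments start as zeros, like K.zeros(shape)

velocities = []
for step in range(3):
    v = momentum * m - lr * g  # m is zero only on the very first step
    m = v                      # plays the role of K.update(m, v)
    velocities.append(float(v[0]))
# velocities ≈ [-0.1, -0.19, -0.271]: after step 1 the momentum term is non-zero
```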

Question:

Every derived class of the `Layer` class in Keras has a `build()` definition. `build()` is the place where we assign weights to the Keras layer.

When is this function invoked internally? I am unable to find any piece of code which may be calling it.

In `__call__()` of the `Layer` class at topology.py:580, we call `self.build()`, but it will be invoked only when `self.built = True`. That is always set in `self.build()`, which in turn will be invoked only when `self.built` is `True`.

Answer:

You've missed the `not` in the condition (source code):

```python
if not self.built:
    ...
    if len(input_shapes) == 1:
        self.build(input_shapes[0])
    else:
        self.build(input_shapes)
```

... which basically means "build if not already built".

By the way, `build()` is also called in the `count_params()` method, again behind the same guard (source code).
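The lazy-build mechanism can be mimicked with a tiny stand-in class. This `MockLayer` is purely illustrative (it is not the real keras `Layer`), but it shows how the `self.built` flag defers `build()` to the first call and prevents it from running twice:

```python
class MockLayer:
    """Minimal stand-in mimicking keras's lazy build() guard (illustrative only)."""

    def __init__(self):
        self.built = False
        self.build_calls = 0

    def build(self, input_shape):
        # In keras, this is where the layer's weights would be created.
        self.build_calls += 1
        self.built = True  # mark the layer as built

    def __call__(self, input_shape):
        if not self.built:           # build only if not already built
            self.build(input_shape)
        return input_shape

layer = MockLayer()
layer((64, 80, 1))  # first call: triggers build()
layer((64, 80, 1))  # second call: build() is skipped
```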

Question:

I'm training autoencoders on 2D images using convolutional layers and would like to put fully connected layers on top of encoder part for classification. My autoencoder is defined as follows (just a simple one for illustration):

```python
def encoder(input_img):
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1)
    conv2 = BatchNormalization()(conv2)
    return conv2

def decoder(conv2):
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv2)
    conv3 = BatchNormalization()(conv3)
    up1 = UpSampling2D((2, 2))(conv3)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up1)
    return decoded

autoencoder = Model(input_img, decoder(encoder(input_img)))
```

My input images are of size (64, 80, 1). Now, when stacking fully connected layers on top of the encoder, I'm doing the following:

```python
def fc(enco):
    flat = Flatten()(enco)
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out

encode = encoder(input_img)
full_model = Model(input_img, fc(encode))

for l1, l2 in zip(full_model.layers[:19], autoencoder.layers[0:19]):
    l1.set_weights(l2.get_weights())
```

For a single autoencoder this works, but the problem now is that I have 2 autoencoders, trained on sets of images all of size (64, 80, 1).

For every label I have as input two images of size (64, 80, 1) and one label (0 or 1). I need to feed image 1 into the first autoencoder and image 2 into the second autoencoder. But how can I combine both autoencoders in the `full_model` in the above code?

Another problem is the input to the `fit()` method. Until now, with only one autoencoder, the input consisted of just a numpy array of images (e.g. of shape (1000, 64, 80, 1)), but with two autoencoders I would have two sets of images as input. How can I feed this into the `fit()` method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?

Answer:

**Q: How can I combine both autoencoders in full_model?**

A: You could concatenate the flattened bottleneck layers `enco_1` and `enco_2` of both autoencoders within `fc`:

```python
def fc(enco_1, enco_2):
    flat_1 = Flatten()(enco_1)
    flat_2 = Flatten()(enco_2)
    flat = Concatenate()([flat_1, flat_2])  # concatenate the flattened bottlenecks
    den = Dense(128, activation='relu')(flat)
    out = Dense(num_classes, activation='softmax')(den)
    return out

encode_1 = encoder_1(input_img_1)
encode_2 = encoder_2(input_img_2)
full_model = Model([input_img_1, input_img_2], fc(encode_1, encode_2))
```

Note that the last part, where you manually set the weights of the encoder, is unnecessary - see https://keras.io/getting-started/functional-api-guide/#shared-layers

**Q: How can I feed this into the fit method so that the first autoencoder consumes the first set of images and the second autoencoder the second set?**

A: In the code above, note that the two encoders are fed with different inputs (one for each image set). Provided that the model is defined in this way, you can call `full_model.fit` with a list of arrays, one per model input:

```python
full_model.fit(x=[images_set_1, images_set_2], y=label, ...)
```
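As a sanity check on the shapes involved, here is a sketch with dummy NumPy arrays. The names mirror the snippet above, and 1000 samples with 2 classes are assumptions for illustration:

```python
import numpy as np

num_samples, num_classes = 1000, 2  # assumed values for illustration
images_set_1 = np.zeros((num_samples, 64, 80, 1))
images_set_2 = np.zeros((num_samples, 64, 80, 1))
label = np.zeros((num_samples, num_classes))  # one-hot labels

# The list order must match the model's input list:
# Model([input_img_1, input_img_2], ...)
x = [images_set_1, images_set_2]
```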

**NOTE:** Not tested.