Hot questions on using neural networks in transfer learning


I am looking for a way to re-initialize a layer's weights in an existing pre-trained Keras model.

I am using Python with Keras and need to do transfer learning. I use the following code to load a pre-trained Keras model:

from keras.applications import vgg16, inception_v3, resnet50, mobilenet
vgg_model = vgg16.VGG16(weights='imagenet')

I read that when using a dataset that is very different from the original one, it might be beneficial to create new layers on top of the lower-level features already present in the trained net.

I found out how to enable fine-tuning of parameters, and now I am looking for a way to reset a selected layer so it can be retrained. I know I can create a new model, use layer n-1 as its input and add layer n on top, but I am looking for a way to reset the parameters of an existing layer in an existing model.


For whatever reason you may want to re-initialize the weights of a single layer k, here is a general way to do it:

from keras.applications import vgg16
from keras import backend as K

vgg_model = vgg16.VGG16(weights='imagenet')
sess = K.get_session()

initial_weights = vgg_model.get_weights()

from keras.initializers import glorot_uniform  # Or your initializer of choice

k = 30  # index of the weight array to reset; note that get_weights() returns a
        # flat list of kernel/bias arrays, not one entry per layer
new_weights = [
    glorot_uniform()(initial_weights[i].shape).eval(session=sess) if i == k
    else initial_weights[i]
    for i in range(len(initial_weights))
]


You can easily verify that initial_weights[k] == new_weights[k] returns an array of False values, while initial_weights[i] == new_weights[i] for any other i returns an array of True values.
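For reference, the Glorot-uniform draw used above samples from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). A rough numpy sketch of that draw for a single 2-D weight matrix (an illustration only; Keras computes the fans differently for convolutional kernels):

```python
import numpy as np

def glorot_uniform_like(w, seed=0):
    """Return a fresh Glorot-uniform array with the same shape as w.

    Assumes a 2-D dense-layer weight, so fan_in is the number of rows
    and fan_out the number of columns.
    """
    fan_in, fan_out = w.shape[0], w.shape[-1]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=w.shape)

old = np.ones((128, 64))
new = glorot_uniform_like(old)  # same shape, values inside (-limit, limit)
```

Also note that building the new_weights list alone does not change the model; push it back with vgg_model.set_weights(new_weights).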


I'm new to the computer vision area and I hope you can help me with some fundamental questions regarding CNN architectures.

I know some of the most well-known ones are VGG, ResNet, DenseNet, Inception and Xception.

They usually expect input images of around 224x224x3, though I have also seen 32x32x3.

Regarding my specific problem, my goal is to train on biomedical images of size 80x80 for a 4-class classification, so at the end I'll have a dense layer of 4. My dataset is also quite small (1,000 images), so I wanted to use transfer learning.

Could you please help me with the following questions? It seems to me that there is no single correct answer to them, but I need to understand the correct way of thinking about them. I would appreciate any pointers as well.

  1. Should I scale my images up to 224x224? Or the opposite: shrink them down to 32x32 inputs?
  2. Should I change the input of the CNNs to 80x80? What parameters should I change mainly? Any specific ratio for the kernel and the parameters?
  3. Also I have another problem, the input requires 3 channels (RGB) but I'm working with grayscale images. Will it change the results a lot?
  4. Instead of scaling should I just fill the surroundings (between the 80x80 and 224x224) as background? Should the images be centered in this case?
  5. Do you have any recommendations regarding what architecture to choose?
  6. I've seen some adaptations of these architectures to 3D/volume inputs instead of 2D images. I have a problem similar to the one described here, but with 3D inputs. Is there any common reasoning when choosing a 3D CNN architecture instead of a 2D one?

Thanks in advance!


I am assuming you have basic know-how in using CNNs for classification.

Answering questions 1-3:

You scale your images for several reasons. The smaller the image, the faster the training and inference. However, you lose important information in the process of shrinking an image. There is no one right answer; it all depends on your application. Is real-time processing important? If your answer is no, always stick to the original size.

You will also need to resize your images to fit the input size of predefined models if you plan to retrain them. However, since your images are grayscale, you will need to either find models trained on grayscale data or create a 3-channel image by copying the same value into the R, G and B channels. This is not efficient, but it will help you reuse the high-quality models trained by others.
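For the grayscale point, copying the single channel into R, G and B is a one-liner; a sketch assuming a channels-last (H, W) numpy array:

```python
import numpy as np

def gray_to_rgb(img):
    """Replicate a single-channel (H, W) image into (H, W, 3) so it
    matches the 3-channel input expected by ImageNet-pretrained models."""
    return np.repeat(img[..., np.newaxis], 3, axis=-1)

gray = (np.arange(80 * 80) % 256).astype(np.uint8).reshape(80, 80)
rgb = gray_to_rgb(gray)  # shape (80, 80, 3), all three channels identical
```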

The best way I see for you to handle this problem is to train everything from scratch. 1,000 images can seem like a small amount of data, but since your domain is specific and only requires 4 classes, training from scratch doesn't seem that bad.

Question 4

When the size is different, always scale. Filling in the surroundings would cause the model to learn the empty spaces, and that is not what we want. Also make sure the input size and format during inference are the same as during training.

Question 5

If processing time is not a problem, ResNet. If processing time is important, then MobileNet.

Question 6

It depends on your input data. If you have 3D data, then you can use it. More input information usually helps classification, but 2D can be enough to solve certain problems. If you can classify the images by looking at them in 2D, then most probably 2D images will be enough to complete the task.

I hope this clears up some of your problems and directs you toward a proper solution.


In the cs231n handout here, it says:

New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns... Hence, the best idea might be to train a linear classifier on the CNN codes.

I'm not sure what "linear classifier" means. Does the linear classifier refer to the last fully connected layer? (For example, in AlexNet there are three fully connected layers. Is the linear classifier the last of them?)


Usually when people say "linear classifier" they refer to a linear SVM (support vector machine). A linear classifier learns a weight vector w and a threshold (aka "bias") b such that for each example x the sign of

<w, x> + b

is positive for the "positive" class and negative for the "negative" class.

The last (usually fully connected) layer of a neural net can be considered a form of linear classifier.
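As a toy illustration of that decision rule (the numbers here are made up; in practice w and b come from training, e.g. a linear SVM fitted on the CNN codes):

```python
import numpy as np

def linear_classify(w, b, x):
    """Binary linear classifier: predict by the sign of <w, x> + b."""
    return 1 if np.dot(w, x) + b > 0 else -1

w = np.array([1.0, -2.0])  # learned weight vector (hypothetical)
b = 0.5                    # learned bias (hypothetical)

print(linear_classify(w, b, np.array([2.0, 0.0])))  # 2.0 + 0.5 > 0  -> 1
print(linear_classify(w, b, np.array([0.0, 2.0])))  # -4.0 + 0.5 < 0 -> -1
```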


I'm trying to use InceptionV4 for a classification problem. Before using it on the problem, I'm trying to experiment with it.

I replaced the last dense layer (of size 1001) with a new dense layer, compiled the model and tried to fit it:

from keras import backend as K
import inception_v4
import numpy as np
import cv2
import os

from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.layers import Activation, Dropout, Flatten, Dense, Input

from keras.models import Model
os.environ['CUDA_VISIBLE_DEVICES'] = ''


train_data_dir ='//shared_directory/projects/try_CDFxx/data/train/'
validation_data_dir ='//shared_directory/projects/try_CDFxx/data/validation/'

img_width, img_height = 299, 299
nbr_train_samples = 24
nbr_validation_samples = 12

def train_top_model(num_classes):

    v4 = inception_v4.create_model(weights='imagenet')
    # replacing the 1001-category dense layer with my own
    predictions = Dense(output_dim=num_classes, activation='softmax', name="newDense")(v4.layers[-2].output)
    main_input = v4.layers[1].input
    t_model = Model(input=[main_input], output=[predictions])

    train_datagen = ImageDataGenerator(rescale=1./255)
    val_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
            train_data_dir,
            target_size = (img_width, img_height),
            batch_size = my_batch_size,
            shuffle = True,
            class_mode = 'categorical')

    validation_generator = val_datagen.flow_from_directory(
            validation_data_dir,
            target_size=(img_width, img_height),
            shuffle = True,
            class_mode = 'categorical')

    t_model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    t_model.fit_generator(
            train_generator,
            samples_per_epoch = nbr_train_samples,
            nb_epoch = nb_epoch,
            validation_data = validation_generator,
            nb_val_samples = nbr_validation_samples)


But I am getting the following error:

Traceback (most recent call last):
  File "", line 76, in <module>
  File "", line 72, in train_top_model
    nb_val_samples = nbr_validation_samples)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 1508, in fit_generator
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 1261, in train_on_batch
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 985, in _standardize_user_data
    exception_prefix='model target')
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 113, in standardize_input_data
ValueError: Error when checking model target: expected newDense to have shape (None, 1) but got array with shape (24, 3)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/", line 801, in __bootstrap_inner
  File "/usr/lib/python2.7/", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/", line 409, in data_generator_task
    generator_output = next(generator)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/", line 693, in next
    x = self.image_data_generator.random_transform(x)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/", line 403, in random_transform
    fill_mode=self.fill_mode, cval=self.cval)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/", line 109, in apply_transform
    final_offset, order=0, mode=fill_mode, cval=cval) for x_channel in x]
AttributeError: 'NoneType' object has no attribute 'interpolation'

What am I doing wrong? And why is the newDense layer expected to have a (None, 1) shape after I defined it as having size 3?

Many thanks

PS: I am adding the end of the model summary:

merge_25 (Merge)                 (None, 8, 8, 1536)    0           activation_140[0][0]
averagepooling2d_15 (AveragePool (None, 1, 1, 1536)    0           merge_25[0][0]
dropout_1 (Dropout)              (None, 1, 1, 1536)    0           averagepooling2d_15[0][0]
flatten_1 (Flatten)              (None, 1536)          0           dropout_1[0][0]
newDense (Dense)                 (None, 3)             4611        flatten_1[0][0]
Total params: 41,210,595
Trainable params: 41,147,427
Non-trainable params: 63,168


OK, the problem lies in

validation_generator = val_datagen.flow_from_directory(...
        class_mode = 'categorical')

class_mode='categorical' makes your generator return a one-hot encoded vector, in your case a 3-dimensional one. But you set your loss to sparse_categorical_crossentropy, which expects an integer label. You should change either to class_mode='sparse' or to loss='categorical_crossentropy'.
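The shapes in the error message follow directly from this mismatch: one-hot targets have shape (batch, 3), while the sparse loss expects integer targets of shape (batch,) or (batch, 1), hence "expected newDense to have shape (None, 1) but got array with shape (24, 3)". A quick numpy illustration of the two label formats:

```python
import numpy as np

# Integer labels: what loss='sparse_categorical_crossentropy' expects
sparse_labels = np.array([0, 2, 1])          # shape (3,)

# One-hot labels: what class_mode='categorical' actually generates
one_hot_labels = np.eye(3)[sparse_labels]    # shape (3, 3)
```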


This tutorial has the TensorFlow implementation of a batch normalization layer for the training and testing phases.

When using transfer learning, is it OK to use batch normalization layers, especially when the data distributions are different?

In the inference phase, the BN layer just uses a fixed mean and variance (calculated with the help of the training distribution). So if our model sees data with a different distribution, can it give wrong results?


With transfer learning, you're transferring the learned parameters from one domain to another. Usually, this means keeping the learned values of the convolutional layers fixed whilst adding new fully connected layers that learn to classify the features extracted by the CNN.

When you add batch normalization to every layer, you're injecting values sampled from the input distribution into the layer, in order to force the output to be normally distributed. To do that, you compute the exponential moving average of the layer output, and in the testing phase you subtract this value from the layer output.

Although data dependent, these mean values (one per convolutional layer) are computed on the output of the layer, and thus on the learned transformation.

Thus, in my opinion, the various averages that the BN layer subtracts from its convolutional layer outputs are general enough to be transferred: they are computed on the transformed data and not on the original data. Moreover, convolutional layers learn to extract local patterns, so they're more robust and difficult to influence.
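To make the inference-time behaviour concrete, here is a minimal numpy sketch of what a BN layer computes at test time with its stored moving statistics (gamma, beta and eps follow the usual formulation; the values below are made up):

```python
import numpy as np

def bn_inference(x, moving_mean, moving_var, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch norm at test time: normalize with the moving statistics
    accumulated during training, not with the current batch's statistics."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

# Statistics collected on the training distribution (hypothetical):
moving_mean, moving_var = 2.0, 4.0

x = np.array([0.0, 2.0, 4.0])
y = bn_inference(x, moving_mean, moving_var)  # roughly [-1, 0, 1]
```

If the transfer domain produces activations with very different statistics, the same stored mean and variance no longer center and scale them properly, which is exactly the concern raised in the question.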

Thus, in short and in my opinion:

You can apply transfer learning to convolutional layers with batch norm applied. But for fully connected layers, the influence of the computed values (which see the whole input and not only local patches) can be too data dependent, so I would avoid it there.

However, as a rule of thumb: if you're unsure about something, just try it and see if it works!


I'm using the latest Keras with the TensorFlow backend.

I'm not quite sure of the correct way to put together the full model for inference, given that I trained a smaller version of my model on bottleneck values.

# Save  bottleneck values

from keras.applications.xception import Xception
base_model = Xception(weights='imagenet', include_top=False)
prediction =  base_model.predict(x)
# ... save bottleneck data ...

Now let's say my full model looks something like this:

base_model = Xception(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(classes, activation='softmax')(x)
model = Model(input=base_model.input, output=predictions)

but to speed up training, I wanted to bypass the earlier layers by loading bottleneck values; so I create a smaller model (including only the new layers). I then train and save the model.

bottleneck_input = Input(shape = bottleneck_shape)
x = GlobalAveragePooling2D() (bottleneck_input)
x = Dense(1024, activation='relu')(x)
predictions = Dense(classes, activation='softmax')(x)
model = Model(input= bottleneck_input, output=predictions)
save_full_model() #save model

After training this smaller model, I want to run inference on the full model, so I need to put together the base model and the smaller model. I'm not sure of the best way to do this.

base_model = Xception(weights='imagenet', include_top=False)
#x = base_model.output

loaded_model = load_model() # load bottleneck model

#now to combine both models (something like this?)
Model(inputs = base_model.inputs, outputs = loaded_model.outputs)

What is the proper way to put together the model for inference? I don't know if there is a way to keep my full model and just start from the bottleneck layers for training and from the input layer for inference. (Please note this is not the same as freezing layers, which only stops the weights from being updated but still computes every data point.)


Every model is a layer with extra properties such as a loss function. So you can use a model like a layer in the functional API. In your case it could look like this:

input = Input(...)
base_model = Xception(weights='imagenet', include_top=False)
# Apply model to input like layer
base_output = base_model(input)
loaded_model = load_model()
# Now the bottleneck model
out = loaded_model(base_output)
final_model = Model(input, out) # New computation graph


I'm training a VGG network using the transfer learning approach (fine-tuning). But while training on the dataset, I got the following error, which stops the training process.

ETA: 19:00:06
  4407296/553467096 [..............................] - ETA: 19:06:49
  4415488/553467096 [..............................] - ETA: 19:10:23Traceback (most recent call last):
  File "C:\CT_SCAN_IMAGE_SET\vgg\", line 161, in <module>
    model = vgg16_model(img_rows, img_cols, channel, num_classes)
  File "C:\CT_SCAN_IMAGE_SET\vgg\", line 120, in vgg16_model
    model = VGG16(weights='imagenet', include_top=True)
  File "C:\Research\Python_installation\lib\site-packages\keras\applications\", line 169, in VGG16
  File "C:\Research\Python_installation\lib\site-packages\keras\utils\", line 221, in get_file
    urlretrieve(origin, fpath, dl_progress)
  File "C:\Research\Python_installation\lib\urllib\", line 217, in urlretrieve
    block =
  File "C:\Research\Python_installation\lib\http\", line 448, in read
    n = self.readinto(b)
  File "C:\Research\Python_installation\lib\http\", line 488, in readinto
    n = self.fp.readinto(b)
  File "C:\Research\Python_installation\lib\", line 575, in readinto
    return self._sock.recv_into(b)
  File "C:\Research\Python_installation\lib\", line 929, in recv_into
    return, buffer)
  File "C:\Research\Python_installation\lib\", line 791, in read
    return, buffer)
  File "C:\Research\Python_installation\lib\", line 575, in read
    v =, buffer)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Can someone please help me identify the issue here?


The error is shown because the connection broke while it was trying to download the VGG16 weights. Try downloading the weights manually. (link:

Once the weights are downloaded, put them in the folder called 'models' inside the .keras folder. The .keras folder is hidden by default.

Path to the folder: C:\Users\UserName\.keras\models

After this setup, try running the code again.


I am trying to do some transfer learning using this GitHub DenseNet121 model. I'm running into issues resizing the classification layer from 14 to 2 outputs.

Relevant part of the github code is:

class DenseNet121(nn.Module):
    """Model modified.
    The architecture of our model is the same as standard DenseNet121
    except the classifier layer, which has an additional sigmoid function.
    """
    def __init__(self, out_size):
        super(DenseNet121, self).__init__()
        self.densenet121 = torchvision.models.densenet121(pretrained=True)
        num_ftrs = self.densenet121.classifier.in_features
        self.densenet121.classifier = nn.Sequential(
            nn.Linear(num_ftrs, out_size),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.densenet121(x)
        return x

I load and init with:

# initialize and load the model
model = DenseNet121(nnClassCount).cuda()
model = torch.nn.DataParallel(model).cuda()
modeldict = torch.load("model_ones_3epoch_densenet.tar")

It looks like DenseNet doesn't split layers up into children so model = nn.Sequential(*list(modelRes.children())[:-1]) won't work.

model.classifier = nn.Linear(1024, 2) seems to work on default DenseNets, but with the modified classifier here (with the additional sigmoid) it ends up just adding an extra classifier without replacing the original.

I've tried

model.classifier = nn.Sequential(
    nn.Linear(1024, dset_classes_number),
    nn.Sigmoid()
)

But I'm having the same issue of the classifier being added instead of replaced:

      (classifier): Sequential(
        (0): Linear(in_features=1024, out_features=14, bias=True)
        (1): Sigmoid()
      )
  (classifier): Sequential(
    (0): Linear(in_features=1024, out_features=2, bias=True)
    (1): Sigmoid()
  )


If you want to replace the classifier inside densenet121, which is a member of your model, you need to assign

model.densenet121.classifier = nn.Sequential(...)