Hot questions on using neural networks in transfer learning
I am looking for a way to re-initialize a layer's weights in an existing pre-trained Keras model.
I am using Python with Keras and need to use transfer learning. I use the following code to load the pre-trained Keras models:
```python
from keras.applications import vgg16, inception_v3, resnet50, mobilenet

vgg_model = vgg16.VGG16(weights='imagenet')
```
I read that when using a dataset that is very different from the original dataset, it might be beneficial to create new layers on top of the lower-level features that we have in the trained net.
I found how to allow fine-tuning of parameters, and now I am looking for a way to reset a selected layer so it can be re-trained. I know I can create a new model, use layer n-1 as input and add layer n on top of it, but I am looking for a way to reset the parameters of an existing layer in an existing model.
For whatever reason you may want to re-initialize the weights of a single layer k, here is a general way to do it:
```python
from keras.applications import vgg16
from keras import backend as K
from keras.initializers import glorot_uniform  # or your initializer of choice

vgg_model = vgg16.VGG16(weights='imagenet')
sess = K.get_session()
initial_weights = vgg_model.get_weights()

k = 30  # say, for weight tensor 30
new_weights = [glorot_uniform()(initial_weights[i].shape).eval(session=sess)
               if i == k else initial_weights[i]
               for i in range(len(initial_weights))]
vgg_model.set_weights(new_weights)
```
You can easily verify that initial_weights[k]==new_weights[k] returns an array of False, while initial_weights[i]==new_weights[i] for any other i returns an array of True.
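As a sanity check, here is what those elementwise comparisons look like in a tiny self-contained sketch (plain NumPy; the toy arrays below stand in for the real weight tensors, and k is made up):

```python
import numpy as np

# Toy stand-ins for the two weight lists: only entry k was re-initialized.
k = 1
initial_weights = [np.zeros((2, 2)), np.zeros((2, 2))]
new_weights = [initial_weights[0], np.ones((2, 2))]

print(np.all(initial_weights[0] == new_weights[0]))  # True: untouched entry is identical
print(np.any(initial_weights[k] == new_weights[k]))  # False: re-initialized entry differs everywhere
```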
I'm new to the computer vision area, and I hope you can help me with some fundamental questions regarding CNN architectures.
I know some of the most well-known ones are:
- VGG Net
- ResNet
- Dense Net
- Inception Net
- Xception Net
They usually need an input of images around 224x224x3, and I have also seen 32x32x3.
Regarding my specific problem, my goal is to train on biomedical images of size 80x80 for a 4-class classification, so at the end I'll have a dense layer of 4. Also, my dataset is quite small (1000 images), and I wanted to use transfer learning.
Could you please help me with the following questions? It seems to me that there is no single correct answer to them, but I need to understand what the correct way of thinking about them should be. I would appreciate it if you can give me some pointers as well.
- Should I scale my images up? How about the opposite, shrinking them to 32x32 inputs?
- Should I change the input of the CNNs to 80x80? Which parameters should I mainly change? Any specific ratio for the kernels and the parameters?
- Also I have another problem, the input requires 3 channels (RGB) but I'm working with grayscale images. Will it change the results a lot?
- Instead of scaling, should I just fill the surroundings (between the 80x80 and the 224x224) with background? Should the images be centered in this case?
- Do you have any recommendations regarding what architecture to choose?
- I've seen some adaptations of these architectures to 3D/volumes inputs instead of 2D/images. I have a similar problem to the one I described here but with 3D inputs. Is there any common reasoning when choosing a 3D CNN architecture instead of a 2D?
Thanks in advance!
I am assuming you have basic know-how in using CNNs for classification.
Answering questions 1 to 3:
You scale your images for several purposes. The smaller the image, the faster the training and inference time. However, you will lose important information in the process of shrinking the image. There is no one right answer, and it all depends on your application. Is real-time processing important? If your answer is no, always stick to the original size.
You will also need to resize your images to fit the input size of predefined models if you plan to retrain them. However, since your images are grayscale, you will need to either find models trained on grayscale data or create a 3-channel image and copy the same value into the R, G and B channels. This is not efficient, but it will help you reuse high-quality models trained by others.
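Copying a grayscale image into all three channels can be done in one line; here is a minimal NumPy sketch (the 80x80 size is taken from the question, and the random array is stand-in data for a real biomedical image):

```python
import numpy as np

gray = np.random.rand(80, 80)  # stand-in for a single-channel 80x80 image
# Duplicate the single channel into R, G and B.
rgb = np.repeat(gray[:, :, np.newaxis], 3, axis=2)

print(rgb.shape)                              # (80, 80, 3)
print(np.all(rgb[:, :, 0] == rgb[:, :, 2]))   # True: all channels are identical
```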
The best way I see for you to handle this problem is to train everything from scratch. 1000 images can seem like a small amount of data, but since your domain is specific and only requires 4 classes, training from scratch doesn't seem that bad.
4) When the size is different, always scale. Filling in the surroundings will cause the model to learn the empty spaces, and that is not what we want. Also make sure the input size and format during inference are the same as the input size and format during training.
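For completeness, this is what the "fill the surroundings" option from the question would look like; a NumPy sketch with a random stand-in image (the answer above advises scaling instead, since the model would otherwise learn the empty border):

```python
import numpy as np

img = np.random.rand(80, 80)   # stand-in for an 80x80 grayscale image
pad = (224 - 80) // 2          # 72 pixels of border on each side, so the image is centered
padded = np.pad(img, pad, mode='constant', constant_values=0)  # zero "background"

print(padded.shape)    # (224, 224)
print(padded[0, 0])    # 0.0: corner is empty background
```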
5) If processing time is not a problem, ResNet. If processing time is important, then MobileNet.
6) It depends on your input data. If you have 3D data then you can use it. More input data usually helps in better classification, but 2D will be enough to solve certain problems. If you can classify the images by looking at the 2D images, most probably 2D images will be enough to complete the task.
I hope this clears up some of your problems and directs you towards a proper solution.
In the cs231n handout here, it says:
New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns... Hence, the best idea might be to train a linear classifier on the CNN codes.
I'm not sure what "linear classifier" means. Does the linear classifier refer to the last fully connected layer? (For example, in AlexNet there are three fully connected layers. Is the linear classifier the last fully connected layer?)
Usually when people say "linear classifier" they refer to a linear SVM (support vector machine). A linear classifier learns a weight vector w and a threshold (aka "bias") b such that for each example x the sign of <w, x> + b is positive for the "positive" class and negative for the "negative" class.
The last (usually fully connected) layer of a neural-net can be considered as a form of a linear classifier.
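A minimal NumPy sketch of that decision rule (the weights, bias and examples below are made up for illustration; a real linear SVM would learn w and b from the CNN codes):

```python
import numpy as np

w = np.array([1.0, -2.0])  # learned weight vector (made up)
b = 0.5                    # learned bias / threshold (made up)

def classify(x):
    # The sign of <w, x> + b decides the class.
    return 1 if np.dot(w, x) + b > 0 else -1

print(classify(np.array([3.0, 1.0])))  # 1: lands on the positive side of the hyperplane
print(classify(np.array([0.0, 2.0])))  # -1: lands on the negative side
```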
I'm trying to use InceptionV4 for a classification problem. Before using it on the problem, I'm trying to experiment with it.
I replaced the last dense layer (sized 1001) with a new dense layer, compiled the model and tried to fit it:
```python
from keras import backend as K
import inception_v4
import numpy as np
import cv2
import os
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.layers import Activation, Dropout, Flatten, Dense, Input
from keras.models import Model

os.environ['CUDA_VISIBLE_DEVICES'] = ''

my_batch_size = 32
train_data_dir = '//shared_directory/projects/try_CDFxx/data/train/'
validation_data_dir = '//shared_directory/projects/try_CDFxx/data/validation/'
img_width, img_height = 299, 299
num_classes = 3
nb_epoch = 50
nbr_train_samples = 24
nbr_validation_samples = 12

def train_top_model(num_classes):
    v4 = inception_v4.create_model(weights='imagenet')
    # replacing the 1001 categories dense layer with my own
    predictions = Dense(output_dim=num_classes, activation='softmax',
                        name="newDense")(v4.layers[-2].output)
    main_input = v4.layers.input
    main_output = predictions
    t_model = Model(input=[main_input], output=[main_output])

    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.1,
        zoom_range=0.1,
        rotation_range=10.,
        width_shift_range=0.1,
        height_shift_range=0.1,
        horizontal_flip=True)
    val_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        shuffle=True,
        class_mode='categorical')
    validation_generator = val_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        shuffle=True,
        class_mode='categorical')

    t_model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy',
                    metrics=['accuracy'])
    t_model.fit_generator(
        train_generator,
        samples_per_epoch=nbr_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,
        nb_val_samples=nbr_validation_samples)

train_top_model(num_classes)
```
But I am getting the following error:
```
Traceback (most recent call last):
  File "re_try.py", line 76, in <module>
    train_top_model(num_classes)
  File "re_try.py", line 72, in train_top_model
    nb_val_samples = nbr_validation_samples)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1508, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1261, in train_on_batch
    check_batch_dim=True)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 985, in _standardize_user_data
    exception_prefix='model target')
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 113, in standardize_input_data
    str(array.shape))
ValueError: Error when checking model target: expected newDense to have shape (None, 1) but got array with shape (24, 3)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 409, in data_generator_task
    generator_output = next(generator)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/image.py", line 693, in next
    x = self.image_data_generator.random_transform(x)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/image.py", line 403, in random_transform
    fill_mode=self.fill_mode, cval=self.cval)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/image.py", line 109, in apply_transform
    final_offset, order=0, mode=fill_mode, cval=cval) for x_channel in x]
AttributeError: 'NoneType' object has no attribute 'interpolation'
```
What am I doing wrong?
And why is the newDense layer expected to have a (None, 1) shape after I defined it as having size 3?
P.S. I am adding the end of the model summary:
```
merge_25 (Merge)                 (None, 8, 8, 1536)    0           activation_140
                                                                   merge_23
                                                                   merge_24
                                                                   activation_149
____________________________________________________________________________________________________
averagepooling2d_15 (AveragePool (None, 1, 1, 1536)    0           merge_25
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 1, 1, 1536)    0           averagepooling2d_15
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 1536)          0           dropout_1
____________________________________________________________________________________________________
newDense (Dense)                 (None, 3)             4611        flatten_1
====================================================================================================
Total params: 41,210,595
Trainable params: 41,147,427
Non-trainable params: 63,168
```
OK, the problem lies in:

```python
validation_generator = val_datagen.flow_from_directory(
    ...
    class_mode = 'categorical')
```

'categorical' makes your generator return a one-hot encoded vector, in your case a 3-dimensional one. But you set your loss to sparse_categorical_crossentropy, which accepts an int as a label. You should either change the loss to categorical_crossentropy or switch the generators to class_mode='sparse'.
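The shape mismatch in the traceback is exactly the difference between the two label encodings; a small NumPy illustration (the 3 classes match the question's num_classes, the labels are made up):

```python
import numpy as np

labels = np.array([0, 2, 1])   # integer labels: what the sparse loss / class_mode='sparse' expects
one_hot = np.eye(3)[labels]    # one-hot labels: what class_mode='categorical' yields

print(labels.shape)   # (3,)   -> treated by the sparse loss as one int per sample, i.e. (None, 1)
print(one_hot.shape)  # (3, 3) -> the same shape family as the (24, 3) array in the error
```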
Can we use batch normalization with transfer learning when the new data has a different distribution?
This tutorial has the TensorFlow implementation of the batch norm layer for the training and testing phases.
When using transfer learning, is it OK to use a batch normalization layer, especially when the data distributions are different?
Because in the inference phase, the BN layer just uses a fixed mean and variance (which are calculated from the training distribution). So if our model sees a different distribution of data, can it give wrong results?
With transfer learning, you're transferring the learned parameters from a domain to another. Usually, this means that you're keeping fixed the learned values of the convolutional layer whilst adding new fully connected layers that learn to classify the features extracted by the CNN.
When you add batch normalization to every layer, you're injecting values sampled from the input distribution into the layer, in order to force the output of the layer to be normally distributed. To do that, you compute the exponential moving average of the layer output, and then in the testing phase you subtract this value from the layer output.
Although data dependent, these mean values (for every convolutional layer) are computed on the output of the layer, thus on the learned transformation.
Thus, in my opinion, the various averages that the BN layer subtracts from its convolutional layer output are general enough to be transferred: they are computed on the transformed data and not on the original data. Moreover, the convolutional layers learn to extract local patterns, so they're more robust and difficult to influence.
Thus, in short and in my opinion:
you can apply transfer learning to convolutional layers with batch norm applied. But on fully connected layers, the influence of the computed values (which see the whole input and not only local patches) can be too data dependent, so I'd avoid it there.
However, as a rule of thumb: if you're unsure about something, just try it and see if it works!
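The questioner's worry can be made concrete with a tiny NumPy sketch (the distributions and shift below are made up for illustration): if inference-time BN normalizes with the mean and variance collected on the training distribution, data from a shifted distribution no longer comes out zero-mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training" activations: mean 0, std 1; BN would store these running statistics.
train = rng.normal(loc=0.0, scale=1.0, size=100000)
running_mean, running_var = train.mean(), train.var()

# "Inference" activations from a shifted distribution (mean 3 instead of 0).
test = rng.normal(loc=3.0, scale=1.0, size=100000)
normalized = (test - running_mean) / np.sqrt(running_var + 1e-5)

# Instead of ~0, the normalized output is centred near 3: the fixed
# statistics no longer match the data, which is exactly the concern above.
print(round(float(normalized.mean())))  # 3
```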
I'm using the latest Keras with the TensorFlow backend.
I'm not quite sure of the correct way to put together the full model for inference, given that I used a smaller version of my model for training on bottleneck values.
```python
# Save bottleneck values
from keras.applications.xception import Xception

base_model = Xception(weights='imagenet', include_top=False)
prediction = base_model.predict(x)
# ** SAVE bottleneck data **
```
Now let's say my full model looks something like this:
```python
base_model = Xception(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(classes, activation='softmax')(x)
model = Model(input=base_model.input, output=predictions)
```
but to speed up training, I wanted to bypass the earlier layers by loading bottleneck values, so I created a smaller model (including only the new layers). I then trained and saved the model.
```python
bottleneck_input = Input(shape=bottleneck_shape)
x = GlobalAveragePooling2D()(bottleneck_input)
x = Dense(1024, activation='relu')(x)
predictions = Dense(classes, activation='softmax')(x)
model = Model(input=bottleneck_input, output=predictions)
save_full_model()  # save model
```
After training this smaller model, I want to run inference on the full model, so I need to put together the base model and the smaller model. I'm not sure what the best way to do this is.
```python
base_model = Xception(weights='imagenet', include_top=False)
#x = base_model.output
loaded_model = load_model()  # load bottleneck model

# now to combine both models (something like this?)
Model(inputs=base_model.inputs, outputs=loaded_model.outputs)
```
What is the proper way to put together the model for inference? I don't know if there is a way to use my full model for training and just start from the bottleneck layers for training and the input layer for inference. (Please note this is not the same as freezing layers, which just freezes the weights (weights won't be updated) but still computes each data point.)
Every model is a layer with extra properties, such as a loss function etc. So you can use them like a layer in the functional API. In your case it could look like:
```python
input = Input(...)
base_model = Xception(weights='imagenet', include_top=False)

# Apply model to input like layer
base_output = base_model(input)
loaded_model = load_model()  # Now the bottleneck model
out = loaded_model(base_output)
final_model = Model(input, out)  # New computation graph
```
I'm training a VGG network using a transfer learning (fine-tuning) approach. But while training on the dataset, I ran into the following error, which stops the training process.
```
                                                        ETA: 19:00:06
4407296/553467096 [..............................] - ETA: 19:06:49
4415488/553467096 [..............................] - ETA: 19:10:23
Traceback (most recent call last):
  File "C:\CT_SCAN_IMAGE_SET\vgg\vggTransferLearning.py", line 161, in <module>
    model = vgg16_model(img_rows, img_cols, channel, num_classes)
  File "C:\CT_SCAN_IMAGE_SET\vgg\vggTransferLearning.py", line 120, in vgg16_model
    model = VGG16(weights='imagenet', include_top=True)
  File "C:\Research\Python_installation\lib\site-packages\keras\applications\vgg16.py", line 169, in VGG16
    file_hash='64373286793e3c8b2b4e3219cbf3544b')
  File "C:\Research\Python_installation\lib\site-packages\keras\utils\data_utils.py", line 221, in get_file
    urlretrieve(origin, fpath, dl_progress)
  File "C:\Research\Python_installation\lib\urllib\request.py", line 217, in urlretrieve
    block = fp.read(bs)
  File "C:\Research\Python_installation\lib\http\client.py", line 448, in read
    n = self.readinto(b)
  File "C:\Research\Python_installation\lib\http\client.py", line 488, in readinto
    n = self.fp.readinto(b)
  File "C:\Research\Python_installation\lib\socket.py", line 575, in readinto
    return self._sock.recv_into(b)
  File "C:\Research\Python_installation\lib\ssl.py", line 929, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Research\Python_installation\lib\ssl.py", line 791, in read
    return self._sslobj.read(len, buffer)
  File "C:\Research\Python_installation\lib\ssl.py", line 575, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
```
Can someone please help me identify the issue here?
The error is shown because the connection broke while the VGG16 weights were being downloaded. Try downloading the weights manually (link: https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5).
Once the weights are downloaded, put the file in the folder called 'models' inside the '.keras' folder, which is hidden by default.
After this setup, try running the code again.
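If you're unsure where that 'models' folder is, Keras caches downloaded weight files under a hidden .keras directory in your home folder; a quick standard-library way to print the expected location:

```python
import os

# Keras' get_file() caches downloaded weights under ~/.keras/models by default.
cache_dir = os.path.join(os.path.expanduser('~'), '.keras', 'models')
print(cache_dir)  # e.g. /home/<user>/.keras/models (or C:\Users\<user>\.keras\models on Windows)
```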
I am trying to do some transfer learning using this github DenseNet121 model (https://github.com/gaetandi/cheXpert.git). I'm running into issues resizing the classification layer from 14 to 2 outputs.
The relevant part of the GitHub code is:
```python
class DenseNet121(nn.Module):
    """Model modified.

    The architecture of our model is the same as standard DenseNet121
    except the classifier layer which has an additional sigmoid function.
    """
    def __init__(self, out_size):
        super(DenseNet121, self).__init__()
        self.densenet121 = torchvision.models.densenet121(pretrained=True)
        num_ftrs = self.densenet121.classifier.in_features
        self.densenet121.classifier = nn.Sequential(
            nn.Linear(num_ftrs, out_size),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.densenet121(x)
        return x
```
I load and initialize with:
```python
# initialize and load the model
model = DenseNet121(nnClassCount).cuda()
model = torch.nn.DataParallel(model).cuda()
modeldict = torch.load("model_ones_3epoch_densenet.tar")
model.load_state_dict(modeldict['state_dict'])
```
It looks like DenseNet doesn't split layers up into children, so model = nn.Sequential(*list(modelRes.children())[:-1]) won't work.
model.classifier = nn.Linear(1024, 2) seems to work on default DenseNets, but with the modified classifier (additional sigmoid function) here, it ends up just adding an additional classifier layer without replacing the original. I tried:
```python
model.classifier = nn.Sequential(
    nn.Linear(1024, dset_classes_number),
    nn.Sigmoid()
)
```
But I am having the same added-instead-of-replaced classifier issue:
```
...
    )
    (classifier): Sequential(
      (0): Linear(in_features=1024, out_features=14, bias=True)
      (1): Sigmoid()
    )
  )
)
(classifier): Sequential(
  (0): Linear(in_features=1024, out_features=2, bias=True)
  (1): Sigmoid()
)
```
If you want to replace the classifier of the densenet121 that is a member of your model, you need to assign it there directly:

```python
model.densenet121.classifier = nn.Sequential(...)
```
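A runnable sketch of the difference, using a tiny stand-in module instead of the real torchvision DenseNet (all sizes here are made up): assigning model.classifier only creates a new, unused attribute on the wrapper, while assigning model.densenet121.classifier replaces the layer that forward() actually runs.

```python
import torch
import torch.nn as nn

class Wrapper(nn.Module):
    """Tiny stand-in for the cheXpert DenseNet121 wrapper (hypothetical sizes)."""
    def __init__(self, out_size):
        super(Wrapper, self).__init__()
        self.densenet121 = nn.Sequential()
        self.densenet121.add_module('features', nn.Linear(8, 1024))  # stands in for the conv trunk
        self.densenet121.add_module('classifier',
                                    nn.Sequential(nn.Linear(1024, out_size), nn.Sigmoid()))

    def forward(self, x):
        return self.densenet121(x)

model = Wrapper(out_size=14)

# Wrong: this only adds a *new* attribute on the wrapper; forward() still
# runs the original 14-way classifier inside model.densenet121.
model.classifier = nn.Sequential(nn.Linear(1024, 2), nn.Sigmoid())
print(model(torch.randn(1, 8)).shape)  # torch.Size([1, 14])

# Right: reach into the wrapped densenet121 and replace its classifier.
model.densenet121.classifier = nn.Sequential(nn.Linear(1024, 2), nn.Sigmoid())
print(model(torch.randn(1, 8)).shape)  # torch.Size([1, 2])
```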