Hot questions on using neural networks in face recognition


I am working on face recognition with deep neural networks. I am using the CASIA-WebFace database of 10,575 classes to train a deep CNN (the one used by CASIA; see the paper for details) with 10 convolution layers, 5 pooling layers, and 1 fully connected layer. It uses the "ReLU" activation function. I was able to train it successfully using Caffe and obtained the desired performance.

My problem is that I am unable to train or fine-tune the same CNN using the "PReLU" activation. At first, I thought that simply replacing "ReLU" with "PReLU" would do the job. However, neither fine-tuning (from the caffemodel learned with "ReLU") nor training from scratch worked.

To simplify the learning problem, I significantly reduced the training dataset to only 50 classes. Even then, the CNN was unable to learn with "PReLU", whereas it was able to learn with "ReLU".

To confirm that my Caffe installation works correctly with "PReLU", I ran simple networks (with both "ReLU" and "PReLU") on the CIFAR-10 data, and both worked.

I would like to know whether anyone in the community has made similar observations, or can suggest a way to overcome this problem.


The main difference between the "ReLU" and "PReLU" activations is that the latter has a non-zero slope for negative inputs, and that this slope can be learned from the data. These properties have been observed to make training more robust to the random initialization of the weights. I have used "PReLU" activations to fine-tune nets originally trained with "ReLU"s, and I experienced faster and more robust convergence.
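For reference, the two activations can be written as

ReLU(x)  = max(0, x)
PReLU(x) = max(0, x) + a * min(0, x)

where a is the learned negative slope (one per channel unless channel_shared is set). With a = 0 the two functions are identical, which is what the configuration below exploits.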

My suggestion is to replace each "ReLU" layer with the following configuration:

layer {
  name: "prelu"
  type: "PReLU"
  bottom: "my_bottom"
  top: "my_bottom" # you can make it "in-place" to save memory
  param { lr_mult: 1 decay_mult: 0 }
  prelu_param {
    filler { type: "constant" value: 0 }
    channel_shared: false
  }
}
Note that by initializing the negative slope to 0, the "PReLU" activations are in fact the same as "ReLU", so you start the fine-tuning from exactly the same spot as your original net.

Also note that I explicitly set the learning-rate and weight-decay multipliers for the slope parameter (lr_mult: 1 and decay_mult: 0, respectively) -- you might need to tweak these a bit, though I believe setting decay_mult to anything other than zero is unwise: weight decay would continually pull the learned slopes back toward zero, i.e., back toward plain "ReLU".


I am in the process of learning neural networks using MATLAB, and I am trying to implement a face recognition program using PCA for feature extraction and a feedforward neural network for classification.

I have 3 people in my training set; the images are stored in the 'data' directory.

I am using one network per individual, and I train each network with all the images of my training set. The code for my project is presented below:

dirs = dir('data');
numDirs = numel(dirs); % renamed from "size" to avoid shadowing the built-in
eigenVecs = {};

% a neural network for each individual
net1 = feedforwardnet(10);
net2 = feedforwardnet(10);
net3 = feedforwardnet(10);

% extract eigenvectors and prepare the input of the NN
% (start at 3 to skip the '.' and '..' entries returned by dir)
for i = 3:numDirs
    eigenVecs{i-2} = eigenFaces(dirs(i).name);
end
trainSet = cell2mat(eigenVecs'); % 27x1024 double

% set the target for each NN, and then train it.
T = [1 1 1 1 1 1 1 1 1 ...
     0 0 0 0 0 0 0 0 0 ...
     0 0 0 0 0 0 0 0 0];
train(net1, trainSet', T);
T = [0 0 0 0 0 0 0 0 0 ...
     1 1 1 1 1 1 1 1 1 ...
     0 0 0 0 0 0 0 0 0];
train(net2, trainSet', T);
T = [0 0 0 0 0 0 0 0 0 ...
     0 0 0 0 0 0 0 0 0 ...
     1 1 1 1 1 1 1 1 1];
train(net3, trainSet', T);

After finishing training the networks, I get this panel:

[image: nntraintool panel]

If anyone could explain the Progress section of the panel, I would appreciate it; I could not understand what those numbers mean.

After training the networks, I try to test one using the following:

sim(net1, L)

where L is a sample from my set, a 1x1024 vector. The result I get is this:

Empty matrix: 0-by-1024

Is my approach to training the neural networks wrong? What can I do to fix this program?

Thank you.


The code

 train(net1, trainSet', T);

does not save the trained network into the net1 variable (it saves it into the ans variable instead). This is why the result of sim is empty: there is no trained network in net1. You have to save the trained network yourself:

 net1 = train(net1, trainSet', T);
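For completeness, here is a minimal sketch of the corrected flow (the ones/zeros shorthand for the targets and the transpose of L are my additions; the toolbox treats each column as one sample, so the 1x1024 row vector L must be transposed before calling sim):

% assign the result of train back into each network variable
T = [ones(1,9) zeros(1,18)];
net1 = train(net1, trainSet', T);
T = [zeros(1,9) ones(1,9) zeros(1,9)];
net2 = train(net2, trainSet', T);
T = [zeros(1,18) ones(1,9)];
net3 = train(net3, trainSet', T);

% sim expects samples as columns: transpose the 1x1024 row vector
score = sim(net1, L')

This should return a single score per network instead of an empty matrix.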