Hot questions for using neural networks in Ubuntu

Question:

I would like to compile/configure Caffe so that training an artificial neural network with it is multi-threaded (CPU only, no GPU). How can I enable multithreading with Caffe? I use Caffe on Ubuntu 14.04 LTS x64.


Answer:

One way is to use OpenBLAS instead of the default ATLAS. To do so,

  1. sudo apt-get install -y libopenblas-dev
  2. Before compiling Caffe, edit Makefile.config and replace BLAS := atlas with BLAS := open
  3. After compiling Caffe, running export OPENBLAS_NUM_THREADS=4 will cause Caffe to use 4 cores.
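
If you want to verify which BLAS the compiled binary actually links against, a quick check like this should work (the path assumes the default Make-based build layout):

ldd .build_release/tools/caffe.bin | grep -i blas # should list libopenblas rather than ATLAS's libcblas/libf77blas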

If you are interested, here is a script that installs Caffe and pycaffe on a fresh Ubuntu 14.04 LTS x64 or Ubuntu 14.10 x64 (CPU only, multi-threaded Caffe). It can probably be improved, but it works well enough for me for now:

# This script installs Caffe and pycaffe on Ubuntu 14.04 x64 or 14.10 x64. CPU only, multi-threaded Caffe.
# Usage: 
# 0. Set up here how many cores you want to use during the installation:
# By default Caffe will use all these cores.
NUMBER_OF_CORES=4
# 1. Execute this script, e.g. "bash compile_caffe_ubuntu_14.04.sh" (~30 to 60 minutes on a new Ubuntu).
# 2. Open a new shell (or run "source ~/.bash_profile"). You're done. You can try 
#    running "import caffe" from the Python interpreter to test.

#http://caffe.berkeleyvision.org/install_apt.html : (general install info: http://caffe.berkeleyvision.org/installation.html)
cd
sudo apt-get update
#sudo apt-get upgrade -y # If you are OK getting prompted
sudo DEBIAN_FRONTEND=noninteractive apt-get upgrade -y -q -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" # If you are OK with all defaults

sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev
sudo apt-get install -y --no-install-recommends libboost-all-dev
sudo apt-get install -y libatlas-base-dev 
sudo apt-get install -y python-dev 
sudo apt-get install -y python-pip git

# For Ubuntu 14.04
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler 

# LMDB
# https://github.com/BVLC/caffe/issues/2729: Temporarily broken link to the LMDB repository #2729
#git clone https://gitorious.org/mdb/mdb.git
#cd mdb/libraries/liblmdb
#make && make install 

git clone https://github.com/LMDB/lmdb.git 
cd lmdb/libraries/liblmdb
make # build as a regular user; only the install step needs root
sudo make install

# More pre-requisites 
sudo apt-get install -y cmake unzip doxygen
sudo apt-get install -y protobuf-compiler
sudo apt-get install -y libffi-dev python-dev build-essential
sudo pip install lmdb
sudo pip install numpy
sudo apt-get install -y python-numpy
sudo apt-get install -y gfortran # required by scipy
sudo pip install scipy # required by scikit-image
sudo apt-get install -y python-scipy # in case pip failed
sudo apt-get install -y python-nose
sudo pip install scikit-image # to fix https://github.com/BVLC/caffe/issues/50


# Get caffe (http://caffe.berkeleyvision.org/installation.html#compilation)
cd
mkdir caffe
cd caffe
wget https://github.com/BVLC/caffe/archive/master.zip
unzip -o master.zip
cd caffe-master

# Prepare Python binding (pycaffe)
cd python
for req in $(cat requirements.txt); do sudo pip install $req; done
echo "export PYTHONPATH=$(pwd):$PYTHONPATH " >> ~/.bash_profile # to be able to call "import caffe" from Python after reboot
source ~/.bash_profile # Update shell 
cd ..

# Compile caffe and pycaffe
cp Makefile.config.example Makefile.config
sed -i 's/^# CPU_ONLY := 1/CPU_ONLY := 1/' Makefile.config # CPU only
sudo apt-get install -y libopenblas-dev
sed -i 's/^BLAS := atlas/BLAS := open/' Makefile.config # to use OpenBLAS
# Matching on the option names instead of hard-coded line numbers keeps these
# edits working even if the layout of Makefile.config.example changes.
echo "export OPENBLAS_NUM_THREADS=$NUMBER_OF_CORES" >> ~/.bash_profile 
mkdir build
cd build
cmake ..
cd ..
make all -j$NUMBER_OF_CORES # number of parallel compilation jobs: typically set to the number of physical cores
make pycaffe -j$NUMBER_OF_CORES
make test
make runtest
#make matcaffe
make distribute

# Bonus for other work with pycaffe
sudo pip install pydot
sudo apt-get install -y graphviz
sudo pip install scikit-learn

# At the end, you need to run "source ~/.bash_profile" manually or start a new shell to be able to run "import caffe" in Python,
# because a script cannot source into its parent shell. (http://stackoverflow.com/questions/16011245/source-files-in-a-bash-script)

I have placed this script on GitHub: https://github.com/Franck-Dernoncourt/caffe_demos/tree/master/caffe_installation .

Question:

I want to redirect Caffe's output from the terminal to a file (say output.txt). I'm using the command

caffe train -solver=expt/solver.prototxt > output.txt

However, the > operator doesn't seem to work: Caffe still prints all of its output to the terminal. I'm using Ubuntu 14.04.

I can't figure out why > is not working with Caffe. Any help is much appreciated. Thank you.


Answer:

You need to redirect stderr as well:

caffe train ... > output.txt 2>&1

The redirection operator > redirects only stdout, but caffe writes its log messages to stderr. You might also want to set GLOG_logtostderr=1.
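
Putting the two together (GLOG_logtostderr=1 forces glog to send all log output to stderr instead of log files, and 2>&1 folds stderr into the redirected file):

GLOG_logtostderr=1 caffe train -solver=expt/solver.prototxt > output.txt 2>&1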

Question:

I am using the GoogleNet model for binary classification of images. Earlier I was using a virtual machine, and now I am using Ubuntu 14.04. The two give me different results. I have tried hard to find the source of the problem but could not pinpoint it.

I have trained two models separately, one on Ubuntu 14.04 and one in the virtual machine. Both models use the CPU, and neither uses cuDNN. As for the BLAS library, I am using the default ATLAS.

Any suggestions would be of great help.


Answer:

Since you started your training from scratch in both cases and did not explicitly fix the random_seed parameter in your solver.prototxt, it is very likely that Caffe initialized your model with different random weights for each of the two training processes. Starting from different points is very likely to end with differently trained models. If you are concerned about possible differences in Caffe between the two architectures, try repeating the training with the same random_seed parameter in solver.prototxt, as sketched below.
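
For example (the value 42 is arbitrary; any fixed integer works):

# solver.prototxt (excerpt): a fixed seed makes the random weight
# initialization reproducible across runs and machines
random_seed: 42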

Question:

Hi, I am new to Ubuntu and Caffe. I am currently studying how to use Caffe for image classification, following the instructions at this link (http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/).

Could you guys tell me the meaning of "-backend=lmdb" in the following command:

/home/ubuntu/caffe/build/tools/compute_image_mean -backend=lmdb /home/ubuntu/deeplearning-cats-dogs-tutorial/input/train_lmdb /home/ubuntu/deeplearning-cats-dogs-tutorial/input/mean.binaryproto

Answer:

This command runs a tool that computes the mean pixel values to be used for image preprocessing. The mean values are computed over the entire training set. Caffe commonly stores datasets in the lmdb or leveldb format; the -backend switch simply tells the tool to read the lmdb format.
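
For reference, the resulting mean.binaryproto is then typically consumed by the transform_param of a Data layer in the network definition; a sketch following the tutorial's paths (the layer name and batch size here are illustrative):

layer {
  name: "train_data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    # subtract the per-pixel mean computed by compute_image_mean
    mean_file: "/home/ubuntu/deeplearning-cats-dogs-tutorial/input/mean.binaryproto"
  }
  data_param {
    source: "/home/ubuntu/deeplearning-cats-dogs-tutorial/input/train_lmdb"
    batch_size: 32
    backend: LMDB
  }
}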

Question:

I built a caffe network + solver (for binary classification) and when I run the code (and try to train the network), I see this error:

I0914 20:03:01.362612  4024 solver.cpp:280] Learning Rate Policy: step
I0914 20:03:01.367985  4024 solver.cpp:337] Iteration 0, Testing net (#0)
I0914 20:03:01.368085  4024 net.cpp:693] Ignoring source layer train_database
I0914 20:03:04.568979  4024 solver.cpp:404]     Test net output #0: accuracy = 0.07575
I0914 20:03:04.569093  4024 solver.cpp:404]     Test net output #1: loss = 2.20947 (* 1 = 2.20947 loss)
I0914 20:03:04.610549  4024 solver.cpp:228] Iteration 0, loss = 2.31814
I0914 20:03:04.610666  4024 solver.cpp:244]     Train net output #0: loss = 2.31814 (* 1 = 2.31814 loss)
*** Aborted at 1473872584 (unix time) try "date -d @1473872584" if you are using GNU date ***
PC: @     0x7f6870b62c52 caffe::SGDSolver<>::GetLearningRate()
*** SIGFPE (@0x7f6870b62c52) received by PID 4024 (TID 0x7f6871004a40) from PID 1890987090; stack trace: ***
    @     0x7f686f6bbcb0 (unknown)
    @     0x7f6870b62c52 caffe::SGDSolver<>::GetLearningRate()
    @     0x7f6870b62e44 caffe::SGDSolver<>::ApplyUpdate()
    @     0x7f6870b8e2fc caffe::Solver<>::Step()
    @     0x7f6870b8eb09 caffe::Solver<>::Solve()
    @           0x40821d train()
    @           0x40589c main
    @     0x7f686f6a6f45 (unknown)
    @           0x40610b (unknown)
    @                0x0 (unknown)
Floating point exception (core dumped)

I searched a lot, and the main solutions I found were:

  1. Recompile the Caffe files. I tried make clean -> make all -> make test -> make runtest.
  2. Change the graphics driver that Linux uses. I switched from the red-highlighted driver to the green one (this refers to a driver-settings screenshot that is not reproduced here). Note: I'm using Caffe with the CPU only, and this is set in the Makefile.config file.

None of this helped, and I still can't run my network.

Does anyone have an idea? Thanks a lot :)

This is the full log:

/home/roishik/anaconda2/bin/python /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/code/run_network.py
I0914 20:03:01.142490  4024 caffe.cpp:210] Use CPU.
I0914 20:03:01.142940  4024 solver.cpp:48] Initializing solver from parameters: 
test_iter: 400
test_interval: 400
base_lr: 0.001
display: 50
max_iter: 40000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/caffe_models/my_new/snapshots"
solver_mode: CPU
net: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/caffe_models/my_new/fc_net_ver1.prototxt"
train_state {
  level: 0
  stage: ""
}
I0914 20:03:01.143082  4024 solver.cpp:91] Creating training net from net file: /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/caffe_models/my_new/fc_net_ver1.prototxt
I0914 20:03:01.143712  4024 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer validation_database
I0914 20:03:01.143754  4024 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0914 20:03:01.143913  4024 net.cpp:58] Initializing net from parameters: 
name: "fc2Net"
state {
  phase: TRAIN
  level: 0
  stage: ""
}
layer {
  name: "train_database"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/input/mean.binaryproto"
  }
  data_param {
    source: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/input/train_lmdb"
    batch_size: 200
    backend: LMDB
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1024
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1024
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "fc2"
  top: "fc2"
}
layer {
  name: "fc3"
  type: "InnerProduct"
  bottom: "fc2"
  top: "fc3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc3"
  bottom: "label"
  top: "loss"
}
I0914 20:03:01.144016  4024 layer_factory.hpp:77] Creating layer train_database
I0914 20:03:01.144811  4024 net.cpp:100] Creating Layer train_database
I0914 20:03:01.144846  4024 net.cpp:408] train_database -> data
I0914 20:03:01.144909  4024 net.cpp:408] train_database -> label
I0914 20:03:01.144951  4024 data_transformer.cpp:25] Loading mean file from: /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/input/mean.binaryproto
I0914 20:03:01.153393  4035 db_lmdb.cpp:35] Opened lmdb /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/input/train_lmdb
I0914 20:03:01.153481  4024 data_layer.cpp:41] output data size: 200,1,32,32
I0914 20:03:01.154615  4024 net.cpp:150] Setting up train_database
I0914 20:03:01.154670  4024 net.cpp:157] Top shape: 200 1 32 32 (204800)
I0914 20:03:01.154693  4024 net.cpp:157] Top shape: 200 (200)
I0914 20:03:01.154712  4024 net.cpp:165] Memory required for data: 820000
I0914 20:03:01.154742  4024 layer_factory.hpp:77] Creating layer fc1
I0914 20:03:01.154781  4024 net.cpp:100] Creating Layer fc1
I0914 20:03:01.154804  4024 net.cpp:434] fc1 <- data
I0914 20:03:01.154837  4024 net.cpp:408] fc1 -> fc1
I0914 20:03:01.159675  4036 blocking_queue.cpp:50] Waiting for data
I0914 20:03:01.215118  4024 net.cpp:150] Setting up fc1
I0914 20:03:01.215214  4024 net.cpp:157] Top shape: 200 1024 (204800)
I0914 20:03:01.215237  4024 net.cpp:165] Memory required for data: 1639200
I0914 20:03:01.215306  4024 layer_factory.hpp:77] Creating layer relu1
I0914 20:03:01.215342  4024 net.cpp:100] Creating Layer relu1
I0914 20:03:01.215363  4024 net.cpp:434] relu1 <- fc1
I0914 20:03:01.215387  4024 net.cpp:395] relu1 -> fc1 (in-place)
I0914 20:03:01.215417  4024 net.cpp:150] Setting up relu1
I0914 20:03:01.215440  4024 net.cpp:157] Top shape: 200 1024 (204800)
I0914 20:03:01.215459  4024 net.cpp:165] Memory required for data: 2458400
I0914 20:03:01.215478  4024 layer_factory.hpp:77] Creating layer fc2
I0914 20:03:01.215504  4024 net.cpp:100] Creating Layer fc2
I0914 20:03:01.215524  4024 net.cpp:434] fc2 <- fc1
I0914 20:03:01.215549  4024 net.cpp:408] fc2 -> fc2
I0914 20:03:01.264021  4024 net.cpp:150] Setting up fc2
I0914 20:03:01.264062  4024 net.cpp:157] Top shape: 200 1024 (204800)
I0914 20:03:01.264072  4024 net.cpp:165] Memory required for data: 3277600
I0914 20:03:01.264097  4024 layer_factory.hpp:77] Creating layer relu2
I0914 20:03:01.264118  4024 net.cpp:100] Creating Layer relu2
I0914 20:03:01.264129  4024 net.cpp:434] relu2 <- fc2
I0914 20:03:01.264143  4024 net.cpp:395] relu2 -> fc2 (in-place)
I0914 20:03:01.264166  4024 net.cpp:150] Setting up relu2
I0914 20:03:01.264181  4024 net.cpp:157] Top shape: 200 1024 (204800)
I0914 20:03:01.264190  4024 net.cpp:165] Memory required for data: 4096800
I0914 20:03:01.264201  4024 layer_factory.hpp:77] Creating layer fc3
I0914 20:03:01.264219  4024 net.cpp:100] Creating Layer fc3
I0914 20:03:01.264230  4024 net.cpp:434] fc3 <- fc2
I0914 20:03:01.264245  4024 net.cpp:408] fc3 -> fc3
I0914 20:03:01.264389  4024 net.cpp:150] Setting up fc3
I0914 20:03:01.264407  4024 net.cpp:157] Top shape: 200 2 (400)
I0914 20:03:01.264416  4024 net.cpp:165] Memory required for data: 4098400
I0914 20:03:01.264434  4024 layer_factory.hpp:77] Creating layer loss
I0914 20:03:01.264447  4024 net.cpp:100] Creating Layer loss
I0914 20:03:01.264459  4024 net.cpp:434] loss <- fc3
I0914 20:03:01.264469  4024 net.cpp:434] loss <- label
I0914 20:03:01.264487  4024 net.cpp:408] loss -> loss
I0914 20:03:01.264513  4024 layer_factory.hpp:77] Creating layer loss
I0914 20:03:01.264544  4024 net.cpp:150] Setting up loss
I0914 20:03:01.264559  4024 net.cpp:157] Top shape: (1)
I0914 20:03:01.264569  4024 net.cpp:160]     with loss weight 1
I0914 20:03:01.264595  4024 net.cpp:165] Memory required for data: 4098404
I0914 20:03:01.264606  4024 net.cpp:226] loss needs backward computation.
I0914 20:03:01.264617  4024 net.cpp:226] fc3 needs backward computation.
I0914 20:03:01.264626  4024 net.cpp:226] relu2 needs backward computation.
I0914 20:03:01.264636  4024 net.cpp:226] fc2 needs backward computation.
I0914 20:03:01.264647  4024 net.cpp:226] relu1 needs backward computation.
I0914 20:03:01.264655  4024 net.cpp:226] fc1 needs backward computation.
I0914 20:03:01.264667  4024 net.cpp:228] train_database does not need backward computation.
I0914 20:03:01.264675  4024 net.cpp:270] This network produces output loss
I0914 20:03:01.264695  4024 net.cpp:283] Network initialization done.
I0914 20:03:01.265384  4024 solver.cpp:181] Creating test net (#0) specified by net file: /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/caffe_models/my_new/fc_net_ver1.prototxt
I0914 20:03:01.265435  4024 net.cpp:322] The NetState phase (1) differed from the phase (0) specified by a rule in layer train_database
I0914 20:03:01.265606  4024 net.cpp:58] Initializing net from parameters: 
name: "fc2Net"
state {
  phase: TEST
}
layer {
  name: "validation_database"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/second/input/mean.binaryproto"
  }
  data_param {
    source: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/second/input/validation_lmdb"
    batch_size: 40
    backend: LMDB
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "data"
  top: "fc1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1024
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1024
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "fc2"
  top: "fc2"
}
layer {
  name: "fc3"
  type: "InnerProduct"
  bottom: "fc2"
  top: "fc3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc3"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc3"
  bottom: "label"
  top: "loss"
}
I0914 20:03:01.265750  4024 layer_factory.hpp:77] Creating layer validation_database
I0914 20:03:01.265878  4024 net.cpp:100] Creating Layer validation_database
I0914 20:03:01.265897  4024 net.cpp:408] validation_database -> data
I0914 20:03:01.265918  4024 net.cpp:408] validation_database -> label
I0914 20:03:01.265936  4024 data_transformer.cpp:25] Loading mean file from: /home/roishik/Desktop/Thesis/Code/cafe_cnn/second/input/mean.binaryproto
I0914 20:03:01.266034  4037 db_lmdb.cpp:35] Opened lmdb /home/roishik/Desktop/Thesis/Code/cafe_cnn/second/input/validation_lmdb
I0914 20:03:01.266098  4024 data_layer.cpp:41] output data size: 40,1,32,32
I0914 20:03:01.266295  4024 net.cpp:150] Setting up validation_database
I0914 20:03:01.266315  4024 net.cpp:157] Top shape: 40 1 32 32 (40960)
I0914 20:03:01.266330  4024 net.cpp:157] Top shape: 40 (40)
I0914 20:03:01.266340  4024 net.cpp:165] Memory required for data: 164000
I0914 20:03:01.266350  4024 layer_factory.hpp:77] Creating layer label_validation_database_1_split
I0914 20:03:01.266386  4024 net.cpp:100] Creating Layer label_validation_database_1_split
I0914 20:03:01.266404  4024 net.cpp:434] label_validation_database_1_split <- label
I0914 20:03:01.266422  4024 net.cpp:408] label_validation_database_1_split -> label_validation_database_1_split_0
I0914 20:03:01.266443  4024 net.cpp:408] label_validation_database_1_split -> label_validation_database_1_split_1
I0914 20:03:01.266464  4024 net.cpp:150] Setting up label_validation_database_1_split
I0914 20:03:01.266480  4024 net.cpp:157] Top shape: 40 (40)
I0914 20:03:01.266494  4024 net.cpp:157] Top shape: 40 (40)
I0914 20:03:01.266505  4024 net.cpp:165] Memory required for data: 164320
I0914 20:03:01.266515  4024 layer_factory.hpp:77] Creating layer fc1
I0914 20:03:01.266531  4024 net.cpp:100] Creating Layer fc1
I0914 20:03:01.266543  4024 net.cpp:434] fc1 <- data
I0914 20:03:01.266558  4024 net.cpp:408] fc1 -> fc1
I0914 20:03:01.320364  4024 net.cpp:150] Setting up fc1
I0914 20:03:01.320461  4024 net.cpp:157] Top shape: 40 1024 (40960)
I0914 20:03:01.320489  4024 net.cpp:165] Memory required for data: 328160
I0914 20:03:01.320533  4024 layer_factory.hpp:77] Creating layer relu1
I0914 20:03:01.320571  4024 net.cpp:100] Creating Layer relu1
I0914 20:03:01.320597  4024 net.cpp:434] relu1 <- fc1
I0914 20:03:01.320627  4024 net.cpp:395] relu1 -> fc1 (in-place)
I0914 20:03:01.320652  4024 net.cpp:150] Setting up relu1
I0914 20:03:01.320667  4024 net.cpp:157] Top shape: 40 1024 (40960)
I0914 20:03:01.320678  4024 net.cpp:165] Memory required for data: 492000
I0914 20:03:01.320689  4024 layer_factory.hpp:77] Creating layer fc2
I0914 20:03:01.320709  4024 net.cpp:100] Creating Layer fc2
I0914 20:03:01.320719  4024 net.cpp:434] fc2 <- fc1
I0914 20:03:01.320734  4024 net.cpp:408] fc2 -> fc2
I0914 20:03:01.361732  4024 net.cpp:150] Setting up fc2
I0914 20:03:01.361766  4024 net.cpp:157] Top shape: 40 1024 (40960)
I0914 20:03:01.361802  4024 net.cpp:165] Memory required for data: 655840
I0914 20:03:01.361821  4024 layer_factory.hpp:77] Creating layer relu2
I0914 20:03:01.361837  4024 net.cpp:100] Creating Layer relu2
I0914 20:03:01.361845  4024 net.cpp:434] relu2 <- fc2
I0914 20:03:01.361852  4024 net.cpp:395] relu2 -> fc2 (in-place)
I0914 20:03:01.361866  4024 net.cpp:150] Setting up relu2
I0914 20:03:01.361872  4024 net.cpp:157] Top shape: 40 1024 (40960)
I0914 20:03:01.361877  4024 net.cpp:165] Memory required for data: 819680
I0914 20:03:01.361881  4024 layer_factory.hpp:77] Creating layer fc3
I0914 20:03:01.361892  4024 net.cpp:100] Creating Layer fc3
I0914 20:03:01.361901  4024 net.cpp:434] fc3 <- fc2
I0914 20:03:01.361909  4024 net.cpp:408] fc3 -> fc3
I0914 20:03:01.362009  4024 net.cpp:150] Setting up fc3
I0914 20:03:01.362017  4024 net.cpp:157] Top shape: 40 2 (80)
I0914 20:03:01.362022  4024 net.cpp:165] Memory required for data: 820000
I0914 20:03:01.362032  4024 layer_factory.hpp:77] Creating layer fc3_fc3_0_split
I0914 20:03:01.362041  4024 net.cpp:100] Creating Layer fc3_fc3_0_split
I0914 20:03:01.362046  4024 net.cpp:434] fc3_fc3_0_split <- fc3
I0914 20:03:01.362053  4024 net.cpp:408] fc3_fc3_0_split -> fc3_fc3_0_split_0
I0914 20:03:01.362062  4024 net.cpp:408] fc3_fc3_0_split -> fc3_fc3_0_split_1
I0914 20:03:01.362073  4024 net.cpp:150] Setting up fc3_fc3_0_split
I0914 20:03:01.362082  4024 net.cpp:157] Top shape: 40 2 (80)
I0914 20:03:01.362088  4024 net.cpp:157] Top shape: 40 2 (80)
I0914 20:03:01.362093  4024 net.cpp:165] Memory required for data: 820640
I0914 20:03:01.362097  4024 layer_factory.hpp:77] Creating layer accuracy
I0914 20:03:01.362120  4024 net.cpp:100] Creating Layer accuracy
I0914 20:03:01.362128  4024 net.cpp:434] accuracy <- fc3_fc3_0_split_0
I0914 20:03:01.362134  4024 net.cpp:434] accuracy <- label_validation_database_1_split_0
I0914 20:03:01.362141  4024 net.cpp:408] accuracy -> accuracy
I0914 20:03:01.362152  4024 net.cpp:150] Setting up accuracy
I0914 20:03:01.362159  4024 net.cpp:157] Top shape: (1)
I0914 20:03:01.362164  4024 net.cpp:165] Memory required for data: 820644
I0914 20:03:01.362169  4024 layer_factory.hpp:77] Creating layer loss
I0914 20:03:01.362176  4024 net.cpp:100] Creating Layer loss
I0914 20:03:01.362181  4024 net.cpp:434] loss <- fc3_fc3_0_split_1
I0914 20:03:01.362187  4024 net.cpp:434] loss <- label_validation_database_1_split_1
I0914 20:03:01.362193  4024 net.cpp:408] loss -> loss
I0914 20:03:01.362226  4024 layer_factory.hpp:77] Creating layer loss
I0914 20:03:01.362251  4024 net.cpp:150] Setting up loss
I0914 20:03:01.362265  4024 net.cpp:157] Top shape: (1)
I0914 20:03:01.362277  4024 net.cpp:160]     with loss weight 1
I0914 20:03:01.362298  4024 net.cpp:165] Memory required for data: 820648
I0914 20:03:01.362311  4024 net.cpp:226] loss needs backward computation.
I0914 20:03:01.362323  4024 net.cpp:228] accuracy does not need backward computation.
I0914 20:03:01.362336  4024 net.cpp:226] fc3_fc3_0_split needs backward computation.
I0914 20:03:01.362347  4024 net.cpp:226] fc3 needs backward computation.
I0914 20:03:01.362360  4024 net.cpp:226] relu2 needs backward computation.
I0914 20:03:01.362370  4024 net.cpp:226] fc2 needs backward computation.
I0914 20:03:01.362381  4024 net.cpp:226] relu1 needs backward computation.
I0914 20:03:01.362392  4024 net.cpp:226] fc1 needs backward computation.
I0914 20:03:01.362403  4024 net.cpp:228] label_validation_database_1_split does not need backward computation.
I0914 20:03:01.362416  4024 net.cpp:228] validation_database does not need backward computation.
I0914 20:03:01.362426  4024 net.cpp:270] This network produces output accuracy
I0914 20:03:01.362438  4024 net.cpp:270] This network produces output loss
I0914 20:03:01.362460  4024 net.cpp:283] Network initialization done.
I0914 20:03:01.362552  4024 solver.cpp:60] Solver scaffolding done.
I0914 20:03:01.362591  4024 caffe.cpp:251] Starting Optimization
I0914 20:03:01.362601  4024 solver.cpp:279] Solving fc2Net
I0914 20:03:01.362612  4024 solver.cpp:280] Learning Rate Policy: step
I0914 20:03:01.367985  4024 solver.cpp:337] Iteration 0, Testing net (#0)
I0914 20:03:01.368085  4024 net.cpp:693] Ignoring source layer train_database
I0914 20:03:04.568979  4024 solver.cpp:404]     Test net output #0: accuracy = 0.07575
I0914 20:03:04.569093  4024 solver.cpp:404]     Test net output #1: loss = 2.20947 (* 1 = 2.20947 loss)
I0914 20:03:04.610549  4024 solver.cpp:228] Iteration 0, loss = 2.31814
I0914 20:03:04.610666  4024 solver.cpp:244]     Train net output #0: loss = 2.31814 (* 1 = 2.31814 loss)
*** Aborted at 1473872584 (unix time) try "date -d @1473872584" if you are using GNU date ***
PC: @     0x7f6870b62c52 caffe::SGDSolver<>::GetLearningRate()
*** SIGFPE (@0x7f6870b62c52) received by PID 4024 (TID 0x7f6871004a40) from PID 1890987090; stack trace: ***
    @     0x7f686f6bbcb0 (unknown)
    @     0x7f6870b62c52 caffe::SGDSolver<>::GetLearningRate()
    @     0x7f6870b62e44 caffe::SGDSolver<>::ApplyUpdate()
    @     0x7f6870b8e2fc caffe::Solver<>::Step()
    @     0x7f6870b8eb09 caffe::Solver<>::Solve()
    @           0x40821d train()
    @           0x40589c main
    @     0x7f686f6a6f45 (unknown)
    @           0x40610b (unknown)
    @                0x0 (unknown)
Floating point exception (core dumped)
Done!

Answer:

Look at your error message: you received a SIGFPE signal. This indicates an arithmetic error, typically a division by zero. Furthermore, the function that raises the error is the one that evaluates the learning rate.

It appears that you did not configure the learning rate policy correctly in your 'solver.prototxt'.
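
In particular, the solver parameters in your log show lr_policy: "step" but no stepsize. For the "step" policy, Caffe computes the current step as iter / stepsize, so a missing (zero) stepsize would cause exactly this kind of integer division by zero; that is my reading of the stack trace, so verify against your Caffe version. A minimal fix would be:

# solver.prototxt (excerpt) -- the "step" policy requires a stepsize.
# lr = base_lr * gamma ^ floor(iter / stepsize), so stepsize must be a
# positive integer; 10000 here is just an example value.
lr_policy: "step"
gamma: 0.1
stepsize: 10000   # multiply the learning rate by gamma every 10000 iterations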

Question:

I am trying to compile Caffe from the official GitHub sources, plus a couple of layer cpp files added by a user. When compiling, I get the following error:

f@f-VirtualBox:~/caffe/mts4/caffe-master$ sudo make all
CXX/LD -o .build_release/tools/caffe.bin
.build_release/lib/libcaffe.so: undefined reference to `PyString_FromString'
.build_release/lib/libcaffe.so: undefined reference to `PyErr_Print'
.build_release/lib/libcaffe.so: undefined reference to `PyObject_CallObject'
.build_release/lib/libcaffe.so: undefined reference to `PyInt_FromLong'
.build_release/lib/libcaffe.so: undefined reference to `PyList_SetItem'
.build_release/lib/libcaffe.so: undefined reference to `PyCallable_Check'
.build_release/lib/libcaffe.so: undefined reference to `PyImport_Import'
.build_release/lib/libcaffe.so: undefined reference to `Py_Initialize'
.build_release/lib/libcaffe.so: undefined reference to `PyFloat_AsDouble'
.build_release/lib/libcaffe.so: undefined reference to `PyTuple_SetItem'
.build_release/lib/libcaffe.so: undefined reference to `PyObject_GetAttrString'
.build_release/lib/libcaffe.so: undefined reference to `PyList_New'
.build_release/lib/libcaffe.so: undefined reference to `PyTuple_New'
.build_release/lib/libcaffe.so: undefined reference to `PyErr_Occurred'
collect2: error: ld returned 1 exit status
Makefile:560: recipe for target '.build_release/tools/caffe.bin' failed
make: *** [.build_release/tools/caffe.bin] Error 1
f@f-VirtualBox:~/caffe/mts4/caffe-master$ 

Answer:

The linker error means that at least some of the cpp files added or altered by the user use the Python C API. To remedy the issue, uncomment WITH_PYTHON_LAYER := 1 in Makefile.config before compiling, so that Caffe is built and linked with Python support. A sketch of the steps follows.
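
Assuming the stock Makefile.config.example layout (the sed pattern and the -j value are illustrative):

# In Makefile.config, uncomment the line "# WITH_PYTHON_LAYER := 1":
sed -i 's/^# WITH_PYTHON_LAYER := 1/WITH_PYTHON_LAYER := 1/' Makefile.config

# Rebuild from a clean state so the flag takes effect everywhere:
make clean
make all -j4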