## Hot questions on using neural networks with LMDB

Question:

I am trying to build a deep learning model for saliency analysis using caffe (I am using the python wrapper). But I am unable to understand how to generate the LMDB data structure for this purpose. I have gone through the ImageNet and MNIST examples, and I understand that I should generate labels in the format

```
my_test_dir/picture-foo.jpg 0
```

But in my case, I will be labeling each pixel with 0 or 1 indicating whether that pixel is salient or not. That won't be a single label for an image.

How can I generate LMDB files for per-pixel labeling?

Answer:

You can approach this problem in two ways:

**1.** Using an HDF5 data layer instead of LMDB. HDF5 is more flexible and can support labels the size of the image. You can see this answer for an example of constructing and using an HDF5 input data layer.

**2.** You can have two LMDB input layers: one for the image and one for the label. Note that when you build the LMDBs you must **not** use the `'shuffle'` option, so that the images and their labels stay in sync.

**Update:** I recently gave a more detailed answer here.
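For the HDF5 route, the label dataset simply gets the same spatial dimensions as the image. A numpy sketch of the expected array shapes (the dataset size and image dimensions below are made up for illustration):

```python
import numpy as np

# Made-up dataset size and image dimensions -- only the shapes matter here.
N, H, W = 10, 64, 64
data = np.zeros((N, 3, H, W), dtype=np.float32)    # images in caffe's N x C x H x W layout
labels = np.zeros((N, 1, H, W), dtype=np.float32)  # one 0/1 saliency value per pixel

labels[0, 0, 16:48, 16:48] = 1  # e.g., mark a square region of the first image as salient

assert data.shape[0] == labels.shape[0]    # the HDF5 layer needs matching sample counts
assert data.shape[2:] == labels.shape[2:]  # label map matches the image size
```

These two arrays would then be written as the `data` and `label` datasets of an `.h5` file (e.g., with h5py) and served by an `"HDF5Data"` layer.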

Question:

I have two LMDB files. With the first one my network trains fine, while with the other it doesn't really work (the loss starts and stays at 0). So I figured maybe there's something wrong with the second LMDB. I tried writing some python code (mostly taken from here) to fetch the data from my LMDBs and inspect it, but so far no luck with either of the two databases. The LMDBs contain images as data and bounding-box information as labels.

Doing this:

```python
for key, value in lmdb_cursor:
    datum.ParseFromString(value)
    label = datum.label
    data = caffe.io.datum_to_array(datum)
```

on either one of the LMDBs gives me a key which is correctly the name of the image, but the `datum.ParseFromString` function is not able to retrieve anything from `value`: `label` is always 0, while `data` is an empty ndarray. Nonetheless, the data is there; `value` is a binary string of around 140 KB, which roughly accounts for the size of the image plus the bounding-box information, I guess.

I tried browsing several answers and discussions dealing with reading data from LMDBs in python, but I couldn't find any clue on how to read structured information such as bounding-box labels. My guess is that the parsing function expects a plain digit label, interprets the first bytes as such, and the remaining data is then lost because the binary string no longer parses correctly?

I know for a fact that at least the first LMDB is correct since my network performs correctly in both training and testing using it.

Any inputs will be greatly appreciated!

Answer:

The basic element stored in your LMDB is not `Datum`, but rather `AnnotatedDatum`. Therefore, you need to approach it with a little care:

```python
annotated_datum = caffe.proto.caffe_pb2.AnnotatedDatum()
annotated_datum.ParseFromString(value)
datum = annotated_datum.datum                    # the image data
annotations = annotated_datum.annotation_group   # should store the annotations
```

Question:

I am relatively new to using caffe and am trying to create minimal working examples that I can (later) tweak. I had no difficulty using caffe's examples with MNIST data. I downloaded ImageNet data (ILSVRC12) and used caffe's tool to convert it to an LMDB database using:

```sh
$CAFFE_ROOT/build/install/bin/convert_imageset -shuffle -encoded=true top_level_data_dir/ fileNames.txt lmdb_name
```

to create an LMDB containing encoded (JPEG) image data. The reason for this is that, encoded, the LMDB is about 64 GB, versus about 240 GB unencoded.

My .prototxt file that describes the net is minimal (a pair of inner-product layers, mostly borrowed from the MNIST example; not going for accuracy here, I just want something that works).

```
name: "example"
layer {
  name: "imagenet"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param { scale: 0.00390625 }
  data_param { source: "train-lmdb" batch_size: 100 backend: LMDB }
}
layer {
  name: "imagenet"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param { scale: 0.00390625 }
  data_param { source: "test-lmdb" batch_size: 100 backend: LMDB }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "data"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 1000
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 1000
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
```

When train-lmdb is unencoded, this .prototxt file works fine (accuracy is abysmal, but caffe does not crash). However, if train-lmdb is encoded then I get the following error:

```
data_transformer.cpp:239] Check failed: channels == img_channels (3 vs. 1)
```

**Question:** Is there some "flag" I must set in the .prototxt file to indicate that train-lmdb contains encoded images? (The same flag would likely have to be given to the testing data layer, test-lmdb.)

*A little research:*

Poking around with Google I found a resolved issue which seemed promising. However, setting `'force_encoded_color'` to true did not resolve my problem.

I also found this answer very helpful for creating the LMDB (specifically, the directions for enabling the encoding); however, no mention was made of what should be done so that caffe is aware that the images are encoded.

Answer:

The error message you got:

```
data_transformer.cpp:239] Check failed: channels == img_channels (3 vs. 1)
```

means caffe's data transformer is expecting input with 3 `channels` (i.e., a color image), but is getting an image with only 1 `img_channels` (i.e., a grayscale image).

Looking at `caffe.proto`, it would seem you should set the parameter in the `transform_param`:

```
layer {
  name: "imagenet"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    scale: 0.00390625
    force_color: true          # try this
  }
  data_param {
    source: "train-lmdb"
    batch_size: 100
    backend: LMDB
    force_encoded_color: true  # cannot hurt...
  }
}
```
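For intuition, forcing color on a grayscale image amounts to replicating the single gray plane into three identical channels. A numpy sketch of that idea (an illustration of the concept, not caffe's actual decoding code):

```python
import numpy as np

gray = np.arange(20, dtype=np.uint8).reshape(4, 5)   # a tiny H x W grayscale "image"

# Replicate the single gray plane into three identical channels (H x W x 3),
# matching the layout of a decoded color image.
color = np.repeat(gray[:, :, np.newaxis], 3, axis=2)

assert color.shape == (4, 5, 3)
assert np.array_equal(color[:, :, 1], gray)   # each channel equals the gray plane
```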

Question:

I looked at the python example for LeNet and see that the number of iterations needed to run over the entire MNIST test dataset is hard-coded. Can this value be computed instead of hard-coded? How do I get the number of samples of the dataset a network points to in python?

Answer:

You can use the `lmdb` library to access the LMDB directly:

```python
import lmdb  # needs the lmdb package

db = lmdb.open('/path/to/lmdb_folder')
num_examples = int(db.stat()['entries'])
```

That should do the trick for you.
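With that count in hand, the hard-coded iteration number can be derived instead. A sketch (the `num_examples` and `batch_size` values here are made up; in practice they come from `db.stat()` and the TEST data layer):

```python
# Derive the number of TEST iterations from the dataset size instead of hard-coding it.
num_examples = 10000   # e.g., int(db.stat()['entries']) from the snippet above
batch_size = 100       # must match the batch_size of the TEST data layer

test_iter = -(-num_examples // batch_size)   # ceiling division, so a partial last batch counts
print(test_iter)                             # -> 100
```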

Question:

I have been trying to run the SqueezeNet model for quite some time now. After resolving multiple errors, I am stuck with this one.

When I run the command

```sh
./build/tools/caffe train -solve SqueezeNet/SqueezeNet_v1.0/solver.prototxt
```

I get:

```
I0723 16:26:58.532799 11108 layer_factory.hpp:77] Creating layer data
F0723 16:26:58.629655 11108 db_lmdb.hpp:15] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
    @     0x7fb24de835cd  google::LogMessage::Fail()
    @     0x7fb24de85433  google::LogMessage::SendToLog()
    @     0x7fb24de8315b  google::LogMessage::Flush()
    @     0x7fb24de85e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fb24e23efd8  caffe::db::LMDB::Open()
    @     0x7fb24e2b541f  caffe::DataLayer<>::DataLayer()
    @     0x7fb24e2b55b2  caffe::Creator_DataLayer<>()
    @     0x7fb24e290a59  caffe::Net<>::Init()
    @     0x7fb24e29343e  caffe::Net<>::Net()
    @     0x7fb24e22a315  caffe::Solver<>::InitTrainNet()
    @     0x7fb24e22b6f5  caffe::Solver<>::Init()
    @     0x7fb24e22ba0f  caffe::Solver<>::Solver()
    @     0x7fb24e21c851  caffe::Creator_SGDSolver<>()
    @           0x40a958  train()
    @           0x4072f8  main
    @     0x7fb24c50c830  __libc_start_main
    @           0x407bc9  _start
    @              (nil)  (unknown)
Aborted (core dumped)
```

Any suggestions?

Answer:

It seems like caffe cannot find the LMDB database storing your training/validation data.

Make sure that the LMDB pointed to by the `source: ...` parameter of your `"Data"` layer exists, and that you have read permissions for this dataset.

Question:

I am trying to use the LMDB file that I created to define the data layer in a caffe net, and I get the error below:

```
TypeError: 'LMDB' has type (type 'str'), but expected one of: (type 'int', type 'long')
```

I checked the labels in the text file that I passed to the script that generates the lmdb file (`caffe/build/tools/convert_imageset`). Am I missing something here?

Edit 1: Here is my data layer definition:

```python
n.data, n.labels = L.Data(batch_size=batch_size, source=lmdb_src,
                          backend="LMDB",
                          transform_param=dict(mean_file=mean_file), ntop=2)
```

Answer:

You are trying to set

```
backend: "LMDB"
```

in your net definition, instead of

```
backend: LMDB
```

Note that `LMDB` is *not* passed as a string, but rather as an enumerated integer. In the python interface, what you should do is set

```python
backend = P.Data.LMDB
```

using the enum value from caffe's protobuf definition (assuming the usual `from caffe import params as P` import).

Question:

I am trying to learn Caffe by training AlexNet on black-and-white images with circles (label: "1") and rectangles (label: "0"). I'm using 1800 training images (900 circles and 900 rectangles).

My train_val.prototxt looks like this:

```
name: "AlexNet"
layer { name: "data" type: "Data" top: "data" top: "label" include { phase: TRAIN } data_param { source: "newlmdb" batch_size: 100 backend: LMDB } }
layer { name: "data" type: "Data" top: "data" top: "label" include { phase: TEST } data_param { source: "newvallmdb" batch_size: 50 backend: LMDB } }
layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 11 stride: 4 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
layer { name: "norm1" type: "LRN" bottom: "conv1" top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } }
layer { name: "pool1" type: "Pooling" bottom: "norm1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } }
layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" }
layer { name: "norm2" type: "LRN" bottom: "conv2" top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } }
layer { name: "pool2" type: "Pooling" bottom: "norm2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layer { name: "conv3" type: "Convolution" bottom: "pool2" top: "conv3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" }
layer { name: "conv4" type: "Convolution" bottom: "conv3" top: "conv4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } }
layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" }
layer { name: "conv5" type: "Convolution" bottom: "conv4" top: "conv5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } }
layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5" }
layer { name: "pool5" type: "Pooling" bottom: "conv5" top: "pool5" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layer { name: "fc6" type: "InnerProduct" bottom: "pool5" top: "fc6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } }
layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" }
layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } }
layer { name: "fc7" type: "InnerProduct" bottom: "fc6" top: "fc7" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } }
layer { name: "relu7" type: "ReLU" bottom: "fc7" top: "fc7" }
layer { name: "drop7" type: "Dropout" bottom: "fc7" top: "fc7" dropout_param { dropout_ratio: 0.5 } }
layer { name: "fc8" type: "InnerProduct" bottom: "fc7" top: "fc8" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layer { name: "accuracy" type: "Accuracy" bottom: "fc8" bottom: "label" top: "accuracy" include { phase: TEST } }
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "fc8" bottom: "label" top: "loss" }
```

My solver.prototxt looks like this:

```
net: "train_val.prototxt"
test_iter: 200
test_interval: 200
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 50
display: 20
max_iter: 500
momentum: 0.9
weight_decay: 0.0005
snapshot: 100
snapshot_prefix: "training"
solver_mode: GPU
```

While training I get this output:

```
I1018 10:13:04.936286 7404 solver.cpp:330] Iteration 0, Testing net (#0)
I1018 10:13:06.262091 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:07.556700 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:11.440527 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:12.267205 7404 solver.cpp:397] Test net output #0: accuracy = 0.94
I1018 10:13:12.267205 7404 solver.cpp:397] Test net output #1: loss = 0.104804 (* 1 = 0.104804 loss)
I1018 10:13:12.594758 7404 solver.cpp:218] Iteration 0 (-9.63533e-42 iter/s, 7.69215s/20 iters), loss = 0.873365
I1018 10:13:12.594758 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:13:12.594758 7404 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I1018 10:13:15.807883 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:17.305263 7404 solver.cpp:218] Iteration 20 (4.25024 iter/s, 4.70562s/20 iters), loss = 0.873365
I1018 10:13:17.305263 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:13:17.305263 7404 sgd_solver.cpp:105] Iteration 20, lr = 0.01
I1018 10:13:20.019263 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:21.984572 7404 solver.cpp:218] Iteration 40 (4.26967 iter/s, 4.6842s/20 iters), loss = 0.873365
I1018 10:13:21.984572 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:13:21.984572 7404 sgd_solver.cpp:105] Iteration 40, lr = 0.01
I1018 10:13:24.246239 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:26.695078 7404 solver.cpp:218] Iteration 60 (4.25863 iter/s, 4.69634s/20 iters), loss = 0.873365
I1018 10:13:26.695078 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:13:26.695078 7404 sgd_solver.cpp:105] Iteration 60, lr = 0.001
I1018 10:13:28.426422 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:31.421181 7404 solver.cpp:218] Iteration 80 (4.22339 iter/s, 4.73554s/20 iters), loss = 0.873365
I1018 10:13:31.421181 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:13:31.421181 7404 sgd_solver.cpp:105] Iteration 80, lr = 0.001
I1018 10:13:32.731387 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:35.788537 7404 solver.cpp:447] Snapshotting to binary proto file training_iter_100.caffemodel
I1018 10:13:37.317111 7404 sgd_solver.cpp:273] Snapshotting solver state to binary proto file training_iter_100.solverstate
I1018 10:13:38.081399 7404 solver.cpp:218] Iteration 100 (3.00631 iter/s, 6.65267s/20 iters), loss = 0
I1018 10:13:38.081399 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:13:38.081399 7404 sgd_solver.cpp:105] Iteration 100, lr = 0.0001
I1018 10:13:38.908077 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:42.791904 7404 solver.cpp:218] Iteration 120 (4.23481 iter/s, 4.72276s/20 iters), loss = 0
I1018 10:13:42.807502 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:13:42.807502 7404 sgd_solver.cpp:105] Iteration 120, lr = 0.0001
I1018 10:13:43.088260 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:47.393225 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:47.549202 7404 solver.cpp:218] Iteration 140 (4.21716 iter/s, 4.74253s/20 iters), loss = 0
I1018 10:13:47.549202 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:13:47.549202 7404 sgd_solver.cpp:105] Iteration 140, lr = 0.0001
I1018 10:13:51.635800 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:52.290904 7404 solver.cpp:218] Iteration 160 (4.21268 iter/s, 4.74757s/20 iters), loss = 0
I1018 10:13:52.290904 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:13:52.290904 7404 sgd_solver.cpp:105] Iteration 160, lr = 1e-05
I1018 10:13:56.003156 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:13:57.048202 7404 solver.cpp:218] Iteration 180 (4.20926 iter/s, 4.75142s/20 iters), loss = 0.873365
I1018 10:13:57.048202 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:13:57.048202 7404 sgd_solver.cpp:105] Iteration 180, lr = 1e-05
I1018 10:14:00.214535 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:01.431155 7404 solver.cpp:447] Snapshotting to binary proto file training_iter_200.caffemodel
I1018 10:14:03.053316 7404 sgd_solver.cpp:273] Snapshotting solver state to binary proto file training_iter_200.solverstate
I1018 10:14:03.552443 7404 solver.cpp:330] Iteration 200, Testing net (#0)
I1018 10:14:04.082764 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:05.439764 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:10.727385 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:10.789775 7404 blocking_queue.cpp:49] Waiting for data
I1018 10:14:10.961350 7404 solver.cpp:397] Test net output #0: accuracy = 0.94
I1018 10:14:10.961350 7404 solver.cpp:397] Test net output #1: loss = 0.104804 (* 1 = 0.104804 loss)
I1018 10:14:11.179718 7404 solver.cpp:218] Iteration 200 (1.41459 iter/s, 14.1384s/20 iters), loss = 0.873365
I1018 10:14:11.179718 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:14:11.179718 7404 sgd_solver.cpp:105] Iteration 200, lr = 1e-06
I1018 10:14:13.846925 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:15.952615 7404 solver.cpp:218] Iteration 220 (4.19673 iter/s, 4.76562s/20 iters), loss = 0.873365
I1018 10:14:15.952615 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:14:15.952615 7404 sgd_solver.cpp:105] Iteration 220, lr = 1e-06
I1018 10:14:18.198683 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:20.709913 7404 solver.cpp:218] Iteration 240 (4.19817 iter/s, 4.76398s/20 iters), loss = 0.873365
I1018 10:14:20.709913 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:14:20.709913 7404 sgd_solver.cpp:105] Iteration 240, lr = 1e-06
I1018 10:14:22.441257 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:25.498407 7404 solver.cpp:218] Iteration 260 (4.18243 iter/s, 4.78191s/20 iters), loss = 0.873365
I1018 10:14:25.498407 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:14:25.498407 7404 sgd_solver.cpp:105] Iteration 260, lr = 1e-07
I1018 10:14:26.761821 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:30.271303 7404 solver.cpp:218] Iteration 280 (4.18629 iter/s, 4.7775s/20 iters), loss = 0
I1018 10:14:30.271303 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:14:30.271303 7404 sgd_solver.cpp:105] Iteration 280, lr = 1e-07
I1018 10:14:31.129176 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:34.701050 7404 solver.cpp:447] Snapshotting to binary proto file training_iter_300.caffemodel
I1018 10:14:36.136039 7404 sgd_solver.cpp:273] Snapshotting solver state to binary proto file training_iter_300.solverstate
I1018 10:14:36.931521 7404 solver.cpp:218] Iteration 300 (3.00228 iter/s, 6.66161s/20 iters), loss = 0
I1018 10:14:36.931521 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:14:36.931521 7404 sgd_solver.cpp:105] Iteration 300, lr = 1e-08
I1018 10:14:37.337061 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:41.595233 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:41.688819 7404 solver.cpp:218] Iteration 320 (4.20513 iter/s, 4.7561s/20 iters), loss = 0
I1018 10:14:41.688819 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:14:41.688819 7404 sgd_solver.cpp:105] Iteration 320, lr = 1e-08
I1018 10:14:45.884600 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:46.461715 7404 solver.cpp:218] Iteration 340 (4.19496 iter/s, 4.76763s/20 iters), loss = 0
I1018 10:14:46.461715 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:14:46.461715 7404 sgd_solver.cpp:105] Iteration 340, lr = 1e-08
I1018 10:14:50.111598 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:51.234639 7404 solver.cpp:218] Iteration 360 (4.1858 iter/s, 4.77806s/20 iters), loss = 0.873365
I1018 10:14:51.234639 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:14:51.234639 7404 sgd_solver.cpp:105] Iteration 360, lr = 1e-09
I1018 10:14:54.478982 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:14:56.007566 7404 solver.cpp:218] Iteration 380 (4.19437 iter/s, 4.76829s/20 iters), loss = 0.873365
I1018 10:14:56.007566 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:14:56.007566 7404 sgd_solver.cpp:105] Iteration 380, lr = 1e-09
I1018 10:14:58.705986 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:00.421743 7404 solver.cpp:447] Snapshotting to binary proto file training_iter_400.caffemodel
I1018 10:15:01.903534 7404 sgd_solver.cpp:273] Snapshotting solver state to binary proto file training_iter_400.solverstate
I1018 10:15:02.371469 7404 solver.cpp:330] Iteration 400, Testing net (#0)
I1018 10:15:03.478912 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:04.820323 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:06.146136 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:07.471949 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:08.813360 7792 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:09.796021 7404 solver.cpp:397] Test net output #0: accuracy = 0.95
I1018 10:15:09.796021 7404 solver.cpp:397] Test net output #1: loss = 0.0873365 (* 1 = 0.0873365 loss)
I1018 10:15:10.014390 7404 solver.cpp:218] Iteration 400 (1.4278 iter/s, 14.0076s/20 iters), loss = 0.873365
I1018 10:15:10.014390 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:15:10.014390 7404 sgd_solver.cpp:105] Iteration 400, lr = 1e-10
I1018 10:15:12.291669 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:14.787317 7404 solver.cpp:218] Iteration 420 (4.18883 iter/s, 4.7746s/20 iters), loss = 0.873365
I1018 10:15:14.787317 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:15:14.787317 7404 sgd_solver.cpp:105] Iteration 420, lr = 1e-10
I1018 10:15:16.582064 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:19.545646 7404 solver.cpp:218] Iteration 440 (4.20273 iter/s, 4.75881s/20 iters), loss = 0.873365
I1018 10:15:19.545646 7404 solver.cpp:237] Train net output #0: loss = 0.873365 (* 1 = 0.873365 loss)
I1018 10:15:19.545646 7404 sgd_solver.cpp:105] Iteration 440, lr = 1e-10
I1018 10:15:20.824666 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:24.334172 7404 solver.cpp:218] Iteration 460 (4.18022 iter/s, 4.78443s/20 iters), loss = 0
I1018 10:15:24.334172 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:15:24.334172 7404 sgd_solver.cpp:105] Iteration 460, lr = 1e-11
I1018 10:15:25.114061 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:29.107098 7404 solver.cpp:218] Iteration 480 (4.18678 iter/s, 4.77694s/20 iters), loss = 0
I1018 10:15:29.107098 7404 solver.cpp:237] Train net output #0: loss = 0 (* 1 = 0 loss)
I1018 10:15:29.107098 7404 sgd_solver.cpp:105] Iteration 480, lr = 1e-11
I1018 10:15:29.497043 7748 data_layer.cpp:73] Restarting data prefetching from start.
I1018 10:15:33.505677 7404 solver.cpp:447] Snapshotting to binary proto file training_iter_500.caffemodel
I1018 10:15:35.112251 7404 sgd_solver.cpp:273] Snapshotting solver state to binary proto file training_iter_500.solverstate
I1018 10:15:35.751760 7404 solver.cpp:310] Iteration 500, loss = 0
I1018 10:15:35.751760 7404 solver.cpp:315] Optimization Done.
```

As you can see, the loss is either a constant 0.873365 or 0, and I don't know why. When I use the following code to test images, I always get zero in return:

```python
img = caffe.io.load_image('val/img911.png', color=False)
grayimg = img[:,:,0]
gi = np.reshape(grayimg, (260,260,1))
net = caffe.Net('deploy.prototxt', 'training_iter_500.caffemodel', caffe.TEST)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_raw_scale('data', 255.0)
net.blobs['data'].reshape(1,1,260,260)
net.blobs['data'].data[...] = transformer.preprocess('data', gi)
out = net.forward()
print out['prob'].argmax()
```

To create the LMDB file I used this script:

```python
import numpy as np
import lmdb
import caffe
import cv2

N = 1800
X = np.zeros((N, 1, 260, 260), dtype=np.uint8)
y = np.zeros(N, dtype=np.int64)
map_size = X.nbytes * 10

file = open("train.txt", "r")
files = file.readlines()
print(len(files))
for i in range(0, len(files)):
    line = files[i]
    img_path = line.split()[0]
    label = line.split()[1]
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    X[i] = img
    y[i] = label

env = lmdb.open('newlmdb', map_size=map_size)
with env.begin(write=True) as txn:  # txn is a Transaction object
    for i in range(N):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = X.shape[1]
        datum.height = X.shape[2]
        datum.width = X.shape[3]
        datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
        datum.label = int(y[i])
        str_id = '{:08}'.format(i)
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
```

Is this a mistake in my code, or did I choose the network parameters badly?

##### EDIT

I edited my data layer to get zero-mean inputs:

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true
    crop_size: 260
    mean_file: "formen_mean.binaryproto"
  }
  data_param {
    source: "newlmdb"
    batch_size: 10
    backend: LMDB
  }
}
```

I increased the number of training images to 10,000 and test images to 1,000, shuffled my data, and edited my solver.prototxt:

```
net: "train_val.prototxt"
test_iter: 20
test_interval: 50
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 50
display: 20
max_iter: 1000
momentum: 0.9
weight_decay: 0.0005
snapshot: 200
debug_info: true
snapshot_prefix: "training"
solver_mode: GPU
```

At some point in the Debug info the following happened:

```
I1018 14:21:16.238169 5540 net.cpp:619] [Backward] Layer drop6, bottom blob fc6 diff: 2.64904e-05
I1018 14:21:16.238169 5540 net.cpp:619] [Backward] Layer relu6, bottom blob fc6 diff: 1.33896e-05
I1018 14:21:16.269316 5540 net.cpp:619] [Backward] Layer fc6, bottom blob pool2 diff: 8.48778e-06
I1018 14:21:16.269316 5540 net.cpp:630] [Backward] Layer fc6, param blob 0 diff: 0.000181272
I1018 14:21:16.269316 5540 net.cpp:630] [Backward] Layer fc6, param blob 1 diff: 0.000133896
I1018 14:21:16.269316 5540 net.cpp:619] [Backward] Layer pool2, bottom blob norm2 diff: 1.82455e-06
I1018 14:21:16.269316 5540 net.cpp:619] [Backward] Layer norm2, bottom blob conv2 diff: 1.82354e-06
I1018 14:21:16.269316 5540 net.cpp:619] [Backward] Layer relu2, bottom blob conv2 diff: 1.41858e-06
I1018 14:21:16.284889 5540 net.cpp:619] [Backward] Layer conv2, bottom blob pool1 diff: 1.989e-06
I1018 14:21:16.284889 5540 net.cpp:630] [Backward] Layer conv2, param blob 0 diff: 0.00600851
I1018 14:21:16.284889 5540 net.cpp:630] [Backward] Layer conv2, param blob 1 diff: 0.00107259
I1018 14:21:16.284889 5540 net.cpp:619] [Backward] Layer pool1, bottom blob norm1 diff: 4.57322e-07
I1018 14:21:16.284889 5540 net.cpp:619] [Backward] Layer norm1, bottom blob conv1 diff: 4.54691e-07
I1018 14:21:16.284889 5540 net.cpp:619] [Backward] Layer relu1, bottom blob conv1 diff: 2.18649e-07
I1018 14:21:16.284889 5540 net.cpp:630] [Backward] Layer conv1, param blob 0 diff: 0.0333731
I1018 14:21:16.284889 5540 net.cpp:630] [Backward] Layer conv1, param blob 1 diff: 0.000384605
E1018 14:21:16.331610 5540 net.cpp:719] [Backward] All net params (data, diff): L1 norm = (1.0116e+06, 55724.3); L2 norm = (80.218, 24.0218)
I1018 14:21:16.331610 5540 solver.cpp:218] Iteration 0 (0 iter/s, 1.69776s/20 iters), loss = 8.73365
I1018 14:21:16.331610 5540 solver.cpp:237] Train net output #0: loss = 8.73365 (* 1 = 8.73365 loss)
I1018 14:21:16.331610 5540 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I1018 14:21:19.726611 5540 net.cpp:591] [Forward] Layer data, top blob data data: 44.8563
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer data, top blob label data: 1
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer conv1, top blob conv1 data: nan
I1018 14:21:19.742184 5540 net.cpp:603] [Forward] Layer conv1, param blob 0 data: nan
I1018 14:21:19.742184 5540 net.cpp:603] [Forward] Layer conv1, param blob 1 data: nan
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer relu1, top blob conv1 data: nan
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer norm1, top blob norm1 data: nan
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer pool1, top blob pool1 data: inf
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer conv2, top blob conv2 data: nan
I1018 14:21:19.742184 5540 net.cpp:603] [Forward] Layer conv2, param blob 0 data: nan
I1018 14:21:19.742184 5540 net.cpp:603] [Forward] Layer conv2, param blob 1 data: nan
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer relu2, top blob conv2 data: nan
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer norm2, top blob norm2 data: nan
I1018 14:21:19.742184 5540 net.cpp:591] [Forward] Layer pool2, top blob pool2 data: inf
```

So I reduced the base_lr to 0.0001. But at some later point the gradient drops to zero:

```
I1018 14:24:40.919765  5500 net.cpp:591] [Forward] Layer loss, top blob loss data: 0
I1018 14:24:40.919765  5500 net.cpp:619] [Backward] Layer loss, bottom blob fc8 diff: 0
I1018 14:24:40.919765  5500 net.cpp:619] [Backward] Layer fc8, bottom blob fc7 diff: 0
I1018 14:24:40.919765  5500 net.cpp:630] [Backward] Layer fc8, param blob 0 diff: 0
I1018 14:24:40.919765  5500 net.cpp:630] [Backward] Layer fc8, param blob 1 diff: 0
I1018 14:24:40.919765  5500 net.cpp:619] [Backward] Layer drop7, bottom blob fc7 diff: 0
I1018 14:24:40.919765  5500 net.cpp:619] [Backward] Layer relu7, bottom blob fc7 diff: 0
I1018 14:24:40.919765  5500 net.cpp:619] [Backward] Layer fc7, bottom blob fc6 diff: 0
I1018 14:24:40.919765  5500 net.cpp:630] [Backward] Layer fc7, param blob 0 diff: 0
I1018 14:24:40.919765  5500 net.cpp:630] [Backward] Layer fc7, param blob 1 diff: 0
I1018 14:24:40.919765  5500 net.cpp:619] [Backward] Layer drop6, bottom blob fc6 diff: 0
I1018 14:24:40.919765  5500 net.cpp:619] [Backward] Layer relu6, bottom blob fc6 diff: 0
I1018 14:24:40.936337  5500 net.cpp:619] [Backward] Layer fc6, bottom blob pool2 diff: 0
I1018 14:24:40.936337  5500 net.cpp:630] [Backward] Layer fc6, param blob 0 diff: 0
I1018 14:24:40.936337  5500 net.cpp:630] [Backward] Layer fc6, param blob 1 diff: 0
I1018 14:24:40.936337  5500 net.cpp:619] [Backward] Layer pool2, bottom blob norm2 diff: 0
I1018 14:24:40.951910  5500 net.cpp:619] [Backward] Layer norm2, bottom blob conv2 diff: 0
I1018 14:24:40.967483  5500 net.cpp:619] [Backward] Layer relu2, bottom blob conv2 diff: 0
I1018 14:24:40.967483  5500 net.cpp:619] [Backward] Layer conv2, bottom blob pool1 diff: 0
I1018 14:24:40.967483  5500 net.cpp:630] [Backward] Layer conv2, param blob 0 diff: 0
I1018 14:24:40.967483  5500 net.cpp:630] [Backward] Layer conv2, param blob 1 diff: 0
I1018 14:24:40.967483  5500 net.cpp:619] [Backward] Layer pool1, bottom blob norm1 diff: 0
I1018 14:24:40.967483  5500 net.cpp:619] [Backward] Layer norm1, bottom blob conv1 diff: 0
I1018 14:24:40.967483  5500 net.cpp:619] [Backward] Layer relu1, bottom blob conv1 diff: 0
```
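Per-layer forward/backward logs like the one above come from the solver's debug output, which can be switched on in the solver configuration (a minimal fragment; the rest of the solver settings are unchanged):

```
# solver.prototxt (fragment) -- log per-layer data/diff magnitudes every iteration
debug_info: true
```

This is what makes it possible to see exactly at which layer the gradient first becomes zero.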

Answer:

I don't know why your net does not learn. But here are some points you might want to consider:

- Your test phase: `batch_size` is 50 and `test_iter` is 200, meaning you are validating on `50*200=10,000` examples. Since you only have 1,800 examples in total, what is the meaning of this large `test_iter` value? Look at this thread for more information about this issue.
- It seems like you are using the images "as is", meaning your input values' range is [0..255]. It is very common to subtract the mean from the net's inputs so that the net sees zero-mean inputs.
- Consider looking at your training's debug info: does your gradient vanish? Do you have layers that are not "active"? (E.g., a layer with all negative values with a `"ReLU"` on top is practically inactive.)
- Getting a constant loss value suggests that your net predicts only one label regardless of the inputs; consider shuffling your dataset.
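The mean-subtraction point above can be sketched in plain numpy (a minimal illustration, not Caffe's own transformer; the per-channel mean values here are made up and would normally be computed over the training set):

```python
import numpy as np

# hypothetical per-channel BGR mean, e.g. computed over the training images
mean_bgr = np.array([104.0, 117.0, 123.0])

def preprocess(image):
    """Shift an HxWx3 uint8 image to roughly zero-mean float inputs."""
    # broadcasting subtracts one mean value per color channel
    return image.astype(np.float32) - mean_bgr

img = np.full((4, 4, 3), 255, dtype=np.uint8)  # dummy all-white image
out = preprocess(img)
print(out[0, 0])  # [151. 138. 132.]
```

In Caffe itself the same effect is usually achieved with a `mean_file` or `mean_value` entry in the data layer's `transform_param`.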

Question:

I want to create an lmdb dataset from images, some of which contain the feature I want caffe to learn and some of which don't. My question is: in the text input file passed to convert_imageset, how should I label the images that don't contain the feature? I know the format is

PATH_TO_IMAGE LABEL
PATH_TO_IMAGE LABEL
PATH_TO_IMAGE LABEL

But which label should I assign to images **without** the feature?
For example, img1.jpg contains the feature, while img2.jpg and img3.jpg don't.
So should the text file look like this?

img1.jpg 0
img2.jpg 1?
img3.jpg 1?

Thanks!

Answer:

Got an answer from the Caffe-users Google Group: yes, assigning a dedicated dummy label to the "no feature" images is the right way to do this. So it is:

img1.jpg 0
img2.jpg 1
img3.jpg 1
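Generating such a listing file can be scripted; here is a minimal sketch (the directory layout, function name, and label convention are made up for illustration — label 0 for images with the feature, 1 for images without):

```python
import os

def write_listing(pos_dir, neg_dir, out_path):
    """Write 'path label' lines: 0 = has the feature, 1 = doesn't."""
    with open(out_path, "w") as f:
        for d, label in ((pos_dir, 0), (neg_dir, 1)):
            for name in sorted(os.listdir(d)):
                f.write("%s %d\n" % (os.path.join(d, name), label))
```

The resulting file can then be fed to convert_imageset to build the LMDB.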

Question:

I am a beginner in Caffe and Python. I installed Caffe and compiled it successfully on Ubuntu 16.04. I created an environment in Anaconda 2 and used CMake for compiling. I ran this command and it printed the Caffe version.

$ python -c "import caffe; print caffe.__version__"
1.0.0-rc3

So I suppose that I have installed it correctly. I wanted to have my first experience with caffe, so I followed the instructions in this link. But I am not really familiar with this, and it is giving me this error:

```
~/deeplearning-cats-dogs-tutorial/code$ python create_lmdb.py
Traceback (most recent call last):
  File "create_lmdb.py", line 21, in <module>
    import lmdb
ImportError: No module named lmdb
```

I really appreciate if someone can guide me how to start running examples and models in caffe.

Answer:

It seems like you need to install the LMDB Python package: https://lmdb.readthedocs.io/en/release/
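The package can typically be installed with pip (assuming the same Python environment that runs Caffe is active):

```shell
# install the lmdb Python bindings, then check that they import cleanly
pip install lmdb
python -c "import lmdb; print(lmdb.version())"
```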

Question:

I'm using two lmdb inputs for identifying eyes, nosetip and mouth regions of a face. The data lmdb is of dimension `N`x3x`H`x`W`, while the label lmdb is of dimension `N`x1x`H/4`x`W/4`. The label image is created by masking regions using numbers 1-4 on an OpenCV Mat that was initialized to all 0s (so in total there are 5 labels, with 0 being the background label). I scaled the label image down to 1/4 of the width and height of the corresponding image because I have 2 pooling layers in my net. This downscaling ensures the label image dimensions match the output of the last convolution layer.
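When downscaling a label map like this, it matters that no new label values are invented by interpolation; nearest-neighbor sampling preserves the original labels, whereas bilinear resizing would blend e.g. labels 1 and 3 into a meaningless 2. A numpy sketch (the factor of 4 matches the two pooling layers; the region layout is made up):

```python
import numpy as np

def downscale_labels(labels, factor=4):
    """Nearest-neighbor downscale of an integer HxW label map.

    Strided slicing picks every `factor`-th pixel, so the result can
    only contain label values that already exist in the input.
    """
    return labels[::factor, ::factor]

labels = np.zeros((8, 8), dtype=np.uint8)
labels[0:4, 0:4] = 2            # a hypothetical "nose" region
small = downscale_labels(labels)
print(small.shape)              # (2, 2)
print(np.unique(small))         # [0 2]
```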

My train_val.prototxt:

```
name: "facial_keypoints"
layer { name: "images" type: "Data" top: "images" include { phase: TRAIN } transform_param { mean_file: "../mean.binaryproto" } data_param { source: "../train_lmdb" batch_size: 100 backend: LMDB } }
layer { name: "labels" type: "Data" top: "labels" include { phase: TRAIN } data_param { source: "../train_label_lmdb" batch_size: 100 backend: LMDB } }
layer { name: "images" type: "Data" top: "images" include { phase: TEST } transform_param { mean_file: "../mean.binaryproto" } data_param { source: "../test_lmdb" batch_size: 100 backend: LMDB } }
layer { name: "labels" type: "Data" top: "labels" include { phase: TEST } data_param { source: "../test_label_lmdb" batch_size: 100 backend: LMDB } }
layer { name: "conv1" type: "Convolution" bottom: "images" top: "conv1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 32 pad: 2 kernel_size: 5 stride: 1 weight_filler { type: "gaussian" std: 0.0001 } bias_filler { type: "constant" } } }
layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layer { name: "relu1" type: "ReLU" bottom: "pool1" top: "pool1" }
layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 64 pad: 2 kernel_size: 5 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" } } }
layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" }
layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: AVE kernel_size: 3 stride: 2 } }
layer { name: "conv_last" type: "Convolution" bottom: "pool2" top: "conv_last" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 5 pad: 2 kernel_size: 5 stride: 1 weight_filler { # type: "xavier"
    type: "gaussian" std: 0.01 } bias_filler { type: "constant" } } }
layer { name: "relu2" type: "ReLU" bottom: "conv_last" top: "conv_last" }
layer { name: "accuracy" type: "Accuracy" bottom: "conv_last" bottom: "labels" top: "accuracy" include { phase: TEST } }
layer { name: "loss" type: "SoftmaxWithLoss" bottom: "conv_last" bottom: "labels" top: "loss" }
```

In the last convolution layer, I set the output size to 5 because I have 5 label classes. The training converged with a final loss of about 0.3 and accuracy 0.9 (although some sources suggest this accuracy is not correctly measured for multi-label problems). When using the trained model, the output layer correctly produces a blob of dimension 1x5x`H/4`x`W/4`, which I managed to visualize as 5 separate single-channel images. However, while the first image correctly highlighted the background pixels, the remaining 4 images look almost the same, with all 4 regions highlighted.
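Turning such a 1x5xH/4xW/4 score blob into a single per-pixel label map is usually done with an argmax over the channel axis; a numpy sketch (random scores stand in for a real network output):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.standard_normal((1, 5, 6, 8))  # NxCxHxW, C = 5 classes

# per-pixel predicted label: index of the highest-scoring channel
label_map = scores.argmax(axis=1)[0]

print(label_map.shape)  # (6, 8)
```

Each pixel of `label_map` is then an integer in 0-4, which can be rendered as one color-coded image instead of 5 separate channel heatmaps.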

Visualization of the 5 output channels (intensity increases from blue to red):

Original image (the concentric circles mark the highest intensity from each channel; some are bigger just to distinguish them from others. As you can see, other than the background, the remaining channels have their highest activations almost on the same mouth region, which should not be the case):

Could someone help me spot the mistake I made?

Thanks.

Answer:

It seems like you are facing *class imbalance*: most of your labeled pixels are labeled 0 (Background), hence, during training the net learns to predict background almost regardless of what it "sees". Since predicting background is correct most of the time, the training loss decreases and the accuracy increases up to a certain point.
However, when you actually try to visualize the output prediction it is mostly background with little information regarding the other scarce labels.

One way of tackling class imbalance in caffe is to use an `"InfogainLoss"` layer with weights tuned to counteract the imbalance of the labels.
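The per-class weights for such a loss are often derived from inverse label frequencies; here is a numpy sketch of one common recipe (the pixel counts are made up, and this particular normalization is a design choice, not something Caffe prescribes):

```python
import numpy as np

# hypothetical pixel counts per class: background dwarfs the 4 facial regions
counts = np.array([96000, 1000, 1000, 1000, 1000], dtype=np.float64)

# inverse-frequency weights, normalized so they average to 1
freq = counts / counts.sum()
inv = 1.0 / freq
weights = inv / inv.mean()

# a diagonal infogain matrix H scales each class's loss by its weight
H = np.diag(weights)
print(np.round(weights, 3))  # background weighted far below the rare classes
```

The matrix `H` would then be serialized (e.g. to a binaryproto file) and referenced from the `"InfogainLoss"` layer definition.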