Hot questions about using neural networks in Deeplearning4j


I am doing FCN32 semantic segmentation on my data (grayscale images with a single channel). I ran the algorithm to fine-tune on my data for 80,000 iterations; however, the loss and accuracy keep fluctuating and the output image is completely black. The loss is still very high even after 80,000 iterations, so I suspect the classifier cannot train well on my data, and I am going to train from scratch instead. On the other hand, my data has imbalanced classes: there are far more background pixels than pixels of the other four classes. Some researchers suggest using a weighted loss. Does anyone have any idea? Am I going about this the right way? How can I add this weighted loss to train_val.prototxt?

I would be thankful if you could share any resources/examples related to training with a weighted loss.

Thanks again


You can tackle the class imbalance using the "InfogainLoss" layer. This loss can be viewed as an extension of "SoftmaxWithLoss" that lets you "pay" a different loss value per label. If you want to use "InfogainLoss" for pixel-wise predictions, you may need BVLC/caffe PR#3855.
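As a language-agnostic sketch of the idea (plain Python; the pixel counts are made up, and inverse class frequency is just one common weighting choice), the per-class weights that would go on the diagonal of the infogain matrix H can be computed like this:

```python
# Sketch: build per-class weights and a diagonal "infogain" matrix H
# for an imbalanced 5-class segmentation problem. The pixel counts are
# hypothetical; the background class dominates.

def inverse_frequency_weights(pixel_counts):
    """Return one weight per class, higher for rarer classes."""
    total = sum(pixel_counts)
    raw = [total / c for c in pixel_counts]          # inverse frequency
    s = sum(raw)
    return [w * len(pixel_counts) / s for w in raw]  # mean weight == 1

def diagonal_infogain_matrix(weights):
    """A diagonal H simply reweights the per-class loss terms."""
    k = len(weights)
    return [[weights[i] if i == j else 0.0 for j in range(k)]
            for i in range(k)]

counts = [900_000, 40_000, 30_000, 20_000, 10_000]   # made-up pixel counts
weights = inverse_frequency_weights(counts)
H = diagonal_infogain_matrix(weights)
```

The matrix H would then be serialized and pointed to from the loss layer in train_val.prototxt.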


I'm new to neural networks and NLP, and I'm trying out the DeepLearning4J library. I'm trying to get it to work, but whenever I execute this instruction:

Collection<String> similar = vec.wordsNearest("word_to_search", 10);

and the word I'm searching for is mapped in the network, I get the following exception:

java.lang.IllegalArgumentException: XERBLA: Error on argument 6 (LDA) in SGEMV
at org.jblas.NativeBlas.sgemv(Native Method)
at org.nd4j.linalg.jblas.blas.JblasLevel2.sgemv(
at org.nd4j.linalg.api.blas.impl.BaseLevel2.gemv(
at org.nd4j.linalg.api.ndarray.BaseNDArray.mmuli(
at org.nd4j.linalg.api.ndarray.BaseNDArray.mmul(
at org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl.wordsNearest(
at org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl.wordsNearest(
at word2vec.Word2VecTest.main(
Exception in thread "main" java.lang.NoSuchMethodError: org.nd4j.linalg.api.ndarray.INDArray.mean(I)Lorg/nd4j/linalg/api/ndarray/INDArray;
at org.deeplearning4j.models.embeddings.wordvectors.WordVectorsImpl.wordsNearest(
at word2vec.Word2VecTest.main(

I know that a NoSuchMethodError is often due to mismatched library versions; in this specific case it is probably caused by nd4j. I've checked the versions many times, and this is what I'm importing at the moment:

  • akka-actor_2.11-2.4-M3.jar
  • akka-cluster_2.11-2.4-M3.jar
  • akka-remote_2.11-2.4-M3.jar
  • akka-slf4j_2.11-2.4-M3.jar
  • byte-buddy-0.6.15.jar
  • config-1.3.0.jar
  • deeplearning4j-core-
  • deeplearning4j-nlp-
  • deeplearning4j-scaleout-akka-
  • deeplearning4j-ui-
  • javassist-3.12.1.GA.jar
  • jblas-1.2.4.jar
  • jcublas-6.5.jar
  • lucene-analyzers-common-4.10.3.jar
  • lucene-core-4.10.3.jar
  • nd4j-api-0.4-rc3.4.jar
  • nd4j-bytebuddy-0.4-rc3.4.jar
  • nd4j-jblas-0.4-rc3.4.jar
  • nd4j-jcublas-common-0.4-rc3.4.jar
  • netty-3.10.4.Final.jar
  • protobuf-java-2.6.1.jar
  • reflections-0.9.10.jar
  • scala-library-2.12.0-M2.jar
  • selenium-server-standalone-2.47.1.jar

Can someone explain to me the problem?


The error is telling you that DeepLearning4J tried to call the method INDArray INDArray.mean(int value) but this method was not found.

Looking at the nd4j 0.4-rc3.4 source code, you can see that the mean method actually takes a vararg int... as input. Since int... is a different signature from int, the JVM cannot resolve the call and the error is thrown.

This change was made in the commit where nd4j bumped its version to 0.4-rc0.

As a result, you need to downgrade nd4j to the matching earlier version. With this downgrade you will no longer have the incompatibility, since that is the version DeepLearning4J actually depends on; you can see it in the Maven dependencies of deeplearning4j-core.


mean , variance = tf.nn.moments(X_train, axes = 1, keep_dims = True)

I am trying to get the mean and variance using tf.nn.moments() as shown above. However, I am encountering the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-43-fc383f99b15b> in <module>()
     33 Y_train = Y_train.reshape(1,355)
     34 X_mean = tf.reduce_mean(X_train, axis = 1, keepdims = True)
---> 35 mean , variance = tf.nn.moments(X_train, axes = 1, keep_dims = True)
     36 X_train = tf.divide(tf.subtract(X_train,mean),tf.sqrt(variance))
     37 #Y_train = Y_train/(Y_train.max(axis = 1, keepdims = True))

/Users/abhinandanchiney/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn_impl.pyc in moments(x, axes, shift, name, keep_dims)
    664     # sufficient statistics. As a workaround we simply perform the operations
    665     # on 32-bit floats before converting the mean and variance back to fp16
--> 666     y = math_ops.cast(x, dtypes.float32) if x.dtype == dtypes.float16 else x
    667     # Compute true mean while keeping the dims for proper broadcasting.
    668     mean = math_ops.reduce_mean(y, axes, keepdims=True, name="mean")

 TypeError: data type not understood

Kindly help me figure out where I am going wrong.


tf.nn.moments is expecting a tensor, not a numpy array:


  • x: A Tensor.

Try this:

x = tf.convert_to_tensor(X_train)
mean , variance = tf.nn.moments(x, axes = 1, keep_dims = True)
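For reference, what tf.nn.moments computes along axes = 1 can be sketched in plain Python (no TensorFlow): a per-row mean and the population variance (dividing by N, not N - 1):

```python
# Plain-Python sketch of tf.nn.moments(x, axes=1): per-row mean and
# population variance. Input numbers are made up.

def moments_axis1(rows):
    means, variances = [], []
    for row in rows:
        m = sum(row) / len(row)
        v = sum((x - m) ** 2 for x in row) / len(row)  # divide by N
        means.append(m)
        variances.append(v)
    return means, variances

X = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]
mean, variance = moments_axis1(X)
```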


I'm playing a bit with DeepLearning4J and I wonder how I can make a classifier return a score instead of a label. Suppose I use the code from the linear classifier tutorial; I'd like the ANN to return the probabilities of a given training example being labeled 0 or 1. The current configuration looks as follows:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        // ... (hyperparameters elided in the original post)
        .list()
        .layer(0, new DenseLayer.Builder()
                // ... (layer settings elided)
                .build())
        .layer(1, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
                // ... (layer settings elided)
                .build())
        .build();


Use model.output.

You'll get back an ndarray (an INDArray) of probabilities.

It uses softmax on the output layer, which means you get back a [batch size x number of labels] array of probabilities.
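The softmax that produces those probabilities can be sketched in plain Python (a generic sketch, not DL4J's implementation):

```python
import math

# Softmax over one row of raw scores (logits): the results are positive,
# sum to 1, and can be read as class probabilities. The logits are made up.

def softmax(logits):
    m = max(logits)                        # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 0.5])                # two labels: 0 and 1
```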


While debugging the regression sample of deeplearning4j, I noticed that it doesn't normalize the data inputs and outputs. So, first question: why doesn't it have normalization? And second question: is there a normalization mechanism somewhere in the network architecture?

As proof of the non-normalized input, the following screenshot was taken right before execution of the line

return new ListDataSetIterator(listDs,batchSize);


We actually do normalization; we just don't do it for you automatically. It's right in our examples.

All of our image classification examples do this. It's also documented on our website; we even have videos of this.

Edit: you can also normalize the labels if you want, using the same DataNormalization API and calling fitLabels(true) before you put data into the neural network.
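The fit-then-transform pattern described above can be sketched language-agnostically (plain Python with made-up numbers; DL4J's DataNormalization works on the same principle, with the statistics coming from the training data only):

```python
# Sketch of fit-then-transform normalization: compute per-column min/max
# on the TRAINING data only, then apply the same scaling to train and test.

def fit(train):
    """Per-column (min, max) statistics from the training set."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]

def transform(data, stats):
    """Scale each column into [0, 1] using the fitted statistics."""
    lo, hi = stats
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)] for row in data]

train = [[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]]    # made-up data
stats = fit(train)
train_n = transform(train, stats)
test_n = transform([[1.0, 40.0]], stats)           # may fall outside [0, 1]
```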

If you don't mind, could you give me feedback on how you were unable to find this, so we can improve the website? I'm not sure what was missing here.


I need to make changes to an existing deeplearning4j (DL4J) model that has already been trained. The network consists of an input layer, one Graves LSTM layer and one RNN output layer.

My question is: is it possible to add one or more untrained neurons to the LSTM layer without having to rebuild the model from a new config (which I assume would require retraining it)? I'd like to do things like add one or more neurons to an existing layer, or add an entire untrained layer to a trained model.

Are these things possible? I couldn't find any references to this, but I've seen people doing it in other languages/frameworks, so I wonder if I can also do it in DL4J.

BTW I'm aware this is an unusual thing to be doing. Please ignore the fact it will mess up the trained network, I just need to know if I can do it and how to go about it. :)

Any pointers will help!




You would use the transfer learning API to do that. See our examples here.

Docs below:

DL4J’s transfer learning API

The DL4J transfer learning API enables users to:

  • Modify the architecture of an existing model
  • Fine tune learning configurations of an existing model.
  • Hold parameters of a specified layer constant during training, also referred to as "frozen".

Holding certain layers frozen on a network and training is effectively the same as training on a transformed version of the input, where the transformed version is the intermediate output at the boundary of the frozen layers. This is the process of "feature extraction" from the input data, referred to as "featurizing" in this document.

The transfer learning helper

The forward pass to "featurize" the input data on large, pretrained networks can be time-consuming. DL4J also provides a TransferLearningHelper class with the following capabilities:

  • Featurize an input dataset to save for future use
  • Fit the model with frozen layers with a featurized dataset
  • Output from the model with frozen layers given a featurized input.

When running multiple epochs, users will save on computation time, since the expensive forward pass through the frozen layers/vertices only has to be conducted once.
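That saving can be sketched as follows (plain Python with stand-in functions, not real DL4J calls): the frozen part of the network runs once per example, and every epoch reuses the cached features:

```python
# Sketch of featurize-once, train-many-epochs. The two functions stand in
# for the frozen pretrained layers and the trainable head; they are toy
# arithmetic, not real network layers.

calls = {"frozen": 0}

def frozen_layers(x):            # expensive pretrained part (stand-in)
    calls["frozen"] += 1
    return x * 2

def trainable_head(f):           # cheap part that actually trains
    return f + 1

data = [1, 2, 3]
featurized = [frozen_layers(x) for x in data]   # done once, could be saved

for epoch in range(10):                          # 10 "epochs"
    outputs = [trainable_head(f) for f in featurized]
```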

Show me the code

This example will use VGG16 to classify images belonging to five categories of flowers. The dataset will be downloaded automatically.

I. Importing VGG16

TrainedModelHelper modelImportHelper = new TrainedModelHelper(TrainedModels.VGG16);
ComputationGraph vgg16 = modelImportHelper.loadModel();

II. Set up a fine-tune configuration

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
        .learningRate(5e-5)
        .updater(Updater.NESTEROVS)
        .seed(seed)
        .build();

III. Build new models based on VGG16

A. Modifying only the last layer, keeping the others frozen

The final layer of VGG16 does a softmax regression over the 1000 classes in ImageNet. We modify the very last layer to give predictions for five classes, keeping the other layers frozen.

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
        .fineTuneConfiguration(fineTuneConf)
        .setFeatureExtractor("fc2")
        .removeVertexKeepConnections("predictions")
        .addLayer("predictions",
                new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(4096).nOut(numClasses)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SOFTMAX).build(), "fc2")
        .build();

After a mere thirty iterations, which in this case is exposure to 450 images, the model attains an accuracy above 75% on the test dataset. This is rather remarkable considering the complexity of training an image classifier from scratch.

B. Attach new layers to the bottleneck (block5_pool)

Here we hold all but the last three dense layers frozen and attach new dense layers onto them. Note that the primary intent here is to demonstrate the use of the API, secondary to what might give better results.

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
        .fineTuneConfiguration(fineTuneConf)
        .setFeatureExtractor("block5_pool")
        .nOutReplace("fc2", 1024, WeightInit.XAVIER)
        .removeVertexAndConnections("predictions")
        .addLayer("fc3", new DenseLayer.Builder()
                .activation(Activation.RELU)
                .nIn(1024).nOut(256).build(), "fc2")
        .addLayer("newpredictions",
                new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .activation(Activation.SOFTMAX)
                        .nIn(256).nOut(numClasses).build(), "fc3")
        .setOutputs("newpredictions")
        .build();

C. Fine tune layers from a previously saved model

Say we have saved off our model from (B) and now want to allow the "block_5" layers to train.

ComputationGraph vgg16FineTune = new TransferLearning.GraphBuilder(vgg16Transfer)
        .fineTuneConfiguration(fineTuneConf)
        .setFeatureExtractor("block4_pool")
        .build();

IV. Saving "featurized" datasets and training with them

We use the transfer learning helper API. Note that this freezes the layers of the model passed in.

Here is how you obtain the featurized version of the dataset at the specified layer "fc2":

TransferLearningHelper transferLearningHelper = new TransferLearningHelper(vgg16, "fc2");
while (trainIter.hasNext()) {
    DataSet currentFeaturized = transferLearningHelper.featurize(;
    saveToDisk(currentFeaturized, trainDataSaved, true);
    trainDataSaved++;
}

Here is how you can fit with a featurized dataset. vgg16Transfer is the model set up in (A) of section III.

TransferLearningHelper transferLearningHelper = new TransferLearningHelper(vgg16Transfer);
while (trainIter.hasNext()) {
    transferLearningHelper.fitFeaturized(;
}

Of note:
  • The TransferLearning builder returns a new instance of a dl4j model.

Keep in mind this is a second model that leaves the original one untouched. For large pretrained networks, take memory requirements into consideration and adjust your JVM heap space accordingly.

  • The trained model helper imports models from Keras without enforcing a training configuration.

Therefore the last layer (as seen when printing the summary) is a dense layer, not an output layer with a loss function. To modify the nOut of an output layer, we therefore delete the layer vertex keeping its connections, and add back in a new output layer with the same name, a different nOut, the suitable loss function, etc.

  • Changing nOuts at a layer/vertex will modify nIn of the layers/vertices it fans into.

When changing nOut, users can specify a weight initialization scheme or a distribution for the layer, as well as a separate weight initialization scheme or distribution for the layers it fans out to.

  • Frozen layer configurations are not saved when writing the model to disk.

In other words, a model with frozen layers, when serialized and read back in, will not have any frozen layers. To continue training while holding specific layers constant, the user is expected to go through the transfer learning helper or the transfer learning API. There are two ways to "freeze" layers in a DL4J model:

- On a copy: With the transfer learning API which will return a new model with the relevant frozen layers
- In place: With the transfer learning helper API which will apply the frozen layers to the given model.
  • FineTune configurations will selectively update learning parameters.

For example, if a learning rate is specified, this learning rate will apply to all unfrozen/trainable layers in the model. However, newly added layers can override this learning rate by specifying their own learning rates in the layer builder.
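The note above about nOut changes fanning into the next layer can be sketched language-agnostically (plain Python; the layer sizes are illustrative, loosely modeled on VGG16's fully connected layers):

```python
# Sketch of the fan-in rule behind nOutReplace: resizing the output of one
# layer forces the input size of the next layer to change to match.

def n_out_replace(layers, index, new_n_out):
    """Return a copy of the (nIn, nOut) chain with layer `index` resized."""
    fixed = [list(l) for l in layers]
    fixed[index][1] = new_n_out
    if index + 1 < len(fixed):
        fixed[index + 1][0] = new_n_out   # fan into the next layer
    return [tuple(l) for l in fixed]

# (nIn, nOut) per layer, e.g. fc1 -> fc2 -> predictions (illustrative sizes)
layers = [(25088, 4096), (4096, 4096), (4096, 1000)]
resized = n_out_replace(layers, 1, 1024)  # like nOutReplace("fc2", 1024, ...)
```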


I need a very basic classification (or similar) example for the deeplearning4j framework.

I have a classic training set in the form of pairs of already-normalized double arrays [0.01, 0.45, 0.0, ...] -> [0.0, 0.1, 0.0, 0.0, ...] and need to:

  1. Build and train a simple feedforward neural network with N hidden layers
  2. Feed a set of uncategorized double arrays to trained network and get a set of output vectors

Could somebody please share a basic and short example that does this?

UPD: Something like this, but for deeplearning4j, would really help.


Take a look at this example, which shows how to train a neural network on the XOR relationship. First you have to convert your doubles into ndarrays; to do this, use the Nd4j.create(...) method. Then you need to set up a DataSet like here.
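Conceptually, the task looks like the following plain-Python sketch of a feed-forward pass over double arrays. The weights here are hand-picked (hypothetical) so that a 2-2-1 network computes XOR, whereas DL4J would learn them from your training pairs:

```python
# A 2-2-1 feed-forward pass with step activations and hand-picked weights
# that realize XOR. Purely illustrative: a trained network would find its
# own weights from the (input array -> output array) pairs.

def step(x):
    return 1.0 if x > 0 else 0.0

def forward(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # fires if at least one input is 1
    h2 = step(x1 + x2 - 1.5)     # fires only if both inputs are 1
    return step(h1 - h2 - 0.5)   # "at least one, but not both" = XOR

outputs = [forward(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```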


I'm trying to train a model using deep learning in Java. When I start training on the train data, it gives an error:

Invalid classification data: expect label value (at label index column = 0) to be in range 0 to 1 inclusive (0 to numClasses-1, with numClasses=2); got label value of 2

I didn't understand the error, since I am a beginner in deeplearning4j. I am using a data set that describes the relationship between two people (if there is a relationship between the two people, the class label is 1, otherwise 0).

The Java code:

public class SNA {
    private static Logger log = LoggerFactory.getLogger(SNA.class);

    public static void main(String[] args) throws Exception {
        int seed = 123;
        double learningRate = 0.01;
        int batchSize = 50;
        int nEpochs = 30;
        int numInputs = 2;
        int numOutputs = 2;
        int numHiddenNodes = 20;

        // load the training data
        RecordReader rr = new CSVRecordReader(0, ",");
        rr.initialize(new FileSplit(new File("C:\\Users\\GTS\\Desktop\\SNA project\\experiments\\First experiment\\train\\slashdotTrain.csv")));
        DataSetIterator trainIter = new RecordReaderDataSetIterator(rr, batchSize, 0, 2);

        // load the test data
        RecordReader rrTest = new CSVRecordReader();
        rr.initialize(new FileSplit(new File("C:\\Users\\GTS\\Desktop\\SNA project\\experiments\\First experiment\\test\\slashdotTest.csv")));
        DataSetIterator testIter = new RecordReaderDataSetIterator(rrTest, batchSize, 0, 2);"**** Building Model ****");
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                // ... (hyperparameters elided in the original post)
                .list()
                .layer(0, new DenseLayer.Builder()
                        // ... (layer settings elided)
                        .build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        // ... (layer settings elided)
                        .build())
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);

        // Listener to show how the network is training in the log
        model.setListeners(new ScoreIterationListener(10));"**** Train Model ****");
        for (int i = 0; i < nEpochs; i++) {
  ;
        }

        System.out.println("**** Evaluate Model ****");
        Evaluation evaluation = new Evaluation(numOutputs);
        while (testIter.hasNext()) {
            DataSet t =;
            INDArray feature = t.getFeatureMatrix();
            INDArray labels = t.getLabels();
            INDArray predicted = model.output(feature, false);
            evaluation.eval(labels, predicted);
        }
    }
}

Any help please? Thanks a lot.


Problem solved: change the third parameter (the label index) of RecordReaderDataSetIterator from 0 to 2, because the data set has three columns and the class label is in column index 2, the third column:


DataSetIterator trainIter = new RecordReaderDataSetIterator(rr, batchSize,2, 2);
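The role of the label-index parameter can be sketched in plain Python (the three-column row below is made up):

```python
# Sketch of what a label index selects from a CSV row. A row of the
# (hypothetical) three-column data set looks like [featureA, featureB, label];
# with label_index = 0 the first feature would wrongly be read as the label.

def split_row(row, label_index):
    """Split one CSV row into (features, label) at the given column."""
    label = row[label_index]
    features = [v for i, v in enumerate(row) if i != label_index]
    return features, label

row = [0.7, 0.1, 1.0]                          # class label in the 3rd column
features, label = split_row(row, 2)            # correct: label index 2
bad_features, bad_label = split_row(row, 0)    # wrong: reads a feature
```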



I cannot run a simple program that I wrote to start understanding Deeplearning4j.

I tried the code from this link: Deep Learning In Java Using Deeplearning4J

Unfortunately it didn't work for me. In fact, I get this error:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See for further details.
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertWritables(
    at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.convertFeaturesOrLabels(
    at org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator.nextMultiDataSet(
    at com.alessio.text.App.main(
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see:
    at org.nd4j.linalg.factory.Nd4j.initContext(
    at org.nd4j.linalg.factory.Nd4j.<clinit>(
    ... 8 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see:
    at org.nd4j.linalg.factory.Nd4jBackend.load(
    at org.nd4j.linalg.factory.Nd4j.initContext(
    ... 9 more

I'll appreciate any advice. Thanks in advance


In addition to the comments above: you need an nd4j backend; the error message links to the documentation for exactly that. Usually you want nd4j-native-platform at the latest version. For the latest versions of things, please use our examples repo.
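For a Maven project, the fix is typically a dependency along these lines (the version shown is only an example; use the latest available):

```xml
<!-- Sketch of the backend dependency the answer refers to.
     Replace the version with the latest release. -->
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>0.9.1</version>
</dependency>
```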


Thanks in advance. I am using Word2Vec in DeepLearning4j.

How do I clear the vocab cache in Word2Vec? I want it to retrain on a new set of word patterns every time I reload Word2Vec. At the moment, the vocabulary of the previous set of word patterns persists, and I get the same result even though I changed my input training file.

I tried to reset the model, but it doesn't work. Code:

Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(1)
        .iterations(1)
        .layerSize(4)
        .seed(1)
        .windowSize(1)
        .iterate(iter)
        .tokenizerFactory(t)
        .resetModel(true)
        .limitVocabularySize(1)
        .build();

Anyone can help?


If you want to retrain from scratch (this is plain training), I understand that you want to completely discard the previously learned model (vocabulary, word vectors, ...). To do that, you should create another Word2Vec object and fit it with the new data. You should also use new instances of the SentenceIterator and Tokenizer classes. Your problem could be in the way you change your input training files.

It should be OK if you just change the SentenceIterator, i.e.:

SentenceIterator iter = new CollectionSentenceIterator(DataFetcher.getFirstDataset());
Word2Vec vec = new Word2Vec.Builder()
        .iterate(iter)
        // ... (other settings elided in the original answer)
        .build();

vec.wordsNearest("clear", 10); // you will see results from the first dataset

SentenceIterator iter2 = new CollectionSentenceIterator(DataFetcher.getSecondDataset());
vec = new Word2Vec.Builder()
        .iterate(iter2)
        // ... (other settings elided in the original answer)
        .build();

vec.wordsNearest("clear", 10); // you will see results from the second dataset, without any influence from the first

If you run the code twice and changed your input data between executions (say, dataset A and then dataset B), you shouldn't get the same results. If you do, that means your model learned the same thing from inputs A and B.

If you instead want to update the training (this is called inference), i.e. use the previously learned model plus new data to update that model, then you should use the corresponding example from the dl4j examples.


I am using the deeplearning4j library. Does anybody know how to feed a bag of words to a feed-forward neural network?


I got it.

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.setListeners(new ScoreIterationListener(100));
for (int i = 0; i < nEpochs; i++) { File("Pass here single file"), "label name"));
}