Hot questions for using neural networks in Scala

Question:

So I am using TensorBoard within Keras. In TensorFlow one could use two different summary writers for the train and validation scalars, so that TensorBoard could plot them in the same figure. Something like the figure in

TensorBoard - Plot training and validation losses on the same graph?

Is there a way to do this in Keras?

Thanks.


Answer:

To handle the validation logs with a separate writer, you can write a custom callback that wraps around the original TensorBoard methods.

import os
import tensorflow as tf
from keras.callbacks import TensorBoard

class TrainValTensorBoard(TensorBoard):
    def __init__(self, log_dir='./logs', **kwargs):
        # Make the original `TensorBoard` log to a subdirectory 'training'
        training_log_dir = os.path.join(log_dir, 'training')
        super(TrainValTensorBoard, self).__init__(training_log_dir, **kwargs)

        # Log the validation metrics to a separate subdirectory
        self.val_log_dir = os.path.join(log_dir, 'validation')

    def set_model(self, model):
        # Setup writer for validation metrics
        self.val_writer = tf.summary.FileWriter(self.val_log_dir)
        super(TrainValTensorBoard, self).set_model(model)

    def on_epoch_end(self, epoch, logs=None):
        # Pop the validation logs and handle them separately with
        # `self.val_writer`. Also rename the keys so that they can
        # be plotted on the same figure with the training metrics
        logs = logs or {}
        val_logs = {k.replace('val_', ''): v for k, v in logs.items() if k.startswith('val_')}
        for name, value in val_logs.items():
            summary = tf.Summary()
            summary_value = summary.value.add()
            summary_value.simple_value = value.item()
            summary_value.tag = name
            self.val_writer.add_summary(summary, epoch)
        self.val_writer.flush()

        # Pass the remaining logs to `TensorBoard.on_epoch_end`
        logs = {k: v for k, v in logs.items() if not k.startswith('val_')}
        super(TrainValTensorBoard, self).on_epoch_end(epoch, logs)

    def on_train_end(self, logs=None):
        super(TrainValTensorBoard, self).on_train_end(logs)
        self.val_writer.close()

  • In __init__, two subdirectories are set up for training and validation logs
  • In set_model, a writer self.val_writer is created for the validation logs
  • In on_epoch_end, the validation logs are separated from the training logs and written to file with self.val_writer
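The renaming and splitting in on_epoch_end is plain dict manipulation; here is a standalone sketch with made-up metric values:

```python
# Example epoch logs as Keras passes them to on_epoch_end (values are made up)
logs = {'loss': 0.3, 'acc': 0.9, 'val_loss': 0.4, 'val_acc': 0.85}

# Strip the 'val_' prefix so train/val metrics share a tag and overlay in TensorBoard
val_logs = {k.replace('val_', ''): v for k, v in logs.items() if k.startswith('val_')}
train_logs = {k: v for k, v in logs.items() if not k.startswith('val_')}
```

Because both writers now emit a scalar tagged e.g. loss, TensorBoard draws the training and validation curves on the same chart.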

Using the MNIST dataset as an example:

from keras.models import Sequential
from keras.layers import Dense
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10,
          validation_data=(x_test, y_test),
          callbacks=[TrainValTensorBoard(write_graph=False)])

You can then visualize the two curves in the same figure in TensorBoard.


EDIT: I've modified the class a bit so that it can be used with eager execution.

The biggest change is that I use tf.keras in the following code. It seems that the TensorBoard callback in standalone Keras does not support eager mode yet.

import os
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.python.eager import context

class TrainValTensorBoard(TensorBoard):
    def __init__(self, log_dir='./logs', **kwargs):
        self.val_log_dir = os.path.join(log_dir, 'validation')
        training_log_dir = os.path.join(log_dir, 'training')
        super(TrainValTensorBoard, self).__init__(training_log_dir, **kwargs)

    def set_model(self, model):
        if context.executing_eagerly():
            self.val_writer = tf.contrib.summary.create_file_writer(self.val_log_dir)
        else:
            self.val_writer = tf.summary.FileWriter(self.val_log_dir)
        super(TrainValTensorBoard, self).set_model(model)

    def _write_custom_summaries(self, step, logs=None):
        logs = logs or {}
        val_logs = {k.replace('val_', ''): v for k, v in logs.items() if 'val_' in k}
        if context.executing_eagerly():
            with self.val_writer.as_default(), tf.contrib.summary.always_record_summaries():
                for name, value in val_logs.items():
                    tf.contrib.summary.scalar(name, value.item(), step=step)
        else:
            for name, value in val_logs.items():
                summary = tf.Summary()
                summary_value = summary.value.add()
                summary_value.simple_value = value.item()
                summary_value.tag = name
                self.val_writer.add_summary(summary, step)
        self.val_writer.flush()

        logs = {k: v for k, v in logs.items() if not 'val_' in k}
        super(TrainValTensorBoard, self)._write_custom_summaries(step, logs)

    def on_train_end(self, logs=None):
        super(TrainValTensorBoard, self).on_train_end(logs)
        self.val_writer.close()

The idea is the same:

  • Check the source code of TensorBoard callback
  • See what it does to set up the writer
  • Do the same thing in this custom callback

Again, you can use the MNIST data to test it:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.train import AdamOptimizer

tf.enable_eager_execution()

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = y_train.astype(int)
y_test = y_test.astype(int)

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer=AdamOptimizer(), metrics=['accuracy'])

model.fit(x_train, y_train, epochs=10,
          validation_data=(x_test, y_test),
          callbacks=[TrainValTensorBoard(write_graph=False)])

Question:

I'm trying to train a classifier via PyTorch. However, I am running into problems when I feed the model the training data. I get this error on y_pred = model(X_trainTensor):

RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'

Here are key parts of my code:

import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

# Hyper-parameters
D_in = 47  # there are 47 parameters I investigate
H = 33
D_out = 2  # output should be either 1 or 0
learning_rate = 1e-4
# Format and load the data
y = np.array( df['target'] )
X = np.array( df.drop(columns = ['target'], axis = 1) )
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8)  # split training/test data

X_trainTensor = torch.from_numpy(X_train) # convert to tensors
y_trainTensor = torch.from_numpy(y_train)
X_testTensor = torch.from_numpy(X_test)
y_testTensor = torch.from_numpy(y_test)
# Define the model
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
    nn.LogSoftmax(dim = 1)
)
# Define the loss function
loss_fn = torch.nn.NLLLoss() 
for i in range(50):
    y_pred = model(X_trainTensor)
    loss = loss_fn(y_pred, y_trainTensor)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():       
        for param in model.parameters():
            param -= learning_rate * param.grad

Answer:

The reference is this GitHub issue.

When the error is RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1', you would need to use the .float() function since it says Expected object of scalar type Float.

Therefore, the solution is changing y_pred = model(X_trainTensor) to y_pred = model(X_trainTensor.float()).

Likewise, when you get another error for loss = loss_fn(y_pred, y_trainTensor), you need y_trainTensor.long() since the error message says Expected object of scalar type Long.

You could also do model.double(), as suggested by @Paddy.
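For intuition, here is a NumPy-only sketch of the same mismatch (torch.from_numpy keeps NumPy's default float64, while layer weights are created as float32; NumPy silently promotes where PyTorch raises, but the fix is the same explicit cast as .float()):

```python
import numpy as np

W = np.random.randn(33, 47).astype(np.float32)  # like an nn.Linear weight: float32
x = np.random.rand(47)                          # NumPy default dtype: float64

# torch raises on the float32/float64 mix; the fix mirrors X_trainTensor.float()
x32 = x.astype(np.float32)
y = W @ x32                                     # dtypes now agree: float32 result
```

Casting the inputs down to float32 is usually preferred over model.double(), since float64 weights double the memory and are slower on most hardware.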

Question:

import torch.nn as nn 
import torch 
import torch.optim as optim
import itertools

class net1(nn.Module):
    def __init__(self):
        super(net1,self).__init__()

        self.pipe = nn.Sequential(
            nn.Linear(10,10),
            nn.ReLU()
        )

    def forward(self,x):
        return self.pipe(x.long())

class net2(nn.Module):
    def __init__(self):
        super(net2,self).__init__()

        self.pipe = nn.Sequential(
            nn.Linear(10,20),
            nn.ReLU(),
            nn.Linear(20,10)
        )

    def forward(self,x):
        return self.pipe(x.long())



netFIRST = net1()
netSECOND = net2()

learning_rate = 0.001

opt = optim.Adam(itertools.chain(netFIRST.parameters(),netSECOND.parameters()), lr=learning_rate)

epochs = 1000

x = torch.tensor([1,2,3,4,5,6,7,8,9,10],dtype=torch.long)
y = torch.tensor([10,9,8,7,6,5,4,3,2,1],dtype=torch.long)


for epoch in range(epochs):
    opt.zero_grad()

    prediction = netSECOND(netFIRST(x))
    loss = (y.long() - prediction)**2
    loss.backward()

    print(loss)
    print(prediction)
    opt.step()

error:

line 49, in prediction = netSECOND(netFIRST(x))

line 1371, in linear; output = input.matmul(weight.t())

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'mat2'

I don't really see what I'm doing wrong. I have tried to turn everything into a Long in every possible way. I don't really get how typing works in PyTorch. Last time I tried something with just one layer, it forced me to use type int. Could someone explain how typing is established in PyTorch and how to prevent and fix errors like this? Thanks an awful lot in advance; this problem really bothers me and I can't seem to fix it no matter what I try.


Answer:

The weights are floats, the inputs are longs. This is not allowed; in fact, I don't think torch supports anything other than floats in neural networks.

If you remove all calls to long, and define your input as floats, it will work (it does, I tried).

(You will then get another, unrelated error: you need to sum your loss.)
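To see why, note that (y.long() - prediction)**2 is elementwise and therefore still a vector, while backward() needs a scalar. A small NumPy sketch of the reduction, with made-up values:

```python
import numpy as np

y = np.array([10.0, 9.0, 8.0])      # targets
pred = np.array([9.5, 9.2, 7.8])    # model outputs

loss_vec = (y - pred) ** 2          # elementwise: still a vector, one entry per sample
loss = loss_vec.sum()               # reduce to a scalar before calling backward() in torch
```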

Question:

Below is my implementation of a neural network with one input layer, two hidden layers and one output layer:

import breeze.linalg._
import breeze.math._
import breeze.numerics._

object NN extends App {

  //Forward propagation
  val x1 = DenseVector(1.0, 0.0, 1.0)
  val y1 = DenseVector(1.0, 1.0, 1.0)

  val theta1 = DenseMatrix((1.0, 1.0, 1.0), (1.0, 1.0, 0.0), (1.0, 0.0, 0.0));
  val theta2 = DenseMatrix((1.0, 1.0, 1.0), (1.0, 1.0, 0.0), (1.0, 0.0, 0.0));
  val theta3 = DenseMatrix((1.0, 1.0, 1.0), (1.0, 1.0, 0.0), (1.0, 0.0, 0.0));

  val a1 = x1;

  val z2 = theta1 * a1;
  val a2 = (z2.map { x => 1 + sigmoid(x) })

  val z3 = theta2 * a2;
  val a3 = (z3.map { x => 1 + sigmoid(x) })

  val z4 = theta3 * a3;
  val a4 = (z4.map { x => 1 + sigmoid(x) })

  //Back propagation
  val errorLayer4 = a4 - DenseVector(1.0, 1.0, 1.0)
  val errorLayer3 = (theta3.t * errorLayer4) :* (a3 :* (DenseVector(1.0, 1.0, 1.0) - a3))
  val errorLayer2 = (theta2.t * errorLayer3) :* (a2 :* (DenseVector(1.0, 1.0, 1.0) - a2))

  //Compute delta values
  val delta1 = errorLayer2 * a2.t
  val delta2 = errorLayer3 * a3.t
  val delta3 = errorLayer4 * a4.t


  //Gradient descent
  val m = 1
  val alpha = .0001
  val x = DenseVector(1.0, 0.0, 1.0)
  val y = DenseVector(1.0, 1.0, 1.0)

  val pz1 = delta1 - (alpha / m) * (x.t * (delta1 * x - y))
  val p1z1 = (sigmoid(delta1 * x)) + 1.0 
  println(p1z1);

  val pz2 = delta2 - (alpha / m) * (x.t * (delta2 * x - y))
  val p1z2 = (sigmoid(delta2 * p1z1)) + 1.0
  println(p1z2);

  val pz3 = delta3 - (alpha / m) * (x.t * (delta3 * x - y))
  val p1z3 = (sigmoid(delta3 * p1z2)) + 1.0
  println(p1z3);


}

The output of this network is :

Jun 03, 2016 7:47:50 PM com.github.fommil.netlib.BLAS <clinit>
WARNING: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
Jun 03, 2016 7:47:50 PM com.github.fommil.jni.JniLoader liberalLoad
INFO: successfully loaded C:\Users\Local\Temp\jniloader3606930058943197684netlib-native_ref-win-x86_64.dll
DenseVector(2.0, 2.0, 1.9999999999946196)
DenseVector(1.0, 1.0, 1.0000000064265646)
DenseVector(1.9971047766732295, 1.9968279599465841, 1.9942769808711798)

I'm using a single training example, 1,0,1, with an expected output of 1,1,1. The predicted value given 1,0,1 is 1.9,1.9,1.9 when it should be 1,1,1.

I think the way I'm computing the sigmoid with bias is incorrect. Should the bias +1 value be added after the sigmoid calculation for each layer; in other words, should I use { x => sigmoid(x + 1) } instead of { x => 1 + sigmoid(x) }?


Answer:

A perceptron-style neuron's output is sigmoid(sum(xi * wi)), where the bias input x0 is 1 but its weight w0 is not necessarily 1. You definitely don't add the 1 outside the sigmoid, but you also don't add it inside; the bias needs its own learned weight. So it should be equivalent to

sigmoid(w0 + w1*x1 + w2*x2 + ...)
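A minimal NumPy sketch of that formula, with illustrative weight values (w0 is the learned bias weight; the bias input is fixed at 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w0 = 0.5                          # bias weight: learned, not fixed at 1
w = np.array([1.0, -1.0, 2.0])    # weights for the real inputs
x = np.array([1.0, 0.0, 1.0])     # the training example from the question

out = sigmoid(w0 + w @ x)         # sigmoid(w0 + w1*x1 + w2*x2 + ...)
```

Note the output always lies in (0, 1), so neither 1 + sigmoid(x) nor sigmoid(x + 1) can produce the saturated values the original code prints.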

Question:

please consider the following code:

  #update
  W1 = W1 - learningRate * dJdW1
  W2 = W2 - learningRate * dJdW2

Where learningRate is double and dJdW1/dJdW1 2d matrices.

I'm getting this error:

ERROR: Runtime error in program block generated from statement block between lines 58 and 61 -- Error evaluating instruction: CP\xb0-*\xb0W2\xb7MATRIX\xb7DOUBLE\xb01.0E-5\xb7SCALAR\xb7DOUBLE\xb7true\xb0dJdW2\xb7MATRIX\xb7DOUBLE\xb0_mVar117\xb7MATRIX\xb7DOUBLE

EDIT 12.7.17:

plus this one...

ordinal not in range(128)'))

The whole DML can be found here

The complete Error can be found here

The whole jupyther notebook can be found here


Answer:

The cellwise scalar matrix operation is fine. Looking at your error, it says that your matrix/vector dimensions are not compatible:

 : Block sizes are not matched for binary cell operations: 3x1 vs 2x3
 org.apache.sysml.runtime.matrix.data.MatrixBlock.binaryOperations(MatrixBlock.java:2872)
 org.apache.sysml.runtime.instructions.cp.PlusMultCPInstruction.processInstruction(PlusMultCPInstruction.java:66)
 org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)

Looking at your Notebook, this comes from:

 W2 = W2 - learningRate * dJdW2

W2 is initialized with W2 = rand(rows=hiddenLayerSize, cols=outputLayerSize) as a 3x1 matrix, while dJdW2 is a 2x3 matrix.
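A NumPy sketch of the same failure: the update W2 - learningRate * dJdW2 is elementwise, so the two shapes must agree. The dimensions below mirror the error message:

```python
import numpy as np

W2 = np.random.rand(3, 1)      # like rand(rows=hiddenLayerSize, cols=outputLayerSize)
dJdW2 = np.random.rand(2, 3)   # gradient whose shape doesn't match W2

try:
    W2 - 1e-5 * dJdW2          # elementwise update: 3x1 vs 2x3 is incompatible
    ok = False
except ValueError:
    ok = True                  # NumPy rejects it, just as SystemML does
```

The real fix is upstream: the gradient of a parameter must have exactly the same shape as the parameter itself, so check how dJdW2 is computed.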

Question:

The flow should be:

Input -> Word2Vectors -> Output -> NeuralNetwork

I have tried the word2vec function of Spark, but I am confused about what format MultilayerPerceptronClassifier needs as input.


Answer:

When you define your MultilayerPerceptronClassifier, you have to pass it an Array[Int] parameter called layers. It describes the number of neurons per layer, in that sequence. The first layer's size must match the dimension of the Word2Vec output vectors. So you should set the parameter to

val layers = Array[Int](featureDim, 5, 4, 5, ...)

And replace the numbers with the parameters you want your model to have. You should set featureDim to the length of the vectors your Word2VecModel produces. Unfortunately, the attribute with that value is hidden via a private accessor and there is no getter method implemented as of now.
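In Python terms (the Scala Array[Int] works the same way), the layers parameter is just the input dimension, the hidden sizes, then the number of label classes. The values below are placeholders; feature_dim must equal the vectorSize you configured on Word2Vec:

```python
feature_dim = 100   # must equal the Word2Vec vectorSize you trained with (assumption)
num_classes = 4     # number of distinct labels in your data (assumption)

# first entry = input dimension, last entry = output dimension, middle = hidden layers
layers = [feature_dim, 64, 32, num_classes]
```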

Question:

When computing the delta values for a neural network after running back propagation:

the value of delta(1) comes out as a scalar. Should it be a vector?

Update :

Taken from http://www.holehouse.org/mlclass/09_Neural_Networks_Learning.html

Specifically:


Answer:

First, you probably understand that in each layer we have n x m parameters (or weights) that need to be learned, so they form a 2-D matrix.

n is the number of nodes in the current layer, plus 1 (for the bias)
m is the number of nodes in the previous layer.

We have n x m parameters because there is one connection between any two nodes in the previous and current layers.

I am pretty sure that Delta (big delta) at layer L is used to accumulate the partial derivative terms for every parameter at layer L, so you have a 2-D matrix of Delta at each layer as well. To update the entry at the i-th row (the i-th node in the current layer) and j-th column (the j-th node in the previous layer) of the matrix,

D_(i,j) = D_(i,j) + a_j * delta_i
note a_j is the activation from the j-th node in previous layer,
     delta_i is the error of the i-th node of the current layer
so we accumulate the error in proportion to the activation of the previous layer's node.

Thus to answer your question, Delta should be a matrix.
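The accumulation above is an outer product, which makes the matrix shape explicit. A NumPy sketch with arbitrary values:

```python
import numpy as np

n, m = 3, 4                                # nodes in current layer, previous layer
Delta = np.zeros((n, m))                   # one accumulator entry per weight

a_prev = np.array([1.0, 0.5, 0.0, 2.0])    # activations a_j from the previous layer
delta_cur = np.array([0.1, -0.2, 0.3])     # error terms delta_i of the current layer

Delta += np.outer(delta_cur, a_prev)       # D_(i,j) += a_j * delta_i for all i, j
```

The result has one row per current-layer node and one column per previous-layer node: exactly the shape of the weight matrix it is the gradient accumulator for.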