Hot questions: neural networks and debugging

Question:

I am sorry, this is a very newbie question... I trained a neural network with Theano, and now I want to see what it outputs for a certain input.

So I can say:

test_pred = lasagne.layers.get_output(output_layer, dataset['X_test'])

where output_layer is my network. Now, the last layer happens to be a softmax, so if I say:

print "%s" % test_pred

I get

Softmax.0

I think I see why I get this (namely, because the output is a symbolic tensor variable), but I don't see how to get the actual values.

And just so you know, I did read this post and also the documentation on printing and the FAQ, which I am afraid I am not fully grasping either...


Answer:

  1. Use .eval() to evaluate the symbolic expression (see the sketch below)
  2. Use Theano's test values
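
For example, a minimal sketch of option 1 (reusing output_layer and dataset from the question; the compiled-function variant and its input variable are my own assumptions):

import theano
import theano.tensor as T
import lasagne

# one-off evaluation of the symbolic expression
test_pred = lasagne.layers.get_output(output_layer, dataset['X_test'])
print(test_pred.eval())          # forces the computation and returns a numpy array

# alternatively, compile a reusable prediction function (faster if called repeatedly)
input_var = T.matrix('inputs')   # hypothetical input variable; match your network's input shape
prediction = lasagne.layers.get_output(output_layer, input_var)
predict_fn = theano.function([input_var], prediction)
print(predict_fn(dataset['X_test']))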

Question:

I am trying to build a CNN. I have 8 classes in the input samples, with 45 samples in each class, so the total number of input samples is 360. In each class I use the first 20 samples for training and the remaining 25 for testing. (My input is a text file in which each row is one preprocessed sample, so I read the rows from the text file and reshape them into 16x12 images.)

I am unable to fix the error in the code.

My code:

import numpy as np
import random
import tensorflow as tf
folder = 'D:\\Lab_Project_Files\\TF\\Practice Files\\'
Datainfo = 'dataset_300.txt'
ClassInfo = 'classTrain.txt'

INPUT_WIDTH  = 16
IMAGE_HEIGHT = 12
IMAGE_DEPTH  = 1
IMAGE_PIXELS = INPUT_WIDTH * IMAGE_HEIGHT # 192 = 12*16
NUM_CLASSES  = 8
STEPS         = 500
STEP_VALIDATE = 100
BATCH_SIZE    = 5

def load_data(file1,file2,folder):
    filename1 = folder + file1
    filename2 = folder + file2
    # loading the data file
    x_data = np.loadtxt(filename1, unpack=True)
    x_data = np.transpose(x_data)
    # loading the class information of the data loaded
    y_data = np.loadtxt(filename2, unpack=True)
    y_data = np.transpose(y_data)
    # divide the data in to test and train data
    x_data_train = x_data[np.r_[0:20, 45:65, 90:110, 135:155, 180:200, 225:245, 270:290, 315:335],:]
    x_data_test  = x_data[np.r_[20:45, 65:90, 110:135, 155:180, 200:225, 245:270, 290:315, 335:360], :]
    y_data_train = y_data[np.r_[0:20, 45:65, 90:110, 135:155, 180:200, 225:245, 270:290,  315:335]]
    y_data_test  = y_data[np.r_[20:45, 65:90, 110:135, 155:180, 200:225, 245:270, 290:315, 335:360],:]
    return x_data_train,x_data_test,y_data_train,y_data_test

def reshapedata(data_train,data_test):
    data_train  = np.reshape(data_train, (len(data_train),INPUT_WIDTH,IMAGE_HEIGHT))
    data_test   = np.reshape(data_test,  (len(data_test), INPUT_WIDTH, IMAGE_HEIGHT))
    return data_train,data_test

def batchdata(data,label, batchsize):
    # generate random number required to batch data
    order_num = random.sample(range(1, len(data)), batchsize)
    data_batch = []
    label_batch = []
    for i in range(len(order_num)):
        data_batch.append(data[order_num[i-1]])
        label_batch.append(label[order_num[i-1]])
    return data_batch, label_batch

# CNN trail
def conv_net(x):
    weights = tf.Variable(tf.random_normal([INPUT_WIDTH * IMAGE_HEIGHT * IMAGE_DEPTH, NUM_CLASSES]))
    biases = tf.Variable(tf.random_normal([NUM_CLASSES]))
    out = tf.add(tf.matmul(x, weights), biases)
    return out

sess = tf.Session()
# get filelist and labels for training and testing
data_train, data_test, label_train, label_test = load_data(Datainfo, ClassInfo, folder)
data_train, data_test = reshapedata(data_train, data_test)

############################ get files for training ####################################################
image_batch, label_batch = batchdata(data_train,label_train,BATCH_SIZE)
# input output placeholders
x = tf.placeholder(tf.float32, [None, IMAGE_PIXELS])
y_ = tf.placeholder(tf.float32,[None, NUM_CLASSES])
# create the network
y = conv_net( x )
# loss
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, y_))
# train step
train_step   = tf.train.AdamOptimizer( 1e-3 ).minimize( cost )

############################## get files for validataion ###################################################
image_batch_test, label_batch_test = batchdata(data_test,label_test,BATCH_SIZE)

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

sess.run(tf.initialize_all_variables())

################ CNN Program ##############################

for i in range(STEPS):
        # checking the accuracy in between.
        if i % STEP_VALIDATE == 0:
            imgs, lbls = sess.run([image_batch_test, label_batch_test])
            print(sess.run(accuracy, feed_dict={x: imgs, y_: lbls}))

        imgs, lbls = sess.run([image_batch, label_batch])
        sess.run(train_step, feed_dict={x: imgs, y_: lbls})

imgs, lbls = sess.run([image_batch_test, label_batch_test])
print(sess.run(accuracy, feed_dict={ x: imgs, y_: lbls}))

The files dataset_300.txt and ClassInfo.txt are available for download.


Answer:

Session.run accepts only tensors or operations (or their names) as fetches.

imgs, lbls = sess.run([image_batch_test, label_batch_test])

In that line you are passing image_batch_test and label_batch_test, which are NumPy arrays rather than tensors. I am not sure what you are trying to achieve with imgs, lbls = sess.run([image_batch_test, label_batch_test]).
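
Since image_batch_test and label_batch_test are already plain NumPy data returned by batchdata(), one possible fix (a sketch reusing the names from the question, and assuming the labels are already one-hot vectors of length NUM_CLASSES) is to drop those sess.run calls and feed the arrays directly, flattened to IMAGE_PIXELS values per image:

imgs = np.reshape(image_batch_test, (-1, IMAGE_PIXELS))   # (batch, 16*12) to match placeholder x
lbls = np.reshape(label_batch_test, (-1, NUM_CLASSES))    # assumes one-hot labels
print(sess.run(accuracy, feed_dict={x: imgs, y_: lbls}))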

Question:

Building a GAN to generate images. The images have 3 color channels, 96 x 96.

The images generated by the generator at the beginning are all black, which points to a problem, given that an all-black output is statistically highly unlikely.

Also, the loss for both networks is not improving.

I have posted the entire code below, commented so that it can be read easily. This is my first time building a GAN and I am new to PyTorch, so any help is very much appreciated!

Thanks.

import torch
from torch.optim import Adam
from torch.utils.data import DataLoader
from torch.autograd import Variable

import numpy as np
import os
import cv2
from collections import deque

# training params
batch_size = 100
epochs = 1000

# loss function
loss_fx = torch.nn.BCELoss()

# processing images
X = deque()
for img in os.listdir('pokemon_images'):
    if img.endswith('.png'):
        pokemon_image = cv2.imread(r'./pokemon_images/{}'.format(img))
        if pokemon_image.shape != (96, 96, 3):
            pass
        else:
            X.append(pokemon_image)

# data loader for processing in batches
data_loader = DataLoader(X, batch_size=batch_size)

# covert output vectors to images if flag is true, else input images to vectors
def images_to_vectors(data, reverse=False):
    if reverse:
        return data.view(data.size(0), 3, 96, 96)
    else:
        return data.view(data.size(0), 27648)

# Generator model
class Generator(torch.nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        n_features = 1000
        n_out = 27648

        self.model = torch.nn.Sequential(
                torch.nn.Linear(n_features, 128),
                torch.nn.ReLU(),
                torch.nn.Linear(128, 256),
                torch.nn.ReLU(),
                torch.nn.Linear(256, 512),
                torch.nn.ReLU(),
                torch.nn.Linear(512, 1024),
                torch.nn.ReLU(),
                torch.nn.Linear(1024, n_out),
                torch.nn.Tanh()
        )


    def forward(self, x):
        img = self.model(x)
        return img

    def noise(self, s):
        x = Variable(torch.randn(s, 1000))
        return x


# Discriminator model
class Discriminator(torch.nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        n_features = 27648
        n_out = 1

        self.model = torch.nn.Sequential(
                torch.nn.Linear(n_features, 512),
                torch.nn.ReLU(),
                torch.nn.Linear(512, 256),
                torch.nn.ReLU(),
                torch.nn.Linear(256, n_out),
                torch.nn.Sigmoid()
        )


    def forward(self, img):
        output = self.model(img)
        return output


# discriminator training
def train_discriminator(discriminator, optimizer, real_data, fake_data):
    N = real_data.size(0)
    optimizer.zero_grad()

    # train on real
    # get prediction
    pred_real = discriminator(real_data)

    # calculate loss
    error_real = loss_fx(pred_real, Variable(torch.ones(N, 1)))

    # calculate gradients
    error_real.backward()

    # train on fake
    # get prediction
    pred_fake = discriminator(fake_data)

    # calculate loss
    error_fake = loss_fx(pred_fake, Variable(torch.ones(N, 0)))

    # calculate gradients
    error_fake.backward()

    # update weights
    optimizer.step()

    return error_real + error_fake, pred_real, pred_fake


# generator training
def train_generator(generator, optimizer, fake_data):
    N = fake_data.size(0)

    # zero gradients
    optimizer.zero_grad()

    # get prediction
    pred = discriminator(generator(fake_data))

    # get loss
    error = loss_fx(pred, Variable(torch.ones(N, 0)))

    # compute gradients
    error.backward()

    # update weights
    optimizer.step()

    return error


# Instance of generator and discriminator
generator = Generator()
discriminator = Discriminator()

# optimizers
g_optimizer = torch.optim.Adam(generator.parameters(), lr=0.001)
d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=0.001)

# training loop
for epoch in range(epochs):
     for n_batch, batch in enumerate(data_loader, 0):
         N = batch.size(0)

         # Train Discriminator

         # REAL
         real_images = Variable(images_to_vectors(batch)).float()

         # FAKE
         fake_images = generator(generator.noise(N)).detach()

         # TRAIN
         d_error, d_pred_real, d_pred_fake = train_discriminator(
                 discriminator,
                 d_optimizer,
                 real_images,
                 fake_images
         )

         # Train Generator

         # generate noise
         fake_data = generator.noise(N)

         # get error based on discriminator
         g_error = train_generator(generator, g_optimizer, fake_data)

         # convert generator output to image and preprocess to show
         test_img = np.array(images_to_vectors(generator(fake_data), reverse=True).detach())
         test_img = test_img[0, :, :, :]
         test_img = test_img[..., ::-1]

         # show example of generated image
         cv2.imshow('GENERATED', test_img[0])
         if cv2.waitKey(1) & 0xFF == ord('q'):
             break

     print('EPOCH: {0}, D error: {1}, G error: {2}'.format(epoch, d_error, g_error))


cv2.destroyAllWindows()

# save weights
# torch.save('weights.pth')

Answer:

It is hard to debug your training without the data and so on, but a possible problem is that your generator's last layer is a Tanh(), which means its output values are between -1 and 1. You probably want:

  1. To have your real images normalized to the same range, e.g. in train_discriminator():

    # train on real
    pred_real = discriminator(real_data * 2. - 1.) # supposing real_data in [0, 1]
    
  2. To re-normalize your generated data to [0, 1] before visualization/use.

    # convert generator output to image and preprocess to show
    test_img = np.array(
        images_to_vectors(generator(fake_data), reverse=True).detach())
    test_img = test_img[0, :, :, :]
    test_img = test_img[..., ::-1]
    test_img = (test_img + 1.) / 2.
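
Alternatively (my own suggestion, not part of the answer above), you could normalize once while building X, so that the real data is already in the same [-1, 1] range as the generator's Tanh output and only the visualization needs re-scaling:

    # when loading: map each image from [0, 255] to [-1, 1] before appending it to X
    pokemon_image = pokemon_image.astype(np.float32) / 127.5 - 1.0
    X.append(pokemon_image)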
    

Question:

What are some useful ways to debug NEURON simulator .MOD files? In other languages one can usually use print() statements to see the variable values. Is there something like a print() statement in .MOD files?


Answer:

Use printf() statements

For example, in any of the blocks within a .MOD file, adding the printf() statement below will print the values of the variables t, i, and v every time that block is evaluated during the simulation:

BREAKPOINT {
    SOLVE state METHOD cnexp
    g = (B - A)*gmax
    i = g*(v - e)

    printf("time: %g, current: %g, voltage: %g \n", t, i, v)
}

This will result in output that looks something like this:

time: 231.062, current: 0.000609815, voltage: -67.2939 
time: 231.188, current: 0.000609059, voltage: -67.2938 
time: 231.312, current: 0.000608304, voltage: -67.2937 
time: 231.438, current: 0.00060755, voltage: -67.2936 
time: 231.562, current: 0.000606844, voltage: -67.2924 

Notes:

  • Recompile the .mod files in the folder after adding the above statements
  • Don't forget to include the '\n' at the end, so the output doesn't pile up on a single line
  • Other parameter options (besides %g) can be found in the printf() reference

Question:

I've been working on an assignment for a machine learning course. I am new to Java and have been using Eclipse. The logic and the learning algorithm are not what I am looking for help on.

But specifically, I have a while loop in main() that is supposed to output a variable named totError. totError should be different every time it loops (it is calculated from a changing parameter). However, I can't seem to find where I've gone wrong with the code; it keeps displaying the same value. Am I using static variables and methods wrong?

The .java and .txt are pasted below (unfortunately, the .txt is too large, so I've only included a small part of it, but the dimensions of my arrays are correct). It is quite a bit of material; I would really appreciate it if anyone could point me in the right direction.

Thank you!

package nn;

import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;
//import java.io.PrintStream;
import java.io.IOException;
import java.lang.Math; 

public class learningBP {
	
	// Declare variables
	static int NUM_INPUTS = 10;		// Including 1 bias
	static int NUM_HIDDENS = 10;	// Including 1 bias
	static int NUM_OUTPUTS = 1;
	static double LEARNING_RATE = 0.1;
	static double MOMENTUM = 0.1;
	static double TOT_ERROR_THRESHOLD = 0.05;
	static double SIGMOID_UB = 1;
	static double SIGMOID_LB = -1;
	
	static double [][] wgtIH = new double[NUM_INPUTS][NUM_HIDDENS];
	static double [][] dltWgtIH = new double[NUM_INPUTS][NUM_HIDDENS];
	static double [][] wgtHO = new double[NUM_HIDDENS][NUM_OUTPUTS];
	static double [][] dltWgtHO = new double[NUM_HIDDENS][NUM_OUTPUTS];
	
	static int NUM_STATES_ACTIONS; 
	
	static String [][] strLUT = new String[4*4*4*3*4*4][2];
	static double [][] arrayLUT = new double[strLUT.length][2];
	static double [][] arrayNormLUT = new double[strLUT.length][2];
	static double [] arrayErrors = new double[strLUT.length];
	static double [] arrayOutputs = new double[strLUT.length];
	static double [] arrayNormOutputs = new double[strLUT.length];
	static double [][] valueInput = new double[strLUT.length][NUM_INPUTS];
	static double [][] valueHidden = new double[strLUT.length][NUM_HIDDENS];
	static double [] dltOutputs = new double[strLUT.length];
	static double [][] dltHiddens = new double[strLUT.length][NUM_HIDDENS];
	
	static double totError = 1;
	static int numEpochs = 0;

	public static void main(String[] args) {
		
		// Load LUT
		String fileName = "/Users/XXXXX/Desktop/LUT.txt";
		try {
			load(fileName);
		}
		catch (IOException e) {
			e.printStackTrace();
		}
		
		// Initialize NN Weights
		initializeWeights();
		
		while (totError > TOT_ERROR_THRESHOLD) {
			
			// Feed Forward
			fwdFeed();
			
			// Back Propagation
			bckPropagation();
			
			// Calculate Total Error
			totError = calcTotError(arrayErrors);
			numEpochs += 1;
		
			System.out.println("Number of Epochs: "+numEpochs);
			System.out.println(totError);
			
		}

	}


	public double outputFor(double[] X) {
		// TODO Auto-generated method stub
		return 0;
	}

	public double train(double[] X, double argValue) {
		// TODO Auto-generated method stub
		return 0;
	}

	public void save(File argFile) {
		// TODO Auto-generated method stub
		
	}

	public static void load(String argFileName) throws IOException {
		
		// Load LUT training set from Part2
        BufferedReader r = new BufferedReader(new FileReader(new File(argFileName)));
        String l = r.readLine();
        try {
	        int a = 0;
	        while (l != null) {
            String spt[] = l.split("\t");
	            strLUT[a][0] = spt[0]; 
	            strLUT[a][1] = spt[1];
	            arrayLUT[a][0] = Double.parseDouble(strLUT[a][0]);
	            arrayLUT[a][1] = Double.parseDouble(strLUT[a][1]);
	            a += 1;
	            l = r.readLine();
	        }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
        	r.close();
        }
		
        // Normalize LUT to bipolar
        for (int b = 0; b < arrayLUT.length; b++) {
        	arrayNormLUT[b][0] = arrayLUT[b][0];
        	arrayNormLUT[b][1] = sigmoid(arrayLUT[b][1]);
        }
        
	}

	public static double sigmoid(double x) {
		
		// Bipolar sigmoid
		return (SIGMOID_UB - SIGMOID_LB) / (1 + Math.pow(Math.E, -x)) + SIGMOID_LB;
		
	}

	public static void initializeWeights() {
		
		// Initialize weights from input layer to hidden layer
		for (int i = 0; i < NUM_INPUTS; i++) {
			for (int j = 0; j < NUM_HIDDENS; j++) {
				wgtIH[i][j] = Math.random() - 0.5;
				dltWgtIH[i][j] = 0;
			}
		}
		
		// Initialize weights from hidden layer to output layer
		for (int j = 0; j < NUM_HIDDENS; j++) {
			for (int k = 0; k < NUM_OUTPUTS; k++) {
				wgtHO[j][k] = Math.random() - 0.5;
				dltWgtHO[j][k] = 0;
			}
		}
		
	}

	public void zeroWeights() {

		// TODO Auto-generated method stub
		
	}

	public static void fwdFeed() {
		
		for(int z = 0; z < arrayLUT.length; z++) { 
			
			// Normalize between [-1, 1]
			valueInput[z][0] = (Character.getNumericValue(strLUT[z][0].charAt(0)) - 2.5)/1.5; // myX
			valueInput[z][1] = (Character.getNumericValue(strLUT[z][0].charAt(1)) - 2.5)/1.5; // myY
			valueInput[z][2] = (Character.getNumericValue(strLUT[z][0].charAt(2)) - 2.5)/1.5; // myHead
			valueInput[z][3] = Character.getNumericValue(strLUT[z][0].charAt(3)) - 2; // enProx
			valueInput[z][4] = (Character.getNumericValue(strLUT[z][0].charAt(4)) - 2.5)/1.5; // enAngle
			
			// Vectorization of the four possible actions into binaries
			valueInput[z][5] = 0;
			valueInput[z][6] = 0;
			valueInput[z][7] = 0;
			valueInput[z][8] = 0;
		
			int action = Character.getNumericValue(strLUT[z][0].charAt(5)); // action
			valueInput[z][action-1] = 1;
			
			// Apply bias input
			valueInput[z][9] = 1;
			
			// Calculate value for hidden neuron j
			for(int j = 0; j < NUM_HIDDENS-1; j++) {
				valueHidden[z][j] = 0;
				for(int i = 0; i < NUM_INPUTS; i++) {
					valueHidden[z][j] += valueInput[z][i]*wgtIH[i][j];
				}
				valueHidden[z][j] = sigmoid(valueHidden[z][j]);
			}
			
			// Apply bias hidden neuron
			valueHidden[z][9] = 1;
			
			// Calculate value for output neuron
			arrayOutputs[z] = 0;
			for(int j = 0; j < NUM_HIDDENS; j++) {
				arrayOutputs[z] += valueHidden[z][j]*wgtHO[j][0];
			}
			
			arrayNormOutputs[z] = sigmoid(arrayOutputs[z]);		
			arrayErrors[z] = arrayNormOutputs[z] - arrayNormLUT[z][1];
		}
		
	}
	
	public static void bckPropagation() {
		
		for(int z = 0; z < arrayLUT.length; z++) { 
			
			// Delta rule for bipolar sigmoids
			dltOutputs[z] = arrayErrors[z] * (1/2) * (1 + arrayNormLUT[z][1]) * (1 - arrayNormLUT[z][1]);
			
			// Calculate update weights between hidden & output layers
			for(int j = 0; j < NUM_HIDDENS; j++) {
				
				dltWgtHO[j][0] = (LEARNING_RATE * dltOutputs[z] * valueHidden[z][j]) + (MOMENTUM * dltWgtHO[j][0]);
				wgtHO[j][0] += dltWgtHO[j][0];
				
			}	
			
			// Delta rule for bipolar sigmoids
			for(int j = 0; j < NUM_HIDDENS-1; j++) {
				
				dltHiddens[z][j] = (dltOutputs[z] * wgtHO[j][0]) * (1/2) * (1 + valueHidden[z][j]) * (1 - valueHidden[z][j]);
			
				// calculate update weights between input & hidden layers
				for(int i = 0; i < NUM_INPUTS; i++){
					
					dltWgtIH[i][j] = (LEARNING_RATE * dltHiddens[z][j] * valueInput[z][i]) + (MOMENTUM * dltWgtIH[i][j]);
					wgtIH[i][j] += dltWgtIH[i][j];
					
				}
				
			}
			
		}
	}
	
	public static double calcTotError(double [] Ar) {
		
		// Get total error
		double outputTotError = 0;
		for(int z = 0; z < Ar.length; z++) {
			
			outputTotError += Math.pow(Ar[z], 2);
			
		}
		return outputTotError /= 2;
		
	}
	
}

Answer:

I found the following technical and algorithmic/mathematical flaws:

Technical issue:

Replace (1/2) with 0.5, since (1/2) evaluates to 0: in Java, the divisor or the dividend (or both) must be a double for the result to be a double; otherwise the division is integer division and the result is an int. There are two occurrences in bckPropagation().

Mathematical issue 1:

Considering the Delta Rule (e.g. http://users.pja.edu.pl/~msyd/wyk-nai/multiLayerNN-en.pdf) and the Delta Rule with Momentum (e.g. http://ecee.colorado.edu/~ecen4831/lectures/deltasum.html), there seems to be a sign error in dltOutputs[z]. In bckPropagation(), replace

dltOutputs[z] = arrayErrors[z] * (1/2) * (1 + arrayNormLUT[z][1]) * (1 - arrayNormLUT[z][1]);

with

dltOutputs[z] = -arrayErrors[z] * 0.5 * (1 + arrayNormLUT[z][1]) * (1 - arrayNormLUT[z][1]);
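
For reference (my own restatement of the delta rule from the linked notes, not part of the answer), the update for a hidden-to-output weight with momentum reads

    \delta_z = (t_z - o_z) \, f'(\mathrm{net}_z), \qquad
    \Delta w_j(n) = \eta \, \delta_z \, h_{z,j} + \alpha \, \Delta w_j(n-1)

where t_z is the target, o_z the network output, h_{z,j} the hidden activation, \eta the learning rate and \alpha the momentum. Since arrayErrors[z] stores o_z - t_z (output minus target), the leading minus sign is needed to recover the (t_z - o_z) factor.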

Mathematical issue 2 (here I'm not really sure, but I think it's a mistake): The weights for a test case z should only depend on data from test case z for the current epoch and all previous epochs (due to the while loop). Currently, in bckPropagation(), the weights for test case z additionally contain contributions from all previous test cases z' < z (due to the for loop), for the current epoch and all previous epochs (again due to the while loop). A possible solution is to introduce z as a third dimension of the weight arrays, wgtIH[z][i][j] and wgtHO[z][j][0], so that the contributions to the weights of each test case z are isolated from every other test case z'. To achieve this, the following modifications are necessary:

1) Defining:

static double [][][] wgtIH = new double[strLUT.length][NUM_INPUTS][NUM_HIDDENS];
static double [][][] wgtHO = new double[strLUT.length][NUM_HIDDENS][NUM_OUTPUTS];

2) Initialization:

public static void initializeWeights() {
    for(int z = 0; z < arrayLUT.length; z++) { 
        // Initialize weights from input layer to hidden layer
        double rndWgtIH = Math.random() - 0.5;
        for (int i = 0; i < NUM_INPUTS; i++) {
            for (int j = 0; j < NUM_HIDDENS; j++) {
                wgtIH[z][i][j] = rndWgtIH;    
                dltWgtIH[i][j] = 0;
            }
        }
        // Initialize weights from hidden layer to output layer
        double rndWgtHO = Math.random() - 0.5;
        for (int j = 0; j < NUM_HIDDENS; j++) {
            for (int k = 0; k < NUM_OUTPUTS; k++) {
                wgtHO[z][j][k] = rndWgtHO;
                dltWgtHO[j][k] = 0;
            }
        }
    }
}

3) fwdFeed()- and bckPropagation()-method:

In both methods, wgtIH[i][j] and wgtHO[j][k] have to be replaced with wgtIH[z][i][j] and wgtHO[z][j][k], respectively.

Example: Development of the total error as a function of the epochs

 LEARNING_RATE = 0.4, MOMENTUM = 0.4, TOT_ERROR_THRESHOLD = 1;

 Number of Epochs: 1
 178.54336668545102
 Number of Epochs: 10000
 15.159692746944888
 Number of Epochs: 20000
 10.653887138186896
 Number of Epochs: 30000
 8.669183516487523
 Number of Epochs: 40000
 7.504963842773336
 Number of Epochs: 50000
 6.723327476195474
 Number of Epochs: 60000
 6.153237046947662
 Number of Epochs: 70000
 5.7133602902880325
 Number of Epochs: 80000
 5.360053126719502
 Number of Epochs: 90000
 5.06774284345891
 Number of Epochs: 100000
 4.820373442353342
 Number of Epochs: 200000
 3.4647965464740746
 Number of Epochs: 300000
 2.8350276017589153
 Number of Epochs: 400000
 2.4398876881673557
 Number of Epochs: 500000
 2.158533606426507
 Number of Epochs: 600000
 1.9432229058177424
 Number of Epochs: 700000
 1.770444540122524
 Number of Epochs: 800000
 1.627115257304848
 Number of Epochs: 900000
 1.5053344819279666
 Number of Epochs: 1000000
 1.4000233082047084
 Number of Epochs: 1100000
 1.3077427523972092
 Number of Epochs: 1200000
 1.2260577251537967
 Number of Epochs: 1300000
 1.153175740062673
 Number of Epochs: 1400000
 1.0877325511159377
 Number of Epochs: 1500000
 1.0286600703077815
 Duration: 822.8203290160001s -> approx. 14min

As expected, the total error decreases from epoch to epoch as the neural network learns.

Question:

Normally, machine learning systems perform well. However, when there is a problem with a trained machine learning system (for example, it performs worse than random), the great "guessing game" begins. By "guessing game" I allude to my own experience: it seems to me that debugging machine learning systems is most often done by guessing at the problem rather than in a methodical way.

And since there are numerous reasons why a machine learning system may fail, finding the actual bug can be quite time-consuming. For example, the bug may be due to:

  • biased training dataset
  • insufficient training data
  • datasets containing errors
  • unrepresentative/too many features
  • sloppy training (for example, in neural networks, when the training data is not presented in random order)
  • ...

Is there a machine learning system that is easy to debug? (And how can it be debugged?)

Is there a known methodical way of debugging machine learning systems at all?


Answer:

What you refer to as "debugging" is known as optimizing in the machine learning community. While there are ways to optimize a classifier that depend on the classifier and the problem, there is no standard procedure. For example, in a text classification problem you might find through experiments that training your classifier with certain features enhances its performance. There are methods for selecting the feature combinations that yield the highest classification accuracy for a classifier; some of them use a genetic algorithm to find the best combination. One method you can read about is sequential feature selection. There are also many papers on such topics that you might find useful, as well as studies that change the classification function or other computations in a classifier implementation to achieve better classification results.
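
As an illustration of sequential feature selection (my own sketch, not part of the original answer; it uses scikit-learn's SequentialFeatureSelector and the Iris data purely as an example):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# greedily add the feature that most improves cross-validated accuracy,
# until the requested number of features has been selected
sfs = SequentialFeatureSelector(knn, n_features_to_select=2, direction='forward')
sfs.fit(X, y)
print(sfs.get_support())   # boolean mask of the selected features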

Having said that, there are also ways of optimizing a classifier that are considered cheating and should be avoided (usually when a classifier is tuned to solve a problem only on a single dataset, or on highly similar datasets, and not on previously unseen datasets).