Hot questions for using neural networks in PCA

Question:

Suppose there are 8 features in the dataset. I use PCA and find, from the cumulative sum of the explained variance ratio, that 99% of the information is in the first 3 features. Why, then, do I need to fit and transform these 3 features with PCA before using them to train my neural network? Why can't I just use the three features as they are?


Answer:

The reason is that when PCA tells you that 99% of the variance is explained by the first three components, it doesn't mean that it is explained by the first three features. PCA components are linear combinations of the features, but they are usually not the features themselves. For example, PCA components must be orthogonal to each other, while the features don't have to be.
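
The point above can be checked numerically. The following is a minimal sketch, assuming scikit-learn and a synthetic 8-feature dataset (the data, `latent`, and `mixing` are made up for illustration, not taken from the question). It shows that each component is a weight vector over all 8 features, and that the components are orthonormal.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: 500 samples, 8 correlated features built from 3 latent factors.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 8))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

pca = PCA(n_components=3).fit(X)

# Each row of components_ holds the weights of one component over all 8 features.
print(pca.components_.shape)
print(pca.explained_variance_ratio_.sum())

# Transforming is centering followed by projection onto those weight vectors.
Z = pca.transform(X)
Z_manual = (X - pca.mean_) @ pca.components_.T

# The components are orthonormal, so this Gram matrix is the identity.
gram = pca.components_ @ pca.components_.T
print(np.round(gram, 6))
```

Here `Z` is what the network should be trained on; `X[:, :3]` (the first three raw features) is a different matrix and generally carries less of the variance.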

Question:

I would like a mathematical proof of it. Does anyone know a paper on it, or can anyone work out the math?


Answer:

https://pvirie.wordpress.com/2016/03/29/linear-autoencoders-do-pca/ PCA is restricted to a linear map, while autoencoders can have nonlinear encoders/decoders.

A single-layer autoencoder with a linear transfer function is nearly equivalent to PCA, where "nearly" means that the W found by the autoencoder and the W found by PCA won't be the same, but the subspaces spanned by the respective W's will be.
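
To illustrate "different weights, same subspace", here is a small NumPy sketch. It does not train an autoencoder; instead it assumes an encoder of the form `V_k @ R`, where `R` is an arbitrary invertible matrix chosen for illustration, as a stand-in for the kind of solution a linear autoencoder could reach. All names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))
Xc = X - X.mean(axis=0)

k = 2
# PCA basis: top-k right singular vectors of the centered data (columns of V_k).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V_k = Vt[:k].T                      # shape (6, k)

# A different encoder spanning the same subspace: mix the PCA directions
# with an arbitrary invertible k x k matrix R (assumed, for illustration only).
R = np.array([[2.0, 1.0], [0.5, -1.5]])
W_enc = V_k @ R                     # encoder weights, no longer orthonormal
W_dec = np.linalg.pinv(W_enc)       # matching linear decoder, shape (k, 6)

recon_pca = Xc @ V_k @ V_k.T
recon_ae = Xc @ W_enc @ W_dec

# The projectors onto the two column spaces coincide even though the weights differ.
P_pca = V_k @ V_k.T
P_ae = W_enc @ W_dec
print(np.allclose(W_enc, V_k), np.allclose(P_pca, P_ae))
```

Both maps give the same reconstruction of the data, which is why the reconstruction loss alone cannot single out the PCA weights.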

Question:

I'm trying to train a neural network on the MNIST database after applying PCA, and I keep getting errors because of the data shape after applying PCA. I'm not sure how to fit everything together, or how to go through the whole database rather than just a small batch.

Here is my code:


import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random
from sklearn.preprocessing import StandardScaler
from tensorflow.examples.tutorials.mnist import input_data
from sklearn.decomposition import PCA

datadir='/data' 
data= input_data.read_data_sets(datadir, one_hot=True)
train_x = data.train.images[:55000]
train_y= data.train.labels[:55000]
test_x = data.test.images[:10000]
test_y = data.test.labels[:10000]
print("original shape:   ", data.train.images.shape)

percent=600
pca=PCA(percent)
train_x=pca.fit_transform(train_x)
test_x=pca.fit_transform(test_x)
print("transformed shape:", data.train.images.shape)
train_x=pca.inverse_transform(train_x)
test_x=pca.inverse_transform(test_x)
c=pca.n_components_

plt.figure(figsize=(8,4));
plt.subplot(1, 2, 1);
image=np.reshape(data.train.images[3],[28,28])
plt.imshow(image, cmap='Greys_r')
plt.title("Original Data")

plt.subplot(1, 2, 2);
image1=train_x[3].reshape(28,28)
image.shape
plt.imshow(image1, cmap='Greys_r')
plt.title("Original Data after 0.8 PCA")

plt.figure(figsize=(10,8))
plt.plot(range(c), np.cumsum(pca.explained_variance_ratio_))
plt.grid()
plt.title("Cumulative Explained Variance")
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance');


num_iters=10
hidden_1=1024
hidden_2=1024
input_l=percent
out_l=10
'''input layer'''
x=tf.placeholder(tf.float32, [None, 28,28,1])
x=tf.reshape(x,[-1, input_l])

w1=tf.Variable(tf.random_normal([input_l,hidden_1])) 
w2=tf.Variable(tf.random_normal([hidden_1,hidden_2]))
w3=tf.Variable(tf.random_normal([hidden_2,out_l]))

b1=tf.Variable(tf.random_normal([hidden_1]))
b2=tf.Variable(tf.random_normal([hidden_2]))
b3=tf.Variable(tf.random_normal([out_l]))

Layer1=tf.nn.relu_layer(x,w1,b1)
Layer2=tf.nn.relu_layer(Layer1,w2,b2)
y_pred=tf.matmul(Layer2,w3)+b3
y_true=tf.placeholder(tf.float32,[None,out_l])


loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_pred, 
labels=y_true))
optimizer= tf.train.AdamOptimizer(0.006).minimize(loss)
correct_pred=tf.equal(tf.argmax(y_pred,1), tf.argmax(y_true,1))
accuracy= tf.reduce_mean(tf.cast(correct_pred, tf.float32))

store_training=[]
store_step=[]
m = 10000

init=tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(num_iters):
        indices = random.sample(range(0, m), 100)
        batch_xs = train_x[indices]
        batch_ys = train_y[indices]
        sess.run(optimizer, feed_dict={x:batch_xs, y_true:batch_ys})
        training=sess.run(accuracy, feed_dict={x:test_x, y_true:test_y})
        store_training.append(training)  
    testing=sess.run(accuracy, feed_dict={x:test_x, y_true:test_y})

print('Accuracy :{:.4}%'.format(testing*100))
z_reg=len(store_training)
x_reg=np.arange(0,z_reg,1)
y_reg=store_training
plt.figure(1)
plt.plot(x_reg, y_reg,label='Regular Accuracy')

This is the error I got:

    Traceback (most recent call last):

      File "<ipython-input-2-ff57ada92ef5>", line 135, in <module>
        sess.run(optimizer, feed_dict={x:batch_xs, y_true:batch_ys})
      File "C:\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
        run_metadata_ptr)
      File "C:\anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1128, in _run
        str(subfeed_t.get_shape())))
    ValueError: Cannot feed value of shape (100, 784) for Tensor 'Reshape:0', which has shape '(?, 600)'

Answer:

First of all, I'd recommend fitting PCA on the training set only, since fitting it separately on the train and test sets can give you different PCA components for each. So the easiest fix is to change the following piece of code:

percent=600
pca=PCA(percent)
train_x=pca.fit_transform(train_x)
test_x=pca.fit_transform(test_x)

to

percent=.80
pca=PCA(percent)
pca.fit(train_x)
train_x=pca.transform(train_x)
test_x=pca.transform(test_x)

Secondly, you run PCA with percent=600 and then apply the PCA inverse transform, which takes you back to the space with the original number of features (784). That is why the fed batch has shape (100, 784) while the network expects 600 inputs. To train on the reduced number of PCA components instead, you can also change this piece of code:

train_x=pca.inverse_transform(train_x)
test_x=pca.inverse_transform(test_x)
c=pca.n_components_
<plotting code>    
input_l=percent

to:

c=pca.n_components_
#plotting commented out   
input_l=c

This should give you the correct tensor dimensions for the subsequent optimization procedure.
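
The fit-on-train, transform-both pattern can be sketched on its own as follows. This is a minimal example under stated assumptions: it uses scikit-learn's bundled 8x8 digits (64 features) as a stand-in for MNIST so that it runs without a download, and the variable names `train_z` and `test_z` are introduced here for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Stand-in data: scikit-learn's bundled 8x8 digits (64 features), not MNIST.
X, y = load_digits(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A float in (0, 1) asks PCA to keep enough components for that share of variance.
pca = PCA(n_components=0.80)
pca.fit(train_x)                    # learn the components from the training data only
train_z = pca.transform(train_x)    # project both sets with the same components
test_z = pca.transform(test_x)

input_l = pca.n_components_         # width to use for the network's input layer
print(train_z.shape, test_z.shape, input_l)
```

Because both sets are projected with the same fitted components, their second dimension matches `input_l`, which is the size the first weight matrix of the network has to use.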

Question:

I am working on neural networks and I am currently creating a perceptron that will work as a classifier for a data set of face images. I am required to perform PCA (principal component analysis) on my data set before dividing the samples into two sets for training and testing. By doing this, I lower the dimensionality of the data and at the same time compress the images.

However, I am not a statistician, and without a specific formula I have trouble choosing the number of principal components to use for PCA. My data set is a 4096x400 array, 400 being the number of sample images and 4096 their dimension. Is there a more precise way to choose the number of principal components to use in PCA?

I am working in MATLAB, so I am using princomp. Thank you in advance; any help will be highly appreciated.


Answer:

Question: How many principal components should I use in pattern classification?
Answer: As few as possible.

When you apply PCA, the number of principal components you get depends on your data. Let's say you get 10 principal components from your data. You then check how much of the variance each principal component explains.

For example

  component  variance explained
  1          0.40
  2          0.25
  3          0.15
  4          0.10
  5          0.05
  6          0.01
  7          0.01
  8          0.01         
  9          0.01         
  10         0.01         

With this, you decide on a cutoff and train your classifier. In this example, as you can see, the first 4 principal components hold 90% of the information. Your results may be good enough with only 4 principal components.

You may add the 5th principal component; these 5 principal components will hold 95% of your information, and so on.
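
The cutoff rule can be written as a few lines of Python. This is a minimal sketch using the ratios from the table above; the helper name `components_for` and the tolerance `tol` are introduced here for illustration (the tolerance only absorbs floating-point rounding in the running sum).

```python
from itertools import accumulate

# Per-component explained variance ratios from the table above.
ratios = [0.40, 0.25, 0.15, 0.10, 0.05, 0.01, 0.01, 0.01, 0.01, 0.01]

def components_for(target, ratios, tol=1e-9):
    """Return the smallest number of leading components whose cumulative
    explained variance reaches `target`."""
    for count, total in enumerate(accumulate(ratios), start=1):
        if total >= target - tol:
            return count
    return len(ratios)

print(components_for(0.90, ratios))  # 4
print(components_for(0.95, ratios))  # 5
```

With real data, the same function can be applied to the explained variance ratios reported by whichever PCA routine you use.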

See an example with PCA and image data here

Question:

I'm trying to implement a neural network for handwritten digit classification problem.

I have total 42,000 labeled image samples and 28,000 unlabeled images samples that I need to predict. Each image is 28x28 so there are total 784 pixels or features.

So first I normalize the data samples using scikit-learn's MinMaxScaler and train my neural network. It has 9 hidden layers and 590 hidden nodes, and I'm using ReLU. My score is 97.399%, so I think it is good.

After that, I thought I should use PCA in order to reduce the dimension of the data and make the training process faster.

I'm reducing the dimension of the data to 100-300.

But PCA is reducing the accuracy of the neural network and also causing it to overfit.

With PCA I'm getting less than 50% accuracy.

I tried changing the number of hidden layers and hidden nodes, but the accuracy is still below 50%.

So what should I do?


Answer:

Yes, it will not work well if the reduced images have poor resolution (with no useful features left for prediction).

  • Although PCA does not throw away every other pixel and only transforms the data to keep its important features, reducing the dimension to 100-200 features can be too low; you may not be able to represent a good image with that.

  • You must choose the number of dimensions so that as much variance as possible is retained. For images, you can judge how much is retained simply by visualizing the reconstructed image.

  • Reducing the dimensions of an image whose pixels are the features loses detail, much as downsampling the image would. So if you go from 784 features (pixels) to 100-200, important features might be gone, resulting in poor performance.

  • Try visualizing the image after dimensionality reduction and comparing it with the 784-pixel image; you will see the difference.

  • If you still want to use PCA, reduce the dimensions only to a level where you can visually confirm that the image keeps enough features for the algorithm to work.
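
The visual check suggested above can be complemented with numbers. The following is a rough sketch, assuming scikit-learn and using its bundled 8x8 digits (64 pixel features) as a stand-in for the 784-pixel images; the helper `reconstruction_report` and the component counts are chosen here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # stand-in: 8x8 digits, 64 pixel features

def reconstruction_report(X, n_components):
    """Fit PCA, map the data back to pixel space, and return
    (fraction of variance kept, mean squared reconstruction error)."""
    pca = PCA(n_components=n_components, svd_solver="full").fit(X)
    X_back = pca.inverse_transform(pca.transform(X))
    mse = float(np.mean((X - X_back) ** 2))
    return float(pca.explained_variance_ratio_.sum()), mse

report = {k: reconstruction_report(X, k) for k in (5, 20, 40)}
for k, (kept, mse) in report.items():
    print(f"{k:2d} components: variance kept = {kept:.3f}, reconstruction MSE = {mse:.3f}")
```

To look at a reconstruction rather than a number, a row of the back-transformed data can be reshaped to the image size (8x8 here, 28x28 for the question's data) and shown next to the original with matplotlib's `imshow`.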

Hope this helps!