Hot questions: using neural networks for dimensionality reduction

Question:

I'm trying to adapt Aymeric Damien's code to visualize the dimensionality reduction performed by an autoencoder implemented in TensorFlow. All of the examples I have seen work on the MNIST digits dataset, but I wanted to use this method to visualize the iris dataset in 2 dimensions as a toy example so I can figure out how to tweak it for my real-world datasets.

My question is: How can one get the sample-specific 2-dimensional embeddings for visualization?

For example, the iris dataset has 150 samples with 4 attributes. I added 4 noise attributes to get a total of 8 attributes. The encoder/decoder layer sizes are [8, 4, 2, 4, 8], but I'm not sure how to extract an array of shape (150, 2) to visualize the embeddings. I haven't found any tutorials on visualizing dimensionality reduction with TensorFlow.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline

# Set random seeds
np.random.seed(0)
tf.set_random_seed(0)

# Load data
iris = load_iris()
# Original Iris : (150,4)
X_iris = iris.data 
# Iris with noise : (150,8)
X_iris_with_noise = np.concatenate([X_iris, np.random.random(size=X_iris.shape)], axis=1).astype(np.float32)
y_iris = iris.target

# PCA
pca_xy = PCA(n_components=2).fit_transform(X_iris_with_noise)
with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots()
    ax.scatter(pca_xy[:,0], pca_xy[:,1], c=y_iris, cmap=plt.cm.Set2)
    ax.set_title("PCA | Iris with noise")

# Training Parameters
learning_rate = 0.01
num_steps = 1000
batch_size = 10

display_step = 250
examples_to_show = 10 # leftover from the original MNIST example; not used below

# Network Parameters
num_hidden_1 = 4 # 1st layer num features
num_hidden_2 = 2 # 2nd layer num features (the latent dim)
num_input = 8 # Iris data input (4 original + 4 noise attributes)

# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input], name="input")

weights = {
    'encoder_h1': tf.Variable(tf.random_normal([num_input, num_hidden_1]), dtype=tf.float32, name="encoder_h1"),
    'encoder_h2': tf.Variable(tf.random_normal([num_hidden_1, num_hidden_2]), dtype=tf.float32, name="encoder_h2"),
    'decoder_h1': tf.Variable(tf.random_normal([num_hidden_2, num_hidden_1]), dtype=tf.float32, name="decoder_h1"),
    'decoder_h2': tf.Variable(tf.random_normal([num_hidden_1, num_input]), dtype=tf.float32, name="decoder_h2"),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([num_hidden_1]), dtype=tf.float32, name="encoder_b1"),
    'encoder_b2': tf.Variable(tf.random_normal([num_hidden_2]), dtype=tf.float32, name="encoder_b2"),
    'decoder_b1': tf.Variable(tf.random_normal([num_hidden_1]), dtype=tf.float32, name="decoder_b1"),
    'decoder_b2': tf.Variable(tf.random_normal([num_input]), dtype=tf.float32, name="decoder_b2"),
}

# Building the encoder
def encoder(x):
    # Encoder Hidden layer with sigmoid activation #1
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),
                                   biases['encoder_b1']))
    # Encoder Hidden layer with sigmoid activation #2
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),
                                   biases['encoder_b2']))
    return layer_2


# Building the decoder
def decoder(x):
    # Decoder Hidden layer with sigmoid activation #1
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),
                                   biases['decoder_b1']))
    # Decoder Hidden layer with sigmoid activation #2
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),
                                   biases['decoder_b2']))
    return layer_2

# Construct model
encoder_op = encoder(X)
decoder_op = decoder(encoder_op)

# Prediction
y_pred = decoder_op
# Targets (Labels) are the input data.
y_true = X

# Define loss and optimizer, minimize the squared error
loss = tf.reduce_mean(tf.pow(y_true - y_pred, 2))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start Training
# Start a new TF session
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    # Training
    for i in range(1, num_steps+1):
        # Prepare Data
        # Get the next batch of Iris data 
        idx_train = np.random.RandomState(i).choice(np.arange(X_iris_with_noise.shape[0]), size=batch_size)
        batch_x = X_iris_with_noise[idx_train,:]
        # Run optimization op (backprop) and cost op (to get loss value)
        _, l = sess.run([optimizer, loss], feed_dict={X: batch_x})
        # Display logs per step
        if i % display_step == 0 or i == 1:
            print('Step %i: Minibatch Loss: %f' % (i, l))

Answer:

Your embedding is the output of the encoder, h = encoder(X) (in your code this tensor already exists as encoder_op). Then, for each batch, you can fetch its value as follows:

_, l, embedding = sess.run([optimizer, loss, h], feed_dict={X: batch_x})
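To get the (150, 2) array the question asks for, you can also run the encoder once over the whole dataset after the training loop, while the session is still open. A minimal sketch, reusing the encoder_op, X, X_iris_with_noise, and y_iris already defined in the question:

# Still inside the `with tf.Session() as sess:` block, after training:
embedding_xy = sess.run(encoder_op, feed_dict={X: X_iris_with_noise}) # shape (150, 2)

with plt.style.context("seaborn-white"):
    fig, ax = plt.subplots()
    ax.scatter(embedding_xy[:,0], embedding_xy[:,1], c=y_iris, cmap=plt.cm.Set2)
    ax.set_title("Autoencoder | Iris with noise")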

There is an even nicer solution with TensorBoard's embedding projector (https://www.tensorflow.org/programmers_guide/embedding):

from tensorflow.contrib.tensorboard.plugins import projector
config = projector.ProjectorConfig()

embedding = config.embeddings.add()
embedding.tensor_name = h.name

# Use the same LOG_DIR where you stored your checkpoint.
summary_writer = tf.summary.FileWriter(LOG_DIR)

projector.visualize_embeddings(summary_writer, config)
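Note that the projector reads embedding values from a saved checkpoint, so the tensor named by embedding.tensor_name has to be a tf.Variable that you save into LOG_DIR with tf.train.Saver; storing the encoder output in such a variable first is one way to do this.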

Question:

Here is an autoencoder trained on MNIST using PyTorch:

import torch
import torchvision
import torch.nn as nn
from torch.autograd import Variable

cuda = torch.cuda.is_available() # True if cuda is available, False otherwise
FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
print('Training on %s' % ('GPU' if cuda else 'CPU'))

# Loading the MNIST data set
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                torchvision.transforms.Normalize((0.1307,), (0.3081,))])
mnist = torchvision.datasets.MNIST(root='../data/', train=True, transform=transform, download=True)

# Loader to feed the data batch by batch during training.
batch = 100
data_loader = torch.utils.data.DataLoader(mnist, batch_size=batch, shuffle=True)

autoencoder = nn.Sequential(
                # Encoder
                nn.Linear(28 * 28, 512),
                nn.PReLU(512),
                nn.BatchNorm1d(512),

                # Low-dimensional representation
                nn.Linear(512, 128),   
                nn.PReLU(128),
                nn.BatchNorm1d(128),

                # Decoder
                nn.Linear(128, 512),
                nn.PReLU(512),
                nn.BatchNorm1d(512),
                nn.Linear(512, 28 * 28))

autoencoder = autoencoder.type(FloatTensor)

optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=0.005)

epochs = 10
data_size = int(mnist.train_labels.size()[0])

for i in range(epochs):
    for j, (images, _) in enumerate(data_loader):
        images = images.view(images.size(0), -1) # from (batch, 1, 28, 28) to (batch, 28*28)
        images = Variable(images).type(FloatTensor)

        autoencoder.zero_grad()
        reconstructions = autoencoder(images)
        loss = torch.dist(images, reconstructions)
        loss.backward()
        optimizer.step()
    print('Epoch %i/%i loss %.2f' % (i + 1, epochs, loss.data[0]))

print('Optimization finished.')

I'm attempting to compare the low-dimensional representations of the images.

Printing the shape of each learned parameter:

for l in autoencoder.parameters() : 
    print(l.shape)

displays:

torch.Size([512, 784])
torch.Size([512])
torch.Size([512])
torch.Size([512])
torch.Size([512])
torch.Size([128, 512])
torch.Size([128])
torch.Size([128])
torch.Size([128])
torch.Size([128])
torch.Size([512, 128])
torch.Size([512])
torch.Size([512])
torch.Size([512])
torch.Size([512])
torch.Size([784, 512])
torch.Size([784])

So it appears the low-dimensional representations are not stored as learned parameters?

In other words, if I have 10,000 images of 100 pixels each, and the autoencoder above reduces the dimensionality to 10, I should be able to access the 10-dimensional representation of every one of the 10,000 images?


Answer:

I'm not very familiar with PyTorch, but splitting the autoencoder into an encoder and a decoder model seems to work (I changed the size of the hidden layer from 512 to 64, and the dimension of the encoded image from 128 to 4, to make the example run faster):

import torch
import torchvision
import torch.nn as nn
from torch.autograd import Variable

cuda = torch.cuda.is_available() # True if cuda is available, False otherwise
FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
print('Training on %s' % ('GPU' if cuda else 'CPU'))

# Loading the MNIST data set
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                torchvision.transforms.Normalize((0.1307,), (0.3081,))])
mnist = torchvision.datasets.MNIST(root='../data/', train=True, transform=transform, download=True)

# Loader to feed the data batch by batch during training.
batch = 100
data_loader = torch.utils.data.DataLoader(mnist, batch_size=batch, shuffle=True)


encoder = nn.Sequential(
                # Encoder
                nn.Linear(28 * 28, 64),
                nn.PReLU(64),
                nn.BatchNorm1d(64),

                # Low-dimensional representation
                nn.Linear(64, 4),
                nn.PReLU(4),
                nn.BatchNorm1d(4))

decoder = nn.Sequential(
                # Decoder
                nn.Linear(4, 64),
                nn.PReLU(64),
                nn.BatchNorm1d(64),
                nn.Linear(64, 28 * 28))

autoencoder = nn.Sequential(encoder, decoder)

encoder = encoder.type(FloatTensor)
decoder = decoder.type(FloatTensor)
autoencoder = autoencoder.type(FloatTensor)

optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=0.005)

epochs = 10
data_size = int(mnist.train_labels.size()[0])

for i in range(epochs):
    for j, (images, _) in enumerate(data_loader):
        images = images.view(images.size(0), -1) # from (batch, 1, 28, 28) to (batch, 28*28)
        images = Variable(images).type(FloatTensor)

        autoencoder.zero_grad()
        reconstructions = autoencoder(images)
        loss = torch.dist(images, reconstructions)
        loss.backward()
        optimizer.step()
    print('Epoch %i/%i loss %.2f' % (i + 1, epochs, loss.data[0]))

print('Optimization finished.')

# Get the encoded images here
encoded_images = []
encoder.eval() # use the running BatchNorm statistics rather than per-batch statistics
for j, (images, _) in enumerate(data_loader):
    images = images.view(images.size(0), -1)
    images = Variable(images).type(FloatTensor)

    encoded_images.append(encoder(images))
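If you want a single array of codes rather than a list of per-batch tensors, you can concatenate them (a minimal sketch; note that shuffle=True in the loader means the row order will not match the dataset order unless you build a second, unshuffled DataLoader for this pass):

codes = torch.cat(encoded_images, dim=0) # shape (60000, 4)
codes = codes.data.cpu().numpy()         # plain NumPy array of the representations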

Question:

I want to configure a deep autoencoder in order to reduce the dimensionality of my input data as described in this paper. The layer sizes should be 2000-500-250-125-2-125-250-500-2000 and I want to be able to pull out the activation of the layer in the middle (as described in the paper, I want to use the values as coordinates). The input data consists of binary vectors with a length of 2000 each. Now I'm searching for a working example which I can use as a starting point. I already tried DeepLearning4J but wasn't able to build a satisfying autoencoder. I would be thankful for any suggestions.


Answer:

You should take a look at some of the tutorials over at deeplearning.net. They have a Stacked Denoising Autoencoder example with code. All of the tutorials are written in Theano, a scientific computing library that will generate GPU code for you.

Here's an example of a visualization of the learned weights on the 3rd layer of a 200x200x200 SdA trained on LFW. You can simply modify the SdA tutorial code linked above to get the same results.
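If a concrete starting point helps, here is a minimal PyTorch sketch of the 2000-500-250-125-2-125-250-500-2000 architecture (an assumption on my part, since it is neither DeepLearning4J nor Theano, and it trains end-to-end rather than with the layer-wise pretraining the paper describes). Keeping the encoder as a separate module makes the 2-unit middle layer easy to read out:

import torch
import torch.nn as nn

# Encoder: 2000 -> 500 -> 250 -> 125 -> 2 (the 2-d codes used as coordinates)
encoder = nn.Sequential(
    nn.Linear(2000, 500), nn.Sigmoid(),
    nn.Linear(500, 250), nn.Sigmoid(),
    nn.Linear(250, 125), nn.Sigmoid(),
    nn.Linear(125, 2))

# Decoder: 2 -> 125 -> 250 -> 500 -> 2000, sigmoid output for binary vectors
decoder = nn.Sequential(
    nn.Linear(2, 125), nn.Sigmoid(),
    nn.Linear(125, 250), nn.Sigmoid(),
    nn.Linear(250, 500), nn.Sigmoid(),
    nn.Linear(500, 2000), nn.Sigmoid())

autoencoder = nn.Sequential(encoder, decoder)
criterion = nn.BCELoss() # suits binary input vectors
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

x = torch.rand(32, 2000).round() # stand-in batch of binary vectors

optimizer.zero_grad()
loss = criterion(autoencoder(x), x)
loss.backward()
optimizer.step()

coordinates = encoder(x) # shape (32, 2): the middle-layer activations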

Question:

I have a training set where size(X_Training) = 122 x 125937:

  • 122 is the number of features
  • and 125937 is the sample size.

From my limited understanding, PCA is useful when you want to reduce the dimensionality of the features, meaning I should reduce 122 to a smaller number.

But when I use in matlab:

X_new = pca(X_Training)

I get a matrix of size 125937x121. I am really confused, because this appears to change not only the number of features but also the sample size. This is a big problem for me, because I still have the target vector Y_Training that I want to use for my neural network.

Any help? Did I misinterpret the results? I only want to reduce the number of features.


Answer:

Firstly, the documentation of the PCA function is useful: https://www.mathworks.com/help/stats/pca.html. It mentions that the rows are the samples while the columns are the features. This means you need to transpose your matrix first.

Secondly, you need to specify the number of dimensions to reduce to a priori; the pca function does not do that for you automatically. Therefore, in addition to the principal component coefficients, you also need to extract the scores. Once you have these, you simply subset into the scores to obtain the projection into the reduced space.

In other words:

n_components = 10; % Change to however you see fit.
[coeff, score] = pca(X_training.');
X_reduce = score(:, 1:n_components);

X_reduce will be the dimensionality-reduced feature set, with the number of columns equal to the number of reduced features. Also notice that the number of training examples does not change, as we expect. If you want the features along the rows instead of the columns after the reduction, transpose this output matrix as well before you proceed.

Finally, if you want to automatically determine the number of features to reduce to, one method is to calculate the variance explained by each principal component, then accumulate the values from the first component up to the point where the total exceeds some threshold. Usually 95% is used.

Therefore, you need to provide additional output variables to capture these:

[coeff, score, latent, tsquared, explained, mu] = pca(X_training.');

I'll let you go through the documentation to understand the other variables, but the one you're looking for is the explained variable. What you should do is find the point where the total variance explained exceeds 95%:

[~,n_components] = max(cumsum(explained) >= 95);

Finally, if you want to perform a reconstruction and see how well the reconstruction into the original feature space performs from the reduced feature, you need to perform a reprojection into the original space:

X_reconstruct = bsxfun(@plus, score(:, 1:n_components) * coeff(:, 1:n_components).', mu);

mu is a row vector containing the mean of each feature. You need to add this vector to every example, so broadcasting is required, which is why bsxfun is used. If you're using MATLAB R2016b or newer, implicit expansion does this automatically with the plain addition operator:

X_reconstruct = score(:, 1:n_components) * coeff(:, 1:n_components).' + mu;