Hot questions for using neural networks with CSV

Question:

I have two CSV files, reviews_positive.csv and reviews_negative.csv. How can I combine them into one CSV file under the following conditions?

  • Odd rows are filled with reviews from reviews_positive.csv and even rows with reviews from reviews_negative.csv.
  • I am using Pandas.

I need this specific order because I want to build a balanced dataset for training neural networks.


Answer:

Here is a working example

from io import StringIO
import pandas as pd

pos = """rev
a
b
c"""

neg = """rev
e
f
g
h
i"""

pos_df = pd.read_csv(StringIO(pos))
neg_df = pd.read_csv(StringIO(neg))

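Solution: use pd.concat with the keys parameter to label the source dataframes and to preserve the desired order, with positive first. Then call sort_index with level=1 and sort_remaining=False, so that rows are ordered by their original position while pos stays ahead of neg at each position.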

pd.concat(
    [pos_df, neg_df],
    keys=['pos', 'neg']
).sort_index(level=1, sort_remaining=False)

      rev
pos 0   a
neg 0   e
pos 1   b
neg 1   f
pos 2   c
neg 2   g
    3   h
    4   i

That said, you don't have to interweave them to take balanced samples. You can use groupby with sample:

pd.concat(
    [pos_df, neg_df],
    keys=['pos', 'neg']
).groupby(level=0).apply(pd.DataFrame.sample, n=3)

          rev
pos pos 1   b
        2   c
        0   a
neg neg 1   f
        4   i
        3   h
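The balanced-sampling idea above can also be sketched with an explicit label column. This is a minimal sketch, not the answer's own code: the label column name and the random_state values are assumptions, and groupby(...).sample requires pandas 1.1 or later.

```python
from io import StringIO

import pandas as pd

pos_df = pd.read_csv(StringIO("rev\na\nb\nc"))
neg_df = pd.read_csv(StringIO("rev\ne\nf\ng\nh\ni"))

# Tag each row with its source ("label" is a hypothetical column name).
combined = pd.concat(
    [pos_df.assign(label="pos"), neg_df.assign(label="neg")],
    ignore_index=True,
)

# Draw the same number of rows per class; n cannot exceed the smaller class.
n = min(len(pos_df), len(neg_df))
balanced = combined.groupby("label").sample(n=n, random_state=0)

# Shuffle the result so the two classes are mixed rather than grouped.
balanced = balanced.sample(frac=1, random_state=0).reset_index(drop=True)
print(balanced["label"].value_counts())
```

Keeping the label as a column means the combined frame can be written back out with to_csv and still record which file each review came from.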

Question:

I have a large dataset, too large to fit into RAM, which is available either as HDF5 or CSV. How can I feed it into Keras in minibatches? Also, will this shuffle it for me, or do I need to pre-shuffle the dataset?

(I'm also interested in this when the input is a Numpy recarray, since I believe Keras wants the input to be an ndarray.)

And, if I want to do some lightweight preprocessing in Keras before learning (e.g. apply a few Python functions to the data to change the representation), can that be added?


Answer:

Have a look at the fit_generator method available in Keras (https://keras.io/models/sequential/#sequential-model-methods). It fits the model on data generated batch by batch by a Python generator. Because the generator is under your control, you can write the shuffling logic there.

You can also apply pre-processing within the generator itself.

Hope this helps.
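A generator of the kind described above might look like the following. It is only a sketch under stated assumptions: the function name csv_batches, the column names, and the demonstration file are all hypothetical, and the shuffle is local to each chunk, so a file that is sorted by class would still need a global pre-shuffle.

```python
import tempfile

import numpy as np
import pandas as pd


def csv_batches(path, batch_size, label_col, preprocess=None, seed=0):
    """Yield (features, labels) minibatches forever, reading the CSV in chunks."""
    rng = np.random.default_rng(seed)
    while True:  # Keras-style generators are expected to loop indefinitely
        for chunk in pd.read_csv(path, chunksize=batch_size):
            # Shuffle rows inside the chunk only; the file is never fully loaded.
            chunk = chunk.iloc[rng.permutation(len(chunk))]
            labels = chunk[label_col].to_numpy()
            features = chunk.drop(columns=label_col).to_numpy(dtype="float32")
            if preprocess is not None:
                features = preprocess(features)  # lightweight per-batch preprocessing
            yield features, labels


# Tiny demonstration file: two feature columns and a label column.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("f1,f2,label\n")
    handle.write("\n".join(f"{i},{i * 2},{i % 2}" for i in range(10)))
    demo_path = handle.name

batches = csv_batches(demo_path, batch_size=4, label_col="label",
                      preprocess=lambda x: x / 10.0)
first_x, first_y = next(batches)
print(first_x.shape, first_y.shape)
```

With a compiled Keras model, such a generator would be passed to fit_generator (or to fit in recent Keras versions, where fit_generator is deprecated) together with a steps_per_epoch value, since the generator itself never ends.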

Question:

When I used the following code:

import tensorflow as tf

# def input_pipeline(filenames, batch_size):
#     # Define a `tf.contrib.data.Dataset` for iterating over one epoch of the data.
#     dataset = (tf.contrib.data.TextLineDataset(filenames)
#                .map(lambda line: tf.decode_csv(
#                     line, record_defaults=[['1'], ['1'], ['1']], field_delim='-'))
#                .shuffle(buffer_size=10)  # Equivalent to min_after_dequeue=10.
#                .batch(batch_size))

#     # Return an *initializable* iterator over the dataset, which will allow us to
#     # re-initialize it at the beginning of each epoch.
#     return dataset.make_initializable_iterator() 

def decode_func(line):
    record_defaults = [['1'],['1'],['1']]
    line = tf.decode_csv(line, record_defaults=record_defaults, field_delim='-')
    str_to_int = lambda r: tf.string_to_number(r, tf.int32)
    query = tf.string_split(line[:1], ",").values
    title = tf.string_split(line[1:2], ",").values
    query = tf.map_fn(str_to_int, query, dtype=tf.int32)
    title = tf.map_fn(str_to_int, title, dtype=tf.int32)
    label = line[2]
    return query, title, label

def input_pipeline(filenames, batch_size):
    # Define a `tf.contrib.data.Dataset` for iterating over one epoch of the data.
    dataset = tf.contrib.data.TextLineDataset(filenames)
    dataset = dataset.map(decode_func)
    dataset = dataset.shuffle(buffer_size=10)  # Equivalent to min_after_dequeue=10.
    dataset = dataset.batch(batch_size)

    # Return an *initializable* iterator over the dataset, which will allow us to
    # re-initialize it at the beginning of each epoch.
    return dataset.make_initializable_iterator() 


filenames=['2.txt']
batch_size = 3
num_epochs = 10
iterator = input_pipeline(filenames, batch_size)

# `a1`, `a2`, and `a3` represent the next element to be retrieved from the iterator.    
a1, a2, a3 = iterator.get_next()

with tf.Session() as sess:
    for _ in range(num_epochs):
        print(_)
        # Resets the iterator at the beginning of an epoch.
        sess.run(iterator.initializer)
        try:
            while True:
                a, b, c = sess.run([a1, a2, a3])
                print(type(a[0]), b, c)
        except tf.errors.OutOfRangeError:
            print('stop')
            # This will be raised when you reach the end of an epoch (i.e. the
            # iterator has no more elements).
            pass                 

        # Perform any end-of-epoch computation here.
        print('Done training, epoch reached')

The script crashed without returning any results; it stopped when it reached a, b, c = sess.run([a1, a2, a3]). But when I commented out

query = tf.map_fn(str_to_int, query, dtype=tf.int32)
title = tf.map_fn(str_to_int, title, dtype=tf.int32)

it works and returns the results.

In 2.txt, the data format is like

1,2,3-4,5-0
1-2,3,4-1
4,5,6,7,8-9-0

In addition, why are the returned results bytes-like objects rather than str?


Answer:

I had a look and it appears that if you replace:

query = tf.map_fn(str_to_int, query, dtype=tf.int32)
title = tf.map_fn(str_to_int, title, dtype=tf.int32)
label = line[2]

by

query = tf.string_to_number(query, out_type=tf.int32)
title = tf.string_to_number(title, out_type=tf.int32)
label = tf.string_to_number(line[2], out_type=tf.int32)

it works just fine.

It appears that nesting two TensorFlow mapping functions (the tf.map_fn inside the function passed to Dataset.map) just doesn't work. Luckily, the nested call was unnecessary: tf.string_to_number already converts every element of a string tensor.

Regarding your second question, I got this as output:

[(array([4, 5, 6, 7, 8], dtype=int32), array([9], dtype=int32), 0)]
<type 'numpy.ndarray'>
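On the second question: under Python 3, TensorFlow string tensors come back from sess.run as bytes objects, so any field left as a string (such as the label before the fix above) has to be decoded to get a str. The pure-Python sketch below is not part of the TensorFlow pipeline; parse_line is a hypothetical helper that shows the expected result for each line of 2.txt and can serve as a reference when checking the pipeline's output.

```python
def parse_line(line):
    """Split 'q1,q2-t1,t2-label' into (query ints, title ints, label int)."""
    if isinstance(line, bytes):
        # String tensors are returned as bytes; decode them to get a str.
        line = line.decode("utf-8")
    query, title, label = line.strip().split("-")

    def to_ints(field):
        return [int(token) for token in field.split(",")]

    return to_ints(query), to_ints(title), int(label)


for raw in [b"1,2,3-4,5-0", b"1-2,3,4-1", b"4,5,6,7,8-9-0"]:
    print(parse_line(raw))
```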

Question:

So I'm using Jeff Heaton's Neural Network library.

When trying to solve the Iris plant classification problem I have an issue with data normalization.

I am able to normalize a CSV file using the following method:

public void NormalizeFile(FileInfo SourceDataFile, FileInfo NormalizedDataFile, FileInfo NormalizationConfigFile)
    {

        var wizard = new AnalystWizard(_analyst);
        wizard.Wizard(SourceDataFile, _useHeaders, AnalystFileFormat.DecpntComma);
        var norm = new AnalystNormalizeCSV();
        norm.Analyze(SourceDataFile, _useHeaders, CSVFormat.English, _analyst);
        norm.ProduceOutputHeaders = _useHeaders;
        norm.Normalize(NormalizedDataFile);

        // save normalization configuration, which can be used later to denormalize to get the raw output.
        _analyst.Save(NormalizationConfigFile);

    }

So far so good... The program works with a high degree of accuracy.

The problem occurs when I want to enter the values into my console application.

I have some input data

  • sepal width
  • sepal length
  • petal width
  • petal length

Each of these values has a different high/low. I would like to normalize these values so that I can feed them into my network without writing a CSV file to disk.


Answer:

According to this link, you can do this easily with Encog.Util.Arrayutil.NormalizeArray, like so:

I assume your data is stored in a double[].

Encog.Util.Arrayutil.NormalizeArray normalizer = new Encog.Util.Arrayutil.NormalizeArray();
var normalizedData = normalizer.Process(dataMatrix, 0, 1);//(yourdata, low, high)
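For a single new sample, each field has to be scaled with the low and high values seen in the training data, not with the minimum and maximum of the sample itself. The arithmetic is ordinary min-max range mapping, sketched here in Python rather than C#. This is not Encog's API: the function names and the per-field ranges are illustrative assumptions, and in practice the ranges would come from the saved normalization configuration.

```python
def normalize(value, data_low, data_high, norm_low=0.0, norm_high=1.0):
    """Map value from [data_low, data_high] onto [norm_low, norm_high]."""
    scale = (value - data_low) / (data_high - data_low)
    return scale * (norm_high - norm_low) + norm_low


def denormalize(value, data_low, data_high, norm_low=0.0, norm_high=1.0):
    """Inverse of normalize: recover the raw value from a normalized one."""
    scale = (value - norm_low) / (norm_high - norm_low)
    return scale * (data_high - data_low) + data_low


# Assumed (low, high) per field; replace with the ranges from the training data.
FIELD_RANGES = {
    "sepal_length": (4.3, 7.9),
    "sepal_width": (2.0, 4.4),
    "petal_length": (1.0, 6.9),
    "petal_width": (0.1, 2.5),
}


def normalize_sample(sample):
    """Normalize a dict of raw field values, returned in FIELD_RANGES order."""
    return [normalize(sample[name], low, high)
            for name, (low, high) in FIELD_RANGES.items()]


sample = {"sepal_length": 6.1, "sepal_width": 3.2,
          "petal_length": 4.0, "petal_width": 1.3}
print(normalize_sample(sample))
```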

Question:

I've tried the code provided by TensorFlow here

but I am incapable of manipulating the code such that I can grab the data and place it in train_X and train_Y variables.

I'm currently using hard coded data for variable train_X and train_Y.

My csv file contains 2 columns, Height & State of Charge(SoC), where height is a float value and SoC is a whole number (Int) starting from 0 with increment of 10 to a maximum of 100.

I want to grab the data from the columns and use it in a linear regression model, where Height is the Y value and SoC is the x value.

Here's my code:

filename_queue = tf.train.string_input_producer("battdata.csv")

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
record_defaults = [[1], [1]]
col1, col2= tf.decode_csv(
    value, record_defaults=record_defaults)
features = tf.stack([col1, col2])

with tf.Session() as sess:
  # Start populating the filename queue.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  for i in range(1200):
    # Retrieve a single instance:
    example, label = sess.run([features, col2])

  coord.request_stop()
  coord.join(threads)

I want to use the CSV data in this model:

# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# tf Graph Input
X = tf.placeholder("float")#Charge
Y = tf.placeholder("float")#Height

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b) # XW + b <- y = mx + b  where W is gradient, b is intercept

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        #Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
            print( "Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print ("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    #Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

EDIT:

I've also tried the solution provided by Nicolas, I encountered an error:

ValueError: Shape () must have rank at least 1

I solved this issue by adding square brackets around my file name like so:

filename_queue = tf.train.string_input_producer(['battdata.csv'])

Answer:

All you need to do is replace your placeholder tensors with the ops you get from the decode_csv method. That way, whenever you run the optimiser, the TensorFlow graph will ask for a new row to be read from the file through the tensor dependencies:

optimiser => cost => pred => X
optimiser => cost => Y

That would give something like this:

filename_queue = tf.train.string_input_producer(["battdata.csv"])  # must be a list of file names

reader = tf.TextLineReader()
key, value = reader.read(filename_queue)

# Default values, in case of empty columns. Also specifies the type of the
# decoded result.
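record_defaults = [[1.], [1.]]  # decode both columns as floats so they match W and b
# The file's columns are Height, SoC: Height is the target Y and SoC is the input X.
Y, X = tf.decode_csv(
    value, record_defaults=record_defaults)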

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b) # XW + b <- y = mx + b  where W is gradient, b is intercept

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
  # Initialise W and b before any training step runs.
  sess.run(init)

  # Start populating the filename queue.
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)

  # Fit all training data
  for epoch in range(training_epochs):
      _, cost_value = sess.run([optimizer, cost])

   [...] # The rest of your code

  coord.request_stop()
  coord.join(threads) 
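If the file is small enough to fit in memory, the original goal of filling train_X and train_Y can also be met without queues by loading the columns into arrays first. The sketch below uses pandas for that; the header names Height and SoC and the sample values are assumptions made for illustration.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Stand-in for battdata.csv; the header names and values are assumed.
csv_text = """Height,SoC
1.20,0
1.45,10
1.71,20
1.98,30
"""

data = pd.read_csv(StringIO(csv_text))

# SoC is the input (x) and Height is the target (y), as in the question.
train_X = data["SoC"].to_numpy(dtype=np.float32)
train_Y = data["Height"].to_numpy(dtype=np.float32)
n_samples = train_X.shape[0]
print(train_X, train_Y, n_samples)
```

For a file without a header row, pd.read_csv(path, header=None, names=["Height", "SoC"]) would assign the same column names. The resulting arrays can replace the hard-coded numpy.asarray calls in the placeholder-based model from the question.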

Question:

I am using two tutorials to figure out how to take a CSV file of format:

feature1,feature2....feature20,label
feature1,feature2....feature20,label
...

and train a neural network on it. What I do in the code below is read in the CSV file and group 100 lines at a time into batches: x_batch and y_batch. Next, I try to have the NN learn in batches. However, I get the following error:

"ValueError: Cannot feed value of shape (99,) for Tensor 'Placeholder_1:0', which has shape '(?, 4)'"

I am wondering what I am doing wrong and what another approach might be.

import tensorflow as tf

filename_queue = tf.train.string_input_producer(["VOL_TRAIN.csv"])


line_reader = tf.TextLineReader(skip_header_lines=1)
_, csv_row = line_reader.read(filename_queue)


# Type information and column names based on the decoded CSV.
[[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[""]]

record_defaults = [[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0.0],[0]]
in1,in2,in3,in4,in5,in6,in7,in8,in9,in10,in11,in12,in13,in14,in15,in16,in17,in18,in19,in20,out = \
    tf.decode_csv(csv_row, record_defaults=record_defaults)

# Turn the features back into a tensor.
features = tf.pack([in1,in2,in3,in4,in5,in6,in7,in8,in9,in10,in11,in12,in13,in14,in15,in16,in17,in18,in19,in20])


# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1
num_examples= 33500

# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 20 # MNIST data input (img shape: 28*28)
n_classes = 4 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])


# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer


# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()



with tf.Session() as sess:
    #tf.initialize_all_variables().run()
    sess.run(init)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(num_examples/batch_size)
        # Loop over all batches

        for i in range(total_batch):
            batch_x = []
            batch_y = []
            for iteration in range(1, batch_size):
                example, label = sess.run([features, out])
                batch_x.append(example)
                batch_y.append(label)

            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print ("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print ("Optimization Finished!")
    coord.request_stop()
    coord.join(threads)

Answer:

Your placeholder y says that you will feed an array of unknown length whose elements are arrays of length n_classes (which is 4). In your feed_dict you pass batch_y, which is a flat array of 99 numbers (range(1, batch_size) runs 99 times, one short of your batch_size of 100).

What you want to do is change your batch_y variable so that it holds one-hot vectors. Please let me know if this works!
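A one-hot conversion of the kind suggested above can be sketched with numpy. The sketch assumes the labels in the file are integers from 0 to n_classes - 1; the helper name one_hot is hypothetical.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Turn integer class labels into rows with a single 1 per label."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes, dtype=np.float32)[labels]


batch_y = [0, 3, 1, 2, 3]
encoded = one_hot(batch_y, n_classes=4)
print(encoded.shape)  # (5, 4)
```

Feeding one_hot(batch_y, n_classes) instead of batch_y gives the (?, 4) shape that the y placeholder expects.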

Question:

I have a question. At school we started a new project with neural networks and we had to choose what kind of AI we wanted to program. I chose a recurrent neural network that predicts whether a price will be higher or lower after a few periods. I successfully programmed that and it trained well. Now I want to do a test run, but I don't know how to prepare a CSV file to feed the RNN. This is my training code:

main_df = pd.DataFrame()

ratios = ["BTC-USD", "LTC-USD", "ETH-USD"]
for ratio in ratios:


    url="https://www.test.nl/get_csv_content.php?method=train&ratio=" + str(ratio)
    dataset = requests.get(url, verify=False).content
    df = pd.read_csv(io.StringIO(dataset.decode('utf-8')), names=["time", "low", "high", "open", "close", "volume", "rsi14", "ma5", "ema5", "ema12", "ema20", "macd", "signal"])

    df.rename(columns={"close": str(ratio)+"_close", "volume": str(ratio) + "_volume", "rsi14": str(ratio) + "_rsi14", "ma5": str(ratio) + "_ma5", "ema5": str(ratio) + "_ema5", "ema12": str(ratio) + "_ema12", "ema20": str(ratio) + "_ema20", "macd": str(ratio) + "_macd", "signal": str(ratio) + "_signal"}, inplace=True)

    df.set_index("time", inplace=True)
    df = df[[str(ratio) + "_close", str(ratio) + "_volume", str(ratio) + "_rsi14", str(ratio) + "_ma5", str(ratio) + "_ema5", str(ratio) + "_ema12", str(ratio) + "_ema20", str(ratio) + "_macd", str(ratio) + "_signal"]]

    if len(main_df) == 0:
        main_df = df
    else:
        main_df = main_df.join(df)


main_df['future'] = main_df[str(RATIO_TO_PREDICT) + "_close"].shift(-FUTURE_PERIOD_PREDICT)
main_df['target'] = list(map(classify, main_df[str(RATIO_TO_PREDICT) + "_close"], main_df["future"]))
#print(main_df[[str(RATIO_TO_PREDICT) + "_close", "future", "target"]].head(10))


times = sorted(main_df.index.values)
last_5pct = times[-int(0.05*len(times))]

validation_main_df = main_df[(main_df.index >= last_5pct)]
main_df = main_df[(main_df.index < last_5pct)]

train_x, train_y = preprocess_df(main_df)
validation_x, validation_y = preprocess_df(validation_main_df)

And here are the functions:

#Constant Variables
SEQ_LEN = 30
FUTURE_PERIOD_PREDICT = 3
RATIO_TO_PREDICT = "LTC-USD"
EPOCHS = 10
BATCH_SIZE = 64
NAME = str(RATIO_TO_PREDICT) + "-" + str(SEQ_LEN) + "-SEQ-" + str(FUTURE_PERIOD_PREDICT) + "-PRED-" + str(int(time.time()))

def classify(current, future):
    if float(future) > float(current):
        return 1
    else:
        return 0

def preprocess_df(df):
    df = df.drop(columns='future')

    for col in df.columns:
        if col != "target":
            df[col] = df[col].pct_change()
            df.dropna(inplace=True)
            df[col] = preprocessing.scale(df[col].values)

    df.dropna(inplace=True)

    sequential_data = []
    prev_days = deque(maxlen=SEQ_LEN)



    for i in df.values:
        prev_days.append([n for n in i[:-1]])
        if len(prev_days) == SEQ_LEN:
            sequential_data.append([np.array(prev_days), i[-1]])

    random.shuffle(sequential_data)

    buys = []
    sells = []

    for seq, target in sequential_data:
        if target == 0:
            sells.append([seq, target])
        elif target == 1:
            buys.append([seq, target])


    random.shuffle(buys)
    random.shuffle(sells)

    lower = min(len(buys), len(sells))


    buys = buys[:lower]
    sells = sells[:lower]


    sequential_data = buys+sells

    random.shuffle(sequential_data)

    x = []
    y = []

    for seq, target in sequential_data:
        x.append(seq)
        y.append(target)

    return np.array(x), y

Now my question is: after I have trained the model, how can I prepare a new CSV file to feed into the model?


Answer:

Typically, to test your model, you would select a subset of your original dataset, and set it aside for testing purposes only. That is, you would not use that data for training at all.

Now, the link that you're using in your code to fetch CSV files from a remote server does not work for me, but it does have a query param, ?method=train, which you could presumably change to something like ?method=test to fetch the testing dataset, and use that for your trial run. Failing that, you could just set aside 20% of your dataset for testing, and use the rest for training.
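Setting aside 20% of the data could be sketched as below. Because the data is a time series, the sketch holds out the most recent rows, in the same spirit as the question's last-5% validation split, rather than sampling at random. The helper name split_by_time and the demonstration frame are assumptions.

```python
import numpy as np
import pandas as pd


def split_by_time(df, test_fraction=0.2):
    """Return (train, test), where test is the most recent part of the frame."""
    df = df.sort_index()
    n_test = int(round(len(df) * test_fraction))
    cut = len(df) - n_test
    return df.iloc[:cut], df.iloc[cut:]


# Small stand-in for main_df, indexed by time like the frame in the question.
frame = pd.DataFrame(
    {"LTC-USD_close": np.linspace(50.0, 59.0, 10)},
    index=pd.RangeIndex(1000, 1010, name="time"),
)
train_df, test_df = split_by_time(frame)
print(len(train_df), len(test_df))
```

The held-out frame would then presumably go through the same preprocessing function as the training frame before being passed to the model.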

Question:

I have a standard CSV file with a bunch of rows that all have 60 columns of random numbers (float). On columns 61-63, I have numbers (floats again) that are some function of the first 60 columns.

I did the sum of the first 20 columns multiplied by the sum of the next 40 columns for the first “output” column and then other arbitrary variations for the next two output columns. I want my machine learning algorithm to pick up on this formulaic relationship and give predictions for the three output numbers.

Here is how I read in the data

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

def read_data():
    rd = pd.read_csv(file_path)
    x = rd[rd.columns[0:60]].values
    y = rd[rd.columns[60:63]].values
    X = x.astype(float) #just making sure we have the right dtype 
    Y = y.astype(float) 
    print(X.shape)
    print(Y.shape)
    return (X, Y)

X, Y = read_data()

Then I shuffle and split the data into training and testing sets

X, Y = shuffle(X, Y, random_state=1)
train_x, test_x, train_y, test_y = train_test_split(X, Y, test_size=0.25, random_state=117)

Next I define my model, the weights and biases

n_dim = X.shape[1]
print("n_dim", n_dim)
n_output = Y.shape[1]
print("n_output", n_output)    
n_hidden_1 = 100
n_hidden_2 = 75
n_hidden_3 = 50
n_hidden_4 = 50
x = tf.placeholder(tf.float32, [None, n_dim])
W = tf.Variable(tf.zeros([n_dim, n_output]))
b = tf.Variable(tf.zeros([n_output]))
y = tf.placeholder(tf.float32, [None, n_output])

def layered_model(x, weights, biases):

    # 4 hidden layers with sigmoid and relu
    layer_1 = tf.add(tf.matmul(x, weights['w1']), biases['b1'])
    layer_1 = tf.nn.sigmoid(layer_1)

    layer_2 = tf.add(tf.matmul(layer_1, weights['w2']), biases['b2'])
    layer_2 = tf.nn.sigmoid(layer_2)

    layer_3 = tf.add(tf.matmul(layer_2, weights['w3']), biases['b3'])
    layer_3 = tf.nn.sigmoid(layer_3)

    layer_4 = tf.add(tf.matmul(layer_3, weights['w4']), biases['b4'])
    layer_4 = tf.nn.relu(layer_4)

    out_layer = tf.matmul(layer_4, weights['out']) + biases['out']
    return out_layer

weights = {
    'w1': tf.Variable(tf.truncated_normal([n_dim, n_hidden_1])),
    'w2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2])),
    'w3': tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_3])),
    'w4': tf.Variable(tf.truncated_normal([n_hidden_3, n_hidden_4])),
    'out': tf.Variable(tf.truncated_normal([n_hidden_4, n_output]))
}
biases = {
    'b1': tf.Variable(tf.truncated_normal([n_hidden_1])),
    'b2': tf.Variable(tf.truncated_normal([n_hidden_2])),
    'b3': tf.Variable(tf.truncated_normal([n_hidden_3])),
    'b4': tf.Variable(tf.truncated_normal([n_hidden_4])),
    'out': tf.Variable(tf.truncated_normal([n_output]))
}

How do I feed my data into a cost function, and then use that for my epochs? All the tutorials I can find are for labeled datasets, putting things into "buckets". Whereas this is a purely numeric input/output.

The only information I can find is that numeric cost functions usually use a squared error approach and feed_dict will be necessary:

cost_function = tf.reduce_mean(tf.square(prediction - actual))

Answer:

I managed to get it "working". However, the cost function gets minimized to a number that is close to all the training results and then it will always return that number, regardless of the input. Not really "learning" in the useful sense.

I found that I had to break my problem into a classification task before machine learning could make useful predictions.
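The symptom described above, the same output regardless of input, is consistent with a network that has settled on predicting the mean of the targets. One way to check is to compare the model's cost against the cost of a constant mean prediction, using the squared-error cost from the question. The following numpy sketch uses synthetic data; the second and third output columns are arbitrary stand-ins, since the question does not give their formulas.

```python
import numpy as np


def mse(prediction, actual):
    """Mean squared error, the numeric cost mentioned in the question."""
    return float(np.mean(np.square(prediction - actual)))


rng = np.random.default_rng(0)
X = rng.random((200, 60))

# First output as described: sum of first 20 columns times sum of the next 40.
first = X[:, :20].sum(axis=1) * X[:, 20:].sum(axis=1)
# The other two outputs are assumed variations, for illustration only.
Y = np.column_stack([first, first + 1.0, first * 2.0])

# Baseline: always predict the per-column mean of the targets.
mean_prediction = np.broadcast_to(Y.mean(axis=0), Y.shape)
baseline = mse(mean_prediction, Y)
print("baseline cost:", baseline)
```

A trained model whose cost is no better than this baseline has not picked up the relationship between inputs and outputs.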

Question:

Good day! I want to use the Keras Python library to train a neural net with 4 input neurons and 1 output neuron. I want to use my own CSV file of numbers. Here it is:

my_5_input_numbers.csv
0.3,0.5,0.6,0.7,1
0.4,0.6,0.7,0.8,0
0.5,0.7,0.8,0.9,1

I use numpy to read the CSV and build the training matrix. Here are the code and the error:

import numpy as np
np.set_printoptions(threshold=np.inf)
from keras.datasets import boston_housing
from keras.models import Sequential
from keras.layers import Dense
data_common=np.genfromtxt('my_5_input_numbers.csv',delimiter=',')
"""
data_common=array([[ 0.3,  0.5,  0.6,  0.7,  1. ],
       [ 0.4,  0.6,  0.7,  0.8,  0. ],
       [ 0.5,  0.7,  0.8,  0.9,  1. ]])
data_common.shape=(3,5)       
"""

X_train=data_common[:,-1]#X_train.shape=(3,4)
y_train=data_common[0:4,-1]#y_train.shape=(3,)
y_train=y_train.reshape(3,1)
model = Sequential()
model.add(Dense(128,input_dim=4,activation='relu'))
model.add(Dense(1,activation="softmax"))
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# train nn
model.fit(X_train, y_train, batch_size=200, epochs=25, validation_split=0.2, verbose=2)
#<---Error:File "D:\NetbeansPythonProjects\testDiffrentCode\src\testKeras.py", line 16, in <module>
#    model.fit(X_train, y_train, batch_size=200, epochs=25, validation_split=0.2, verbose=2)
#ValueError: 
#Error when checking input:
#expected dense_1_input to have shape (None, 4) but got array with shape (3, 1)

Answer:

You simply indexed your input incorrectly:

X_train=data_common[:,-1] # <--- X_train.shape was actually (3,) not (3,4).
y_train=data_common[0:4,-1] # <--- This was wrong as well.
y_train=y_train.reshape(3,1)

It should have been:

X_train=data_common[:,0:4]
y_train=data_common[:,-1] 
y_train=y_train.reshape(3,1)

NumPy indexing is row first, then column, so data_common[:,-1] selects only the last column of every row.
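The shapes involved can be checked directly. This short sketch rebuilds the array from the question and prints the shape of each slice.

```python
import numpy as np

data_common = np.array([
    [0.3, 0.5, 0.6, 0.7, 1.0],
    [0.4, 0.6, 0.7, 0.8, 0.0],
    [0.5, 0.7, 0.8, 0.9, 1.0],
])

print(data_common[:, -1].shape)    # (3,)   last column only
print(data_common[0:4, -1].shape)  # (3,)   rows 0-3 (clipped to 3 rows), last column
print(data_common[:, 0:4].shape)   # (3, 4) first four columns of every row

X_train = data_common[:, :-1]                # every column except the last
y_train = data_common[:, -1].reshape(-1, 1)  # last column as a (3, 1) matrix
print(X_train.shape, y_train.shape)
```

Using :-1 and reshape(-1, 1) avoids hard-coding the number of columns and rows, so the same two lines keep working if the file grows.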

Question:

I have a dermatology database that has already been normalized. It is a CSV file. I need to open the file and load it into a numpy array. My database has 34 columns and about 350 rows. My neural network has 3 hidden layers. This is my current Python code for the neural network. Can someone help me load the input into a numpy array/matrix? Thank you

Here is my code:

import numpy as np
#input x
X = np.array([  ])
#input y
y = np.array([]).T

syn0 = 2*np.random.random((34,26)) - 1
syn1 = 2*np.random.random((26,18)) - 1
syn2 = 2*np.random.random((18,11)) - 1
syn3 = 2*np.random.random((11,6)) - 1
for j in xrange(350):
  l1 = 1/(1+np.exp(-(np.dot(X,syn0))))
  l2 = 1/(1+np.exp(-(np.dot(l1,syn1))))
  l3 = 1/(1+np.exp(-(np.dot(l2,syn2))))
  l4 = 1/(1+np.exp(-(np.dot(l3,syn3))))
  l4_delta = (y - l4)*(l4*(1-l4))
  l3_delta = l4_delta.dot(syn3.T) * (l3 * (1-l3))
  l2_delta = l3_delta.dot(syn2.T) * (l2 * (1-l2))
  l1_delta = l2_delta.dot(syn1.T) * (l1 * (1-l1))
  syn3 += l3.T.dot(l4_delta)
  syn2 += l2.T.dot(l3_delta)
  syn1 += l1.T.dot(l2_delta)
  syn0 += X.T.dot(l1_delta)

Answer:

Assuming the labels are in the last column,

import csv
rows = [[float(cell) for cell in row] for row in csv.reader(open(csv_filename))]
X = np.array([row[:-1] for row in rows])
Y = np.array([row[-1] for row in rows])

I don't think you need to transpose Y, assuming it's one-dimensional.
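For a purely numeric file, numpy can also read the CSV directly. The sketch below uses np.loadtxt on an in-memory stand-in for the file; the sample values are made up, and it assumes, like the answer, that the labels are in the last column and that there is no header row.

```python
from io import StringIO

import numpy as np

# Stand-in for the real file; pass the file path to np.loadtxt instead.
csv_text = "0.1,0.2,0.3,1\n0.4,0.5,0.6,0\n0.7,0.8,0.9,1\n"

# ndmin=2 keeps the result two-dimensional even if the file has a single row.
data = np.loadtxt(StringIO(csv_text), delimiter=",", ndmin=2)

X = data[:, :-1]  # all columns except the last
y = data[:, -1]   # last column holds the labels
print(X.shape, y.shape)
```

If the file does have a header row, skiprows=1 would skip it.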

Question:

I want to run some images through a neural network, and I want to create a .csv file for the data. How can I create a csv that will represent the images and keep each image separate?


Answer:

One approach is to use numpy to convert each image to an array, which can then be written to a CSV file or simply kept as a comma-separated list.

The CSV data can be manipulated, and the original image can be retrieved when needed.

Here is some basic code that demonstrates the concept.

from PIL import Image  # Pillow; the bare "import Image" form is the old PIL style
import numpy as np

#Function to convert image to array or list
def loadImage (inFileName, outType ) :
    img = Image.open( inFileName )
    img.load()
    data = np.asarray( img, dtype="int32" )
    if outType == "anArray":
        return data
    if outType == "aList":
        return list(data)


#Load image to array
myArray1 = loadImage("bug.png", "anArray")

#Load image to a list
myList1 = loadImage("bug.png", "aList")
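To keep each image separate in the CSV, one option is to flatten every image array into a single row, so that row i of the file is image i. The sketch below uses small synthetic arrays in place of loaded images and an in-memory buffer in place of a file. It assumes all images share the same shape, which must be stored separately because the CSV itself does not record it.

```python
import io

import numpy as np

# Stand-ins for two decoded greyscale images of height 3 and width 4.
images = [np.arange(12).reshape(3, 4), np.arange(12, 24).reshape(3, 4)]
image_shape = images[0].shape

# One flattened image per row.
rows = np.stack([img.ravel() for img in images])

buffer = io.StringIO()  # use a file path here to write to disk instead
np.savetxt(buffer, rows, fmt="%d", delimiter=",")
csv_text = buffer.getvalue()
print(csv_text)

# Reading back: each row is reshaped to the known image shape.
restored = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int, ndmin=2)
restored = restored.reshape(-1, *image_shape)
```

Colour images work the same way, since ravel flattens the channel axis along with the others; the stored shape then has three entries.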