Hot questions for Using Neural networks in forecasting



I am trying to use 5 years of consecutive, historical data to forecast values for the following year.

Data Structure

My input data input_04_08 looks like this, where the first column is the day of the year (1 to 365) and the second column is the recorded input.


My output data output_04_08 looks like this, a single column with the recorded output on that day of the year.


I then normalise the values between 0 and 1 so the first sample given to the network would look like

Number of training patterns:  1825
Input and output dimensions:  2 1
First sample (input, target):
[ 0.00273973  0.04      ] [ 0.02185273]
Feed Forward Network

I have implemented the following code in PyBrain

import numpy
from import SupervisedDataSet
from pybrain.structure import FeedForwardNetwork, LinearLayer, TanhLayer, FullConnection
from pybrain.supervised.trainers import BackpropTrainer

input_04_08 = numpy.loadtxt('./data/input_04_08.csv', delimiter=',')
input_09 = numpy.loadtxt('./data/input_09.csv', delimiter=',')
output_04_08 = numpy.loadtxt('./data/output_04_08.csv', delimiter=',')
output_09 = numpy.loadtxt('./data/output_09.csv', delimiter=',')

input_04_08 = input_04_08 / input_04_08.max(axis=0)
input_09 = input_09 / input_09.max(axis=0)
output_04_08 = output_04_08 / output_04_08.max(axis=0)
output_09 = output_09 / output_09.max(axis=0)
ds = SupervisedDataSet(2, 1)

for x in range(0, 1825):
    ds.addSample(input_04_08[x], output_04_08[x])

n = FeedForwardNetwork()
inLayer = LinearLayer(2)
hiddenLayer = TanhLayer(25)
outLayer = LinearLayer(1)
in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)
n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)
n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()

trainer = BackpropTrainer(n, ds, learningrate=0.01, momentum=0.1)

for epoch in range(0, 100000000): 
    if epoch % 10000000 == 0:
        error = trainer.train()  
        print 'Epoch: ', epoch
        print 'Error: ', error

result = numpy.array([n.activate(x) for x in input_09])

and this gave me the following result with final error of 0.00153840123381

Admittedly, this looks good. However, having read more about LSTM (Long Short-Term Memory) neural networks and their applicability to time series data, I am trying to build one.

LSTM Network

Below is my code

import numpy
from import SequentialDataSet
from import buildNetwork
from pybrain.structure import LSTMLayer
from pybrain.supervised.trainers import BackpropTrainer

input_04_08 = numpy.loadtxt('./data/input_04_08.csv', delimiter=',')
input_09 = numpy.loadtxt('./data/input_09.csv', delimiter=',')
output_04_08 = numpy.loadtxt('./data/output_04_08.csv', delimiter=',')
output_09 = numpy.loadtxt('./data/output_09.csv', delimiter=',')

input_04_08 = input_04_08 / input_04_08.max(axis=0)
input_09 = input_09 / input_09.max(axis=0)
output_04_08 = output_04_08 / output_04_08.max(axis=0)
output_09 = output_09 / output_09.max(axis=0)
ds = SequentialDataSet(2, 1)

for x in range(0, 1825):
    ds.appendLinked(input_04_08[x], output_04_08[x])

fnn = buildNetwork( ds.indim, 25, ds.outdim, hiddenclass=LSTMLayer, bias=True, recurrent=True)
trainer = BackpropTrainer(fnn, ds, learningrate=0.01, momentum=0.1)

for epoch in range(0, 10000000): 
    if epoch % 100000 == 0:
        error = trainer.train()  
        print 'Epoch: ', epoch
        print 'Error: ', error

result = numpy.array([fnn.activate(x) for x in input_09])

This results in a final error of 0.000939719502501, but this time, when I feed the test data, the output plot looks terrible.

Possible Problems

I have looked around here at pretty much all the PyBrain questions; these stood out, but haven't helped me figure things out:

  • Training an LSTM neural network to forecast time series in pybrain, python
  • Time Series Prediction via Neural Networks
  • Time series forecasting (eventually with python)

I have read a few blog posts, these helped further my understanding a bit, but obviously not enough

Naturally, I have also gone through the PyBrain docs but couldn't find much to help with the sequential dataset bar here.

Any ideas/tips/direction would be welcome.


I think what happened here is that you tried to assign hyperparameter values according to some rule-of-thumb which worked for the first case, but didn't for the second.

1) The error estimate that you're looking at is an optimistic estimate of the prediction error, computed on the training set. The actual prediction error is high, but because you didn't test your model on unseen data there's no way of knowing it. The Elements of Statistical Learning gives a nice description of this phenomenon; I would highly recommend the book. You can get it online for free.

2) To get an estimator with a low prediction error you need to perform hyperparameter tuning. E.g. the number of hidden nodes, the learning rate and the momentum should be varied and tested on unseen data to find which combination leads to the lowest prediction error. scikit-learn has GridSearchCV and RandomizedSearchCV to do just that, but they only work on sklearn's estimators. You can roll your own estimator, though, which is described in the documentation. Personally, I think that model selection and model evaluation are two different tasks. For the first you can just run a single GridSearchCV or RandomizedSearchCV and get a set of best hyperparameters for your task. For model evaluation you need to run a more sophisticated analysis, such as nested cross-validation, or even repeated nested cross-validation if you want an even more accurate estimate.
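If you don't want to wrap the PyBrain model in a sklearn estimator, a hand-rolled grid search is only a few lines. In this sketch, `evaluate` is a stand-in for a function that trains a network with the given hyperparameters and returns its validation error (the function and parameter names are illustrative, not part of any library):

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every combination in `grid` and keep the one with the lowest validation error."""
    best_err, best_params = float("inf"), None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        err = evaluate(**params)
        if err < best_err:
            best_err, best_params = err, params
    return best_params, best_err

# Hypothetical usage: vary hidden nodes, learning rate and momentum
grid = {"hidden": [5, 25, 50], "learningrate": [0.1, 0.01], "momentum": [0.0, 0.1]}
```

The important point is that `evaluate` must score the candidate on held-out data, not on the training set, or the search will simply pick the configuration that overfits hardest.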

3) I don't know much about LSTM networks, but I see that in the first example you assigned 25 hidden nodes, but for LSTM you only provide 5. Maybe it's not enough to learn the pattern. You could also drop the output bias as done in the example.

P.S. I think this question actually belongs to, where you're likely to get a more detailed answer to your problem.

EDIT: I just noticed that you're training the model for 10 million epochs! I think that's a lot, and probably part of the overfitting problem. It's a good idea to implement early stopping, i.e. stop training once a predefined error is reached.
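A minimal, framework-agnostic sketch of early stopping (here the patience-based variant: stop when the validation error stops improving; `train_step` and `val_error` are illustrative stand-ins for one training pass and a validation-error measurement):

```python
def train_with_early_stopping(train_step, val_error, max_epochs=1000, patience=5):
    """Stop training once the validation error fails to improve for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()
        err = val_error()
        if err < best:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best, epoch
```

The same loop works with a fixed error threshold by replacing the patience check with `if err < target: break`.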


I spent some time to understand input_shape = (batch_size, steps, input_dim) in Keras Conv1D, but I cannot make any progress so far.

To be more specific, I have 2 datasets.

Dataset-1: solar energy production for each of the 24 hours of each day over one year, so the size of my dataset is (364, 24): days are in rows and hourly production is in columns.

Example of 2 days:

day-1: [0   0   0   0   0   0   0   1.611   5.791   8.229   9.907   9.649   8.401   6.266   4.728   2.231   0.306   0.013   0   0   0   0   0   0] 
day-2: [0   0   0   0   0   0   0   1.732   5.839   9.909   12.593  14.242  12.744  9.596   5.808   2.019   0.241   0   0   0   0   0   0   0]

What I want to do with the CNN is use 6 days' data to predict the 7th day. For that reason, I divided my dataset like so:

xtrain = dataset[0:6,0:24] # takes 24 hour of 6 days
ytrain = dataset[6,0:24] # takes 24 hour of 7th day
xtest = dataset[1:7,0:24] # takes 24 hours for 6 days (day2 to day7) to predict day 8

To be compatible with Keras' input shape, I reshaped the training data as follows:

xtrain = xtrain.reshape(6,1,24)

Number of Samples: 6, Time Dimension: 1, input_dimension:24

Is this correct thinking?

model.add(Conv1D(**filters?**,kernel_size=4,activation='relu', **input_shape=?**)) 

In my second dataset:

Training Data: Xtrain: Day-1 Hour-1 to Hour-24, Day-2 Hour-1 to Hour-24 ... Day-6 Hour-1 to Hour-24
Ytrain: Day-7 Hour-1 to Hour-24

I have created a new dataset which puts the 24 hours of a day in the rows and 7 days in the columns, so it is an (8616, 7) matrix.

hour-1 day-1, day-2 ... day-7
hour-2 day-1, day-2 ... day-7
hour-24 day-1, day-2 ... day-7
hour-1 day-2, day-3 ... day-8
hour-2 day-2, day-3 ... day-8
hour-24 day-2, day-3 ... day-8
hour-1 day-359, day-360 ... day-365
hour-2 day-359, day-360 ... day-365
hour-24 day-359, day-360 ... day-365

Keras Code:

xtrain = dataset[0:24,0:6] # takes 24 hour for 6 days
ytrain = dataset[24:48,6] # takes 24 hour of 7th day
xtest = dataset[24:48,0:6] # takes 24 hours for 6 days (day2 to day7) to predict day 7

xtrain = xtrain[newaxis,:, :]
ytrain = ytrain.reshape(1,24)

I really don't understand what filters and input_shape should be.


You should reformat your dataset in a structure like this:


The first dimension sets the day.

The second dimension sets the timestep (the progression over the past 6 days), so for each day 6:365 of the original dataset you copy in the previous 6 days of 24 h each.

The third dimension is the hour.

Assuming that you have your original dataset of [1:365,1:24] ordered:

xtrain = np.array(np.tile(xtrain, [6, 1]))


And now you have the required 3D format for Conv1D: (batch, timesteps, channels).
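One concrete way to build that (samples, timesteps, channels) array with plain NumPy, assuming the (365, 24) day-by-hour layout described in the question (the helper name is mine):

```python
import numpy as np

def make_windows(data, history=6):
    """Stack sliding windows: sample i holds days i..i+history-1, target is day i+history."""
    X = np.stack([data[i:i + history] for i in range(len(data) - history)])
    y = data[history:]
    return X, y

data = np.random.rand(365, 24)   # one year of hourly values
X, y = make_windows(data)        # X: (359, 6, 24), y: (359, 24)
```

With this shape, `input_shape=(6, 24)` in the first `Conv1D` layer matches the per-sample (timesteps, channels) pair, and the batch dimension is left out as usual.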


I use the mlp and elm functions from the nnfor package for forecasting non-stationary time series. Both of them give different numbers of nodes in the input and hidden layers. I am interested in how they choose the number of nodes in each layer, and it would be great to understand how the generalization error behaves given the way those functions work.


The number of hidden nodes chosen by the mlp function depends on the value of the parameter:

  • "set" fixes hd=5.
  • "valid" uses a 20% validation set (randomly) sampled to find the best number of hidden nodes.
  • "cv" uses 5-fold cross-validation.
  • "elm" uses ELM to estimate the number of hidden nodes (experimental).

The number of hidden nodes tried for the "valid", "cv" and "elm" parameter values ranges from 1 to max(2, min(dim(X)[2] + 2, length(Y) - 2)). These hidden nodes are limited to a single layer.

The "cv" and "valid" approaches use the minimum of the mean square error to find the number of hidden nodes.

As far as I can tell from the auto.hd.elm function in the source code, the "elm" approach uses the median number of significant model coefficients to choose the number of hidden nodes. Hope that makes sense to you!

The elm function uses min(100 - 60*(type=="step" | type=="lm"), max(4, length(Y) - 2 - as.numeric(direct)*length(X[1,]))) to determine the number of hidden nodes, where type is the estimation method used for the output-layer weights and direct indicates the presence of direct input-output connections.
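To make that arithmetic concrete, here is a hypothetical Python transcription of the R expression above (the function and parameter names are mine, not nnfor's):

```python
def elm_hidden_nodes(n_obs, n_inputs, fit_type, direct):
    """Python transcription of the elm() hidden-node rule quoted above.

    n_obs    ~ length(Y), n_inputs ~ length(X[1,]),
    fit_type ~ type, direct ~ as.numeric(direct).
    """
    cap = 100 - 60 * (fit_type in ("step", "lm"))  # 40 for "step"/"lm", else 100
    return min(cap, max(4, n_obs - 2 - int(direct) * n_inputs))
```

So for a short series the node count tracks the series length, while "step" and "lm" output-weight estimation caps it at 40 instead of 100.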

The number of input nodes depends on seasonality and lags.

Generalization error can be approximated using cross-validation. To be clear, this cross-validation would have to be done separately from any validation used to find the number of hidden nodes.
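For time series, that cross-validation is usually done as rolling-origin evaluation rather than shuffled k-fold, so future values never leak into training. A sketch of the index bookkeeping (names are illustrative):

```python
def rolling_origin_splits(n, initial, horizon):
    """Train on a growing prefix [0, end); test on the `horizon` points right after it."""
    splits, end = [], initial
    while end + horizon <= n:
        splits.append((list(range(end)), list(range(end, end + horizon))))
        end += horizon
    return splits
```

Averaging the test error over these splits gives an approximation of the generalization error that respects the temporal ordering of the data.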

The nnfor package author has an introductory blog post which may be worth checking:


I'm trying to build an LSTM model, data consists of date_time & some numeric values. While fitting the model, its getting

"ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (10, 1)" error.

Sample data: "date.csv" looks like:


06/13/2018 07:20:04 PM

06/13/2018 07:20:04 PM

06/13/2018 07:20:04 PM

06/13/2018 07:22:12 PM

06/13/2018 07:22:12 PM

06/13/2018 07:22:12 PM

06/13/2018 07:26:20 PM

06/13/2018 07:26:20 PM

06/13/2018 07:26:20 PM

06/13/2018 07:26:20 PM

"tasks.csv" looks like :











    date = pd.read_csv('date.csv')
    task = pd.read_csv('tasks.csv')
    model = Sequential()
    model.add(LSTM(24, return_sequences=True, input_shape=(10, 1)))
    model.compile(loss="mean_squared_error", optimizer="adam"), task, epochs=100, batch_size=1, verbose=1)

How can I forecast the result?


There are some issues with this code sample: there is no preprocessing, no label encoding, no target encoding, and the loss function is wrong for the task. I briefly describe possible solutions below, but for more information and examples you can read a tutorial about time series and forecasting.

Addressing the specific problem which generates this ValueError: an LSTM requires a three-dimensional input of shape (batch_size, input_length, dimension). So it requires at least (batch_size, 1, 1) - but date.shape is (10, 1). If you do

date = date.values.reshape((1, 10, 1)) 

- it will solve this one problem, but brings an avalanche of other problems:

date = date.values.reshape((1, 10, 1))

model = Sequential()
model.add(LSTM(24, return_sequences=True, input_shape=(date.shape[1], 1)))
model.compile(loss="mean_squared_error", optimizer="adam"), task, epochs=100, batch_size=1, verbose=1)

ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 10 target samples.

Unfortunately, there's no answers to other questions, because of a lack of information. But some general-purpose recommendations.

Preprocessing. Unfortunately, you probably can't just reshape, because forecasting is a little more complicated than that. You should choose some period of history on which to forecast the next task. The good news is that the measurements are periodic; but for each timestamp there are several tasks, which makes the problem harder to solve.

Features. You need features to predict something. It's not clear what the features are in this case - probably not the date and time. Even the previous tasks could be features, but you can't use the raw task id: it requires some embedding, as it's not a continuous numeric value but a label.

Embedding. Keras provides keras.layers.Embedding for embedding labels like these.

If the number of tasks is 4 (1, 2, 3, 4) and the shape of the output vector is 10, you could use it this way:

model = Sequential()
model.add(Embedding(4 + 1, 10, input_length=10))  # + 1 to deal with non-zero indexing
# ... the rest of the code is omitted

- the first argument is the number of embedded items, the second is the output shape, and the last is the input length (10 is just an example value).
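Conceptually, an embedding layer is just a trainable lookup table from integer ids to dense vectors. A NumPy sketch of the (untrained) lookup, with the same 4-tasks-plus-one sizing as above:

```python
import numpy as np

n_items, dim = 4 + 1, 10                     # 4 task ids + 1 to deal with non-zero indexing
table = np.random.default_rng(0).normal(size=(n_items, dim))  # Keras would learn these rows

ids = np.array([2, 1, 2, 4])                 # a sequence of task ids
vectors = table[ids]                         # shape (4, 10): one dense vector per id
```

Training an Embedding layer amounts to adjusting the rows of this table by backpropagation, so tasks that behave similarly end up with nearby vectors.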

Label encoding. The task labels are probably just labels: there's no reasonable distance or metric between them, i.e. you can't say 1 is closer to 2 than to 4, etc. In that case MSE is useless, but fortunately there is a probabilistic loss function named categorical cross-entropy which helps to predict the category of the data.

To use it, you should binarize the labels:

import numpy as np

def binarize(labels):
    label_map = dict(map(reversed, enumerate(np.unique(labels))))
    bin_labels = np.zeros((len(labels), len(label_map)))
    bin_labels[np.arange(len(labels)), [label_map[label] for label in labels]]  = 1
    return bin_labels, label_map

binarized_task, label_map = binarize(task)
array([[0., 1., 0., 0.],
        [1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [1., 0., 0., 0.],
        [0., 0., 0., 1.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]])
{1: 0, 2: 1, 3: 2, 4: 3}

- the binarized labels and the mapping of each task id to its position in the binary labels. Of course, you should use the cross-entropy loss in a model with binarized labels. Also, the last layer should use the softmax activation function (explained in the tutorial about cross-entropy; in short, you deal with the probability of a label, so the outputs should sum to one, and softmax modifies the previous layer's values according to this requirement):

model.add(Dense(4, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam"), binarized_task, epochs=100, batch_size=1, verbose=1)

A "complete", but probably meaningless, example. This example uses all the things listed above, but it doesn't pretend to be complete or useful - I hope it is at least explanatory.

import datetime
import numpy as np
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM, Flatten, Embedding

# Define functions

def binarize(labels):
    """Labels of shape (size,) to a {0, 1} array of shape (size, n_labels)."""
    label_map = dict(map(reversed, enumerate(np.unique(labels))))
    bin_labels = np.zeros((len(labels), len(label_map)))
    bin_labels[np.arange(len(labels)), [label_map[label] for label in labels]] = 1
    return bin_labels, label_map

def group_chunks(df, chunk_size):
    """Group task data by periods: use the last column ('Tasks') as the target,
    i.e. the previous tasks in the chunk are the features."""
    chunks = []
    for i in range(0, len(df) - chunk_size):
        chunks.append(df.iloc[i:i + chunk_size]['Tasks'])  # slice period, append
        chunks[-1].index = list(range(chunk_size))
    df_out = pd.concat(chunks, axis=1).T
    df_out.index = df['Date'].iloc[:(len(df) - chunk_size)]
    df_out.columns = [i for i in df_out.columns[:-1]] + ['Tasks']
    return df_out

# I modified the dates for simplicity - now there's a single entry for each datetime
date = pd.DataFrame({
    "Date": [
        "06/13/2018 07:20:00 PM",
        "06/13/2018 07:20:01 PM",
        "06/13/2018 07:20:02 PM",
        "06/13/2018 07:20:03 PM",
        "06/13/2018 07:20:04 PM",
        "06/13/2018 07:20:05 PM",
        "06/13/2018 07:20:06 PM",
        "06/13/2018 07:20:07 PM",
        "06/13/2018 07:20:08 PM",
        "06/13/2018 07:20:09 PM"]
})

task = pd.DataFrame({"Tasks": [2, 1, 2, 1, 4, 2, 3, 2, 3, 4]})
date['Tasks'] = task['Tasks']
date['Date'] = date['Date'].map(lambda x: datetime.datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p"))  # parse strings into datetime objects

chunk_size = 4
df = group_chunks(date, chunk_size)
# print(df)
                     0  1  2  Tasks
2018-06-13 19:20:00  2  1  2      1
2018-06-13 19:20:01  1  2  1      4
2018-06-13 19:20:02  2  1  4      2
2018-06-13 19:20:03  1  4  2      3
2018-06-13 19:20:04  4  2  3      2
2018-06-13 19:20:05  2  3  2      3

# extract the train data and target
X = df[list(range(chunk_size-1))].values
y, label_map = binarize(df['Tasks'].values)

# Create a model, compile, fit
model = Sequential()
model.add(Embedding(len(np.unique(X)) + 1, 24, input_length=X.shape[-1]))
model.add(LSTM(24, return_sequences=True))
model.add(Flatten())  # flatten the LSTM's sequence output before the softmax layer
model.add(Dense(4, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam")
history =, y, epochs=100, batch_size=1, verbose=1)
Epoch 1/100
6/6 [==============================] - 1s 168ms/step - loss: 1.3885
Epoch 2/100
6/6 [==============================] - 0s 5ms/step - loss: 1.3811
Epoch 3/100
6/6 [==============================] - 0s 5ms/step - loss: 1.3781

- etc. It works, somehow, but I kindly advise one more time: read the tutorial linked above (or any other forecasting tutorial), because, for example, I haven't covered testing/validation in this example.


The example in the link below has a training and validation set from time series data. There is no mention of a test set. Why isn't there one and what would it entail to have one for a dataset whose time series data is being generated on the fly in real time?

I have 3 hrs of data collected at 1 s intervals. I would like to predict the next 30 min before it becomes available. What should the train/validate/test split look like? Can the test set be skipped?


It is never recommended to skip the test set. In the TensorFlow example, the purpose was to demonstrate how you can play with time series; you can test on the 'test set' just like you do with your validation set, with the constraint that the test set is completely unknown - which brings us to your second question.

With regard to the test set, in your use case, like you said, the test set is the data generated on the fly.

You can, of course, split your initial dataset into train/val/test. But a second test set, which effectively coincides with your model's 'live deployment', would be to predict on the on-the-fly-generated dataset => this means you would feed the data to your model in real time.

The train-val-test split depends on how you want to create your model: how many time steps you want to use (how many seconds to take into account when predicting the next step), how many variables you are trying to predict, and how many time steps ahead you want to predict (in your case 30 minutes would be 30*60 = 1800 steps, since your dataset's sampling frequency is in seconds). It's a very broad question that relates more to how to create a dataset for multi-step time series prediction.
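For the concrete numbers in the question (3 h at 1 s = 10800 samples, 1800-step horizon), a chronological split - never shuffled - could look like the sketch below; the 70/15/15 ratios are just an example, not a rule:

```python
def chronological_split(n, train_pct=70, val_pct=15):
    """Time-ordered train/val/test index ranges; test is always the most recent data."""
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return range(0, i), range(i, j), range(j, n)

tr, va, te = chronological_split(3 * 60 * 60)   # 10800 one-second samples
# len(tr), len(va), len(te) == 7560, 1620, 1620
```

Keeping the test block at the end of the series mimics the deployment situation, where the model always predicts data that comes strictly after everything it was trained on.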


I have been working on time series forecasting and recently read about how a hybrid model of auto.arima and ANN can provide better/more accurate forecasting results. I have six time series data sets; the hybrid model works wonders for five of them, but it gives weird results for the other.

I ran the model using the following two packages: library(forecast) and library(forecastHybrid).

Here is the data:

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012           1  16  41  65  87 104 152 203 213 263
2013 299 325 388 412 409 442 447 421 435 448 447 443
2014 454 446 467 492 525


fit <- hybridModel(, model="an")

Forecast results for the next 5 periods:

forecast(fit, 5)

         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jun 2014       594.6594 519.2914 571.0163 505.6007 584.7070
Jul 2014       702.1626 528.7327 601.8827 509.3710 621.2444
Aug 2014       738.5732 540.6665 630.2566 516.9534 653.9697
Sep 2014       752.1329 553.8905 657.3403 526.5090 684.7218
Oct 2014       762.7481 567.9391 683.5994 537.3256 714.2129

You can see that the point forecasts fall outside the 95% confidence intervals. Does anybody know why this is happening and how I could fix it?

Any thoughts and insights are appreciated! Thanks in advance.


See the description of this issue here. tl;dr: nnetar models do not create prediction intervals, so these are not included in the ensemble prediction intervals. When the "forecast" package adds this behavior (on the roadmap for 2016), the prediction intervals and point forecasts will be consistent.


I am new to machine learning and LSTMs. I am referring to this link, LSTM for multistep forecasting, section "Encoder-Decoder LSTM Model With Multivariate Input".

Here is my dataset description after reshaping the train and test set.

print(train_x.shape, train_y.shape)

(2192, 15)
(1806, 14, 14) (1806, 7, 1)
(364, 15)

In the above, n_input=14 and n_out=7.

Here is my lstm model description:

from keras.models import Sequential
from keras.layers import Dense, LSTM, RepeatVector, TimeDistributed

def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 2, 100, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model: encoder LSTM -> repeated state -> decoder LSTM -> per-step dense
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model

On evaluating the model, I am getting the output as:

Epoch 98/100
 - 8s - loss: 64.6554
Epoch 99/100
 - 7s - loss: 64.4012
Epoch 100/100
 - 7s - loss: 63.9625

According to my understanding: (Please correct me if I am wrong)

Here my model accuracy is 63.9625 (looking at the last epoch, 100). Also, this is not stable, since there is a gap between epoch 99 and epoch 100.

Here are my questions:

  1. How are the epochs and batch size defined above related to model accuracy? How does increasing or decreasing them affect model accuracy?

  2. Are my above-defined epochs, batch size and n_input correct for the model?

  3. How can I increase my model's accuracy? Is the above dataset size good enough for this model?

I am not able to connect all these parameters; kindly help me understand how to achieve more accuracy via the above factors.


Having a very large number of epochs will not necessarily improve your accuracy. More epochs can increase accuracy up to a certain limit, beyond which you begin to overfit your model; having very few will result in underfitting. See this. So looking at the difference between epoch 99 and epoch 100, you can already tell that you are overfitting the model. As a rule of thumb, when you notice that the accuracy stops increasing, that is the ideal number of epochs - usually between 1 and 10; 100 already seems like too much.

Batch size does not affect your accuracy. It is just used to control training speed based on the memory of your GPU: if you have a lot of memory, you can use a large batch size so training is faster.

What you can do to increase your accuracy is: 1. Increase the dataset for training. 2. Try using convolutional networks instead - in a nutshell, CNNs help identify which features to focus on when training your model (you can find more on convolutional networks on this YouTube channel). 3. Try other algorithms.