## Hot questions for Using Neural networks in forecasting

Question:

##### Problem

I am trying to use 5 years of consecutive, historical data to forecast values for the following year.

##### Data Structure

My input data *input_04_08* looks like this where the first column is the day of the year (1 to 365) and the second column is the recorded input.

```
1,2
2,2
3,0
4,0
5,0
```

My output data *output_04_08* looks like this, a single column with the recorded output on that day of the year.

```
27.6
28.9
0
0
0
```

I then normalise the values between 0 and 1 so the first sample given to the network would look like

```
Number of training patterns: 1825
Input and output dimensions: 2 1
First sample (input, target):
[ 0.00273973  0.04 ] [ 0.02185273]
```
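As a sanity check, the first normalised input value 0.00273973 is simply day 1 divided by 365. A minimal numpy sketch of the column-wise max-normalisation described above (the array values here are illustrative, not the real data):

```python
import numpy as np

# Illustrative 2-column input: day of year, recorded input value
input_04_08 = np.array([[1.0, 2.0], [2.0, 2.0], [365.0, 50.0]])

# Column-wise max-normalisation, as used in the question
normalised = input_04_08 / input_04_08.max(axis=0)

print(normalised[0, 0])  # 1/365, approximately 0.00273973
```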

##### Approach(s)

##### Feed Forward Network

I have implemented the following code in PyBrain

```
input_04_08 = numpy.loadtxt('./data/input_04_08.csv', delimiter=',')
input_09 = numpy.loadtxt('./data/input_09.csv', delimiter=',')
output_04_08 = numpy.loadtxt('./data/output_04_08.csv', delimiter=',')
output_09 = numpy.loadtxt('./data/output_09.csv', delimiter=',')

input_04_08 = input_04_08 / input_04_08.max(axis=0)
input_09 = input_09 / input_09.max(axis=0)
output_04_08 = output_04_08 / output_04_08.max(axis=0)
output_09 = output_09 / output_09.max(axis=0)

ds = SupervisedDataSet(2, 1)
for x in range(0, 1825):
    ds.addSample(input_04_08[x], output_04_08[x])

n = FeedForwardNetwork()

inLayer = LinearLayer(2)
hiddenLayer = TanhLayer(25)
outLayer = LinearLayer(1)

n.addInputModule(inLayer)
n.addModule(hiddenLayer)
n.addOutputModule(outLayer)

in_to_hidden = FullConnection(inLayer, hiddenLayer)
hidden_to_out = FullConnection(hiddenLayer, outLayer)

n.addConnection(in_to_hidden)
n.addConnection(hidden_to_out)
n.sortModules()

trainer = BackpropTrainer(n, ds, learningrate=0.01, momentum=0.1)

for epoch in range(0, 100000000):
    if epoch % 10000000 == 0:
        error = trainer.train()
        print 'Epoch: ', epoch
        print 'Error: ', error

result = numpy.array([n.activate(x) for x in input_09])
```

and this gave me the following result with **final error of 0.00153840123381**

Admittedly, this looks good. However, having read more about LSTM (Long Short-Term Memory) neural networks and their applicability to time-series data, I am trying to build one.

##### LSTM Network

Below is my code

```
input_04_08 = numpy.loadtxt('./data/input_04_08.csv', delimiter=',')
input_09 = numpy.loadtxt('./data/input_09.csv', delimiter=',')
output_04_08 = numpy.loadtxt('./data/output_04_08.csv', delimiter=',')
output_09 = numpy.loadtxt('./data/output_09.csv', delimiter=',')

input_04_08 = input_04_08 / input_04_08.max(axis=0)
input_09 = input_09 / input_09.max(axis=0)
output_04_08 = output_04_08 / output_04_08.max(axis=0)
output_09 = output_09 / output_09.max(axis=0)

ds = SequentialDataSet(2, 1)
for x in range(0, 1825):
    ds.newSequence()
    ds.appendLinked(input_04_08[x], output_04_08[x])

fnn = buildNetwork(ds.indim, 25, ds.outdim,
                   hiddenclass=LSTMLayer, bias=True, recurrent=True)

trainer = BackpropTrainer(fnn, ds, learningrate=0.01, momentum=0.1)

for epoch in range(0, 10000000):
    if epoch % 100000 == 0:
        error = trainer.train()
        print 'Epoch: ', epoch
        print 'Error: ', error

result = numpy.array([fnn.activate(x) for x in input_09])
```

This results in a **final error of 0.000939719502501**, but this time, when I feed the test data, the output plot looks terrible.

##### Possible Problems

I have looked through pretty much all the PyBrain questions here; these stood out, but haven't helped me figure things out:

- Training an LSTM neural network to forecast time series in pybrain, python
- Time Series Prediction via Neural Networks
- Time series forecasting (eventually with python)

I have also read a few blog posts, which helped further my understanding a bit, but obviously not enough.

Naturally, I have also gone through the PyBrain docs, but couldn't find much to help with the sequential dataset, bar here.

Any ideas/tips/direction would be welcome.

Answer:

I think what happened here is that you tried to assign hyperparameter values according to some rule-of-thumb which worked for the first case, but didn't for the second.

1) The error you're looking at is an *optimistic* estimate of the prediction error, computed on the training set. The *actual* prediction error is high, but because you didn't test your model on unseen data there's no way of knowing it. *The Elements of Statistical Learning* gives a nice description of this phenomenon. I would highly recommend this book; you can get it online for free.
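One quick way to expose this in the setup above is to compute the same error measure on the held-out 2009 data and compare it with the training error. A minimal, framework-agnostic sketch (the prediction and target arrays below are made up for illustration):

```python
import numpy as np

def mse(predictions, targets):
    """Mean squared error between two 1-D sequences."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((predictions - targets) ** 2))

# Hypothetical predictions vs. targets
train_error = mse([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])  # fits training data perfectly
test_error = mse([0.1, 0.2, 0.3], [0.4, 0.1, 0.9])   # much worse on unseen data

print(train_error, test_error)  # a large gap signals overfitting
```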

2) To get an estimator with a low prediction error you need to perform hyperparameter tuning. E.g. the number of hidden nodes, the learning rate and the momentum should be varied and tested on unseen data to find which combination leads to the lowest prediction error. scikit-learn has `GridSearchCV` and `RandomizedSearchCV` to do just that, but they only work on sklearn's estimators. You can roll your own estimator, though, which is described in the documentation. Personally, I think that model selection and model evaluation are two different tasks. For the first you can just run a single `GridSearchCV` or `RandomizedSearchCV` and get a set of best hyperparameters for your task. For model evaluation you need to run a more sophisticated analysis, such as nested cross-validation, or even repeated nested cross-validation if you want an even more accurate estimate.

3) I don't know much about LSTM networks, but I see that in the first example you assigned 25 hidden nodes, but for the LSTM you only provide 5. Maybe that's not enough to learn the pattern. You could also drop the output bias, as done in the example.

P.S. I think this question actually belongs to http://stats.stackexchange.com, where you're likely to get a more detailed answer to your problem.

**EDIT**: I just noticed that you're training the model for 10 million epochs! I think that's a lot, and probably part of the overfitting problem. I think it's a good idea to implement early stopping, i.e. stop training if some predefined error is achieved.
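A minimal, framework-agnostic sketch of such an early-stopping loop (`train_one_epoch` is a hypothetical stand-in for something like `trainer.train()`, and the error values below are made up):

```python
def train_with_early_stopping(train_one_epoch, max_epochs=1000,
                              target_error=1e-3, patience=5):
    """Stop when the error reaches a target or stops improving."""
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        error = train_one_epoch()
        if error <= target_error:
            return epoch, error                  # good enough, stop
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_error         # no longer improving, stop
    return max_epochs - 1, best_error

# Toy error curve: improves for three epochs, then plateaus
errors = iter([0.5, 0.3, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.2, 0.1])
epoch, err = train_with_early_stopping(lambda: next(errors), target_error=0.01)
print(epoch, err)  # stops at epoch 7 with best error 0.2
```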

Question:

I spent some time trying to understand `input_shape = (batch_size, steps, input_dim)` in Keras' Conv1D, but I cannot make any progress so far.

To be more specific, I have 2 datasets.

*Dataset-1*: Solar energy production for the 24 hours of each day over one year, so the size of my dataset is (364, 24); days are in the rows and the hourly values are in the columns.

Example of 2 days:

```
day-1: [0 0 0 0 0 0 0 1.611 5.791 8.229 9.907 9.649 8.401 6.266 4.728 2.231 0.306 0.013 0 0 0 0 0 0]
day-2: [0 0 0 0 0 0 0 1.732 5.839 9.909 12.593 14.242 12.744 9.596 5.808 2.019 0.241 0 0 0 0 0 0 0]
```

What I want to do with the CNN is use 6 days' data to predict the 7th day. For that reason, I divided my dataset like so:

```
xtrain = dataset[0:6,0:24]  # takes 24 hours of 6 days
ytrain = dataset[6,0:24]    # takes 24 hours of the 7th day
xtest = dataset[1:7,0:24]   # takes 24 hours of 6 days (day 2 to day 7) to predict day 8
```

To be compatible with Keras' input shape, I reshaped the training data as follows:

```
xtrain = xtrain.reshape(6,1,24)
```

Number of Samples: 6, Time Dimension: 1, input_dimension:24

Is this correct thinking?

```
model.add(Conv1D(filters=?, kernel_size=4, activation='relu', input_shape=?))
```

In my second dataset:

```
Training Data:
Xtrain: Day-1 Hour-1 to Hour-24, Day-2 Hour-1 to Hour-24 ... Day-6 Hour-1 to Hour-24
Ytrain: Day-7 Hour-1 to Hour-24
```

I have created a new dataset which has the 24 hours of a day in the rows and 7 days in the columns, so it is an (8616, 7) matrix.

```
hour-1  day-1, day-2 ... day-7
hour-2  day-1, day-2 ... day-7
...
hour-24 day-1, day-2 ... day-7
hour-1  day-2, day-3 ... day-8
hour-2  day-2, day-3 ... day-8
...
hour-24 day-2, day-3 ... day-8
...
hour-1  day-359, day-360 ... day-365
hour-2  day-359, day-360 ... day-365
...
hour-24 day-359, day-360 ... day-365
```

Keras Code:

```
xtrain = dataset[0:24,0:6]  # takes 24 hours of 6 days
ytrain = dataset[24:48,6]   # takes 24 hours of the 7th day
xtest = dataset[24:48,0:6]  # takes 24 hours of 6 days (day 2 to day 7) to predict day 7

xtrain = xtrain[newaxis,:, :]
ytrain = ytrain.reshape(1,24)
```

I really don't understand what `filters` and `input_shape` should be.

Answer:

You should reformat your dataset into a structure like this:

```
[365, 6, 24]
```

The first dimension sets the day.

The second dimension sets the timestep (the progression over the 6 days): for each day 6:365 of the original dataset, you copy in the past 6 days of 24 hours each.

The third dimension is the hour of the day.

Assuming that you have your original dataset of [1:365,1:24] ordered:

```
xtrain = np.array(np.tile(xtrain, (6, 1)))
xtrain = np.reshape(xtrain, (365, 6, 24))
```

And now you have the required 3D format for `Conv1D`: (batch, timesteps, channels).
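Note that `np.tile` repeats the data rather than building overlapping 6-day windows, so as an alternative sketch (my own, using made-up data) you can construct the windows explicitly:

```python
import numpy as np

# Toy stand-in for the (365, 24) dataset: one row per day, one column per hour
dataset = np.arange(365 * 24, dtype=float).reshape(365, 24)

window = 6  # use 6 days of history to predict the 7th

# Build overlapping windows: sample i = days i .. i+5, target = day i+6
x = np.stack([dataset[i:i + window] for i in range(len(dataset) - window)])
y = dataset[window:]

print(x.shape, y.shape)  # (359, 6, 24) (359, 24)
```

Each sample then has shape `(timesteps=6, channels=24)`, so the layer could be declared as `Conv1D(32, kernel_size=4, activation='relu', input_shape=(6, 24))`, where the number of filters (32 here) is a tunable choice, not a prescribed value.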

Question:

I use the `mlp` and `elm` functions from the `nnfor` library for forecasting non-stationary time series. Both of them give a different number of nodes in the input and hidden layers. I am interested in how they choose the number of nodes in each layer, and it would also be great to understand how the generalization error behaves for these functions.

Answer:

The number of hidden nodes chosen by the `mlp` function depends on the value of the `hd.auto.type` parameter:

- "set" fixes hd=5.
- "valid" uses a 20% validation set (randomly) sampled to find the best number of hidden nodes.
- "cv" uses 5-fold cross-validation.
- "elm" uses ELM to estimate the number of hidden nodes (experimental).

The number of hidden nodes tried for the "valid", "cv" and "elm" parameter values ranges from 1 to `max(2, min(dim(X)[2] + 2, length(Y) - 2))`. These hidden nodes are limited to a single layer.

The "cv" and "valid" approaches use the minimum of the mean square error to find the number of hidden nodes.

As far as I can tell from the `auto.hd.elm` function in the source code, the "elm" approach uses the median number of significant model coefficients to choose the number of hidden nodes. Hope that makes sense to you!

The `elm` function uses `min(100 - 60*(type=="step" | type=="lm"), max(4, length(Y) - 2 - as.numeric(direct)*length(X[1,])))` to determine the number of hidden nodes, where `type` is the estimation method used for the output-layer weights and `direct` indicates the presence of direct input-output connections.

The number of input nodes depends on seasonality and lags.

Generalization error can be approximated using cross-validation. To be clear, this cross-validation would have to be done separately from any validation used to find the number of hidden nodes.
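As an illustration of order-preserving cross-validation for time series (shown in Python with scikit-learn's `TimeSeriesSplit`, although the question concerns the R `nnfor` package):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(30)  # toy series of 30 observations in time order

# Each fold trains on an expanding window of the past and tests on the future
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(y):
    assert train_idx.max() < test_idx.min()  # never train on future data
    print(len(train_idx), len(test_idx))
```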

The nnfor package author has an introductory blog post which may be worth checking: http://kourentzes.com/forecasting/2017/02/10/forecasting-time-series-with-neural-networks-in-r/

Question:

I'm trying to build an LSTM model; the data consists of a date_time column and some numeric values. While fitting the model, I get the following error:

```
ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (10, 1)
```

Sample data: "date.csv" looks like:

```
Date
06/13/2018 07:20:04 PM
06/13/2018 07:20:04 PM
06/13/2018 07:20:04 PM
06/13/2018 07:22:12 PM
06/13/2018 07:22:12 PM
06/13/2018 07:22:12 PM
06/13/2018 07:26:20 PM
06/13/2018 07:26:20 PM
06/13/2018 07:26:20 PM
06/13/2018 07:26:20 PM
```

"tasks.csv" looks like :

```
Tasks
2
1
2
1
4
2
3
2
3
4
```

```
date = pd.read_csv('date.csv')
task = pd.read_csv('tasks.csv')

model = Sequential()
model.add(LSTM(24, return_sequences=True, input_shape=(date.shape[0], 1)))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(date, task, epochs=100, batch_size=1, verbose=1)
```

How can I forecast the result?

Answer:

There are several issues with this code sample: there is no preprocessing, no label encoding, no target encoding, and the loss function is inappropriate. I briefly describe possible solutions below, but for more information and examples you can read a tutorial about time series and forecasting.

The specific problem that generates this `ValueError` is that `LSTM` requires a three-dimensional input of shape `(batch_size, input_length, dimension)`. So it requires an input of at least `(batch_size, 1, 1)`, but `date.shape` is `(10, 1)`. If you do

```
date = date.values.reshape((1, 10, 1))
```

it will solve this one problem, but bring an avalanche of other problems:

```
date = date.values.reshape((1, 10, 1))

model = Sequential()
model.add(LSTM(24, return_sequences=True, input_shape=(date.shape[1], 1)))
print(model.layers[-1].output_shape)
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(date, task, epochs=100, batch_size=1, verbose=1)
```

```
ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 10 target samples.
```

Unfortunately, there are no answers to the other questions because of a lack of information, but here are some general-purpose recommendations.

**Preprocessing**
Unfortunately, you probably can't just reshape the data, because forecasting is a little more complicated than that. You should choose the period based on which you will forecast the next task. The good news is that the measurements are periodic, but for each timestamp there are several tasks, which makes the problem harder to solve.

**Features**
You need features to predict something. It's not clear what a *feature* is in this case, but it's probably not the date and time. Even the previous task could be a feature, but you can't use the task id directly: it requires some *embedding*, as it's not a continuous numeric value but a label.

**Embedding**
There's a `keras.layers.Embedding` layer for embedding things in Keras.

If the number of tasks is 4 (1, 2, 3, 4) and the desired shape of the embedding output vector is 10, you could use it this way:

```
model = Sequential()
model.add(Embedding(4 + 1, 10, input_length=10))  # + 1 to deal with non-zero indexing
# ... the rest of the code is omitted
```

The first argument is the number of embedded items, the second is the output shape, and the last is the input length (10 is just an example value).

**Label encoding**
The task labels are probably just *labels*; there's no reasonable distance or metric between them, i.e. you can't say 1 is closer to 2 than to 4, etc. In that case `mse` is useless, but fortunately there is a probabilistic loss function named categorical cross-entropy which helps to predict the category of the data.

To use it, you should binarize the labels:

```
import numpy as np

def binarize(labels):
    label_map = dict(map(reversed, enumerate(np.unique(labels))))
    bin_labels = np.zeros((len(labels), len(label_map)))
    bin_labels[np.arange(len(labels)), [label_map[label] for label in labels]] = 1
    return bin_labels, label_map

binarized_task, label_map = binarize(task)

binarized_task
Out:
array([[0., 1., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [1., 0., 0., 0.],
       [0., 0., 0., 1.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

label_map
Out: {1: 0, 2: 1, 3: 2, 4: 3}
```

These are the binarized labels and the mapping from each task id to its position in the binary labels.
Of course, you should use the cross-entropy loss in the model with binarized labels. Also, the last layer should use the `softmax` activation function (explained in any tutorial about cross-entropy; shortly, you deal with a *probability* of a label, so the outputs should sum up to one, and `softmax` modifies the previous layer's values according to this requirement):

```
model.add(Dense(4, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(date, binarized_task, epochs=100, batch_size=1, verbose=1)
```
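As a side note, scikit-learn's `LabelBinarizer` produces essentially the same one-hot encoding as the `binarize` helper above, if you prefer a library routine:

```python
from sklearn.preprocessing import LabelBinarizer

task = [2, 1, 2, 1, 4, 2, 3, 2, 3, 4]

encoder = LabelBinarizer()
binarized = encoder.fit_transform(task)  # shape (10, 4): one column per label

print(encoder.classes_)  # [1 2 3 4]
print(binarized[0])      # [0 1 0 0] -- label 2 is the second class
```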

**"Complete", but, probably, meaningless example**
This example uses all the things listed above, but it doesn't pretend to be complete or useful - but, I hope, it is explanatory at least.

```
import datetime

import numpy as np
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM, Flatten, Embedding


# Define functions
def binarize(labels):
    """
    Labels of shape (size,) to {0, 1} array of the shape (size, n_labels)
    """
    label_map = dict(map(reversed, enumerate(np.unique(labels))))
    bin_labels = np.zeros((len(labels), len(label_map)))
    bin_labels[np.arange(len(labels)), [label_map[label] for label in labels]] = 1
    return bin_labels, label_map


def group_chunks(df, chunk_size):
    """
    Group the task data by periods; train on the first columns and use the
    last one ('Tasks') as the target. The function uses 'Tasks' as features.
    """
    chunks = []
    for i in range(0, len(df) - chunk_size):
        chunks.append(df.iloc[i:i + chunk_size]['Tasks'])  # slice period, append
        chunks[-1].index = list(range(chunk_size))
    df_out = pd.concat(chunks, axis=1).T
    df_out.index = df['Date'].iloc[:(len(df) - chunk_size)]
    df_out.columns = [i for i in df_out.columns[:-1]] + ['Tasks']
    return df_out


# I modify the dates for simplicity - now there is a single entry per datetime
date = pd.DataFrame({
    "Date": [
        "06/13/2018 07:20:00 PM",
        "06/13/2018 07:20:01 PM",
        "06/13/2018 07:20:02 PM",
        "06/13/2018 07:20:03 PM",
        "06/13/2018 07:20:04 PM",
        "06/13/2018 07:20:05 PM",
        "06/13/2018 07:20:06 PM",
        "06/13/2018 07:20:07 PM",
        "06/13/2018 07:20:08 PM",
        "06/13/2018 07:20:09 PM"]
    })
task = pd.DataFrame({"Tasks": [2, 1, 2, 1, 4, 2, 3, 2, 3, 4]})
date['Tasks'] = task['Tasks']
# format the datetime strings as datetime
date['Date'] = date['Date'].map(
    lambda x: datetime.datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p"))

chunk_size = 4
df = group_chunks(date, chunk_size)
# print(df)
"""
                     0  1  2  Tasks
Date
2018-06-13 19:20:00  2  1  2      1
2018-06-13 19:20:01  1  2  1      4
2018-06-13 19:20:02  2  1  4      2
2018-06-13 19:20:03  1  4  2      3
2018-06-13 19:20:04  4  2  3      2
2018-06-13 19:20:05  2  3  2      3
"""

# extract the train data and target
X = df[list(range(chunk_size - 1))].values
y, label_map = binarize(df['Tasks'].values)

# Create a model, compile, fit
model = Sequential()
model.add(Embedding(len(np.unique(X)) + 1, 24, input_length=X.shape[-1]))
model.add(LSTM(24, return_sequences=True, input_shape=(date.shape[1], 1)))
model.add(Flatten())
model.add(Dense(4, activation='softmax'))
model.compile(loss="categorical_crossentropy", optimizer="adam")
history = model.fit(X, y, epochs=100, batch_size=1, verbose=1)

Out:
Epoch 1/100
6/6 [==============================] - 1s 168ms/step - loss: 1.3885
Epoch 2/100
6/6 [==============================] - 0s 5ms/step - loss: 1.3811
Epoch 3/100
6/6 [==============================] - 0s 5ms/step - loss: 1.3781
...
```

And so on. It works, somehow, but I kindly advise one more time: read the tutorial linked above (or any other forecasting tutorial), because, for example, I haven't covered testing/validation in this example.

Question:

The example in the link below has a training and validation set from time series data. There is no mention of a test set. Why isn't there one and what would it entail to have one for a dataset whose time series data is being generated on the fly in real time?

I have 3 hrs of data collected at 1 s intervals. I would like to predict the next 30 min before they become available. What should the train/validate/test split look like? Can the test set be skipped?

https://www.tensorflow.org/tutorials/structured_data/time_series

Answer:

It is **never** recommended to skip the test set. In the TensorFlow example, the purpose was to demonstrate how you can work with time series; you can evaluate on the test set just as you do with your validation set, with the constraint that the test set must be completely unseen: and here we come to your second question.

With regard to the test set, in your use case, like you said, the test set is the data generated on the fly.

You can, of course, split your initial dataset into train/val/test. But the second test set, which evidently coincides with your model's 'live deployment', would be to predict on the on-the-fly-generated dataset: this means you would feed the data to your model in real time.

The train-val-test split depends on how you want to create your model: how many time steps you want to use (how many seconds to take into account when predicting the next step), how many variables you are trying to predict, and how many time steps ahead you want to predict (in your case 30 minutes would be 30*60 = 1800 steps, since your dataset's sampling frequency is one second). It's a very broad question and refers more to how to create a dataset for multi-step time-series prediction.

Question:

I have been working on time series forecasting and recently read about how a hybrid model of auto.arima and ANN provides better/more accurate forecasting results. I have six time series data sets; the hybrid model works wonders for five of them, but it gives weird results for the other.

I ran the model using the following two packages:
```
library(forecast)
library(forecastHybrid)
```

Here is the data:

```
ts.data
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012           1  16  41  65  87 104 152 203 213 263
2013 299 325 388 412 409 442 447 421 435 448 447 443
2014 454 446 467 492 525
```

Model:

```
fit <- hybridModel(ts.data, model="an")
```

Forecast results for the next 5 periods:

```
forecast(fit, 5)

         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jun 2014       594.6594 519.2914 571.0163 505.6007 584.7070
Jul 2014       702.1626 528.7327 601.8827 509.3710 621.2444
Aug 2014       738.5732 540.6665 630.2566 516.9534 653.9697
Sep 2014       752.1329 553.8905 657.3403 526.5090 684.7218
Oct 2014       762.7481 567.9391 683.5994 537.3256 714.2129
```

You can see that the point forecasts are outside of the 95% confidence interval. Does anybody know why this is happening and how I could fix it?

Any thoughts and insights are appreciated! Thanks in advance.

Answer:

See the description of this issue here.

tl;dr: `nnetar` models do not create prediction intervals, so these are not included in the ensemble's prediction intervals. When the "forecast" package adds this behavior (on the road map for 2016), the prediction intervals and point forecasts will be consistent.

Question:

I am new to machine learning and LSTMs. I am referring to this link, LSTM for multistep forecasting, the "Encoder-Decoder LSTM Model With Multivariate Input" section.

Here is my dataset description after reshaping the train and test set.

```
print(dataset.shape)
print(train_x.shape, train_y.shape)
print(test.shape)

(2192, 15)
(1806, 14, 14) (1806, 7, 1)
(364, 15)
```

In the above, I have `n_input=14, n_out=7`.

Here is my lstm model description:

```
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 2, 100, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
```

On evaluating the model, I am getting the output as:

```
Epoch 98/100
 - 8s - loss: 64.6554
Epoch 99/100
 - 7s - loss: 64.4012
Epoch 100/100
 - 7s - loss: 63.9625
```

According to my understanding: (Please correct me if I am wrong)

`Here my model accuracy is 63.9625` (looking at the last epoch, 100). Also, this is not stable, since there is a gap between epoch 99 and epoch 100.

Here are my questions:

How are the epoch count and batch size defined above related to model accuracy? How does increasing or decreasing them affect model accuracy?

Are my above-defined epochs, batch size and n_input correct for the model?

How can I increase my model accuracy? Is the above dataset size good enough for this model?

I am not able to link all these parameters together, so kindly help me understand how to achieve more accuracy via the above factors.

Answer:

Having a very large number of epochs will not necessarily improve your accuracy. More epochs can increase the accuracy up to a certain limit, beyond which you begin to overfit your model; having very few will result in underfitting. See this. So, looking at the difference between epoch 99 and epoch 100, you can already tell that you are overfitting the model. As a rule of thumb, when you notice that the accuracy stops increasing, that is the ideal number of epochs you should have, usually between 1 and 10; 100 already seems like too much.

Batch size does not affect your accuracy. It is just used to control the speed of training based on the memory of your GPU: if you have a lot of memory, you can use a large batch size so training will be faster.

What you can do to increase your accuracy is:

1. Increase your dataset for training.
2. Try using convolutional networks instead. You can find more on convolutional networks on this YouTube channel; in a nutshell, CNNs help you identify which features to focus on when training your model.
3. Try other algorithms.