Best way to train a regression model given time series data

Given data from week 1 and week 2, I am trying to train a model to predict on week 3 data.

The target label is called "target".

I am confused about which features should be used to train the model, given that this problem looks at a user's historical actions to predict their future actions.

Train data:

id,date,week_day,target
1,2019-01-01,1,10
1,2019-01-02,2,6
1,2019-01-03,3,7
2,2019-01-01,1,8
2,2019-01-02,1,5
2,2019-01-03,1,4

Test data (note the future dates):

id,date,week_day,target
1,2019-01-10,1,15
1,2019-01-11,2,13
1,2019-01-12,3,8
2,2019-01-10,1,7
2,2019-01-11,1,7
2,2019-01-12,1,4

1) I'm wondering whether it is correct to keep id as a feature in the training data. I know most ML problems drop the id field, but this problem is a little different because the same ids also appear in the test dataset.

2) I plan to drop the date field.

It looks like your problem can be treated as time series forecasting. You have seasonality in your data. Instead of performing regression, you can try an algorithm such as SARIMA.
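
SARIMA itself is usually fitted with a statistics library (statsmodels, for example, provides a `SARIMAX` class). As a dependency-free illustration, here is a sketch of two seasonal building blocks rather than SARIMA itself: a seasonal naive forecast (repeat the value from one season earlier) and the seasonal differencing that SARIMA's seasonal `D` term applies. It assumes daily data with a weekly period of 7; the `history` values are made up and the function names are hypothetical.

```python
# Sketch only: simple seasonal baselines, not a SARIMA model.
# Assumption: daily observations with a weekly season (period = 7).


def seasonal_difference(series, period=7):
    """Return y[t] - y[t - period], the seasonal differencing behind SARIMA's D term."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def seasonal_naive_forecast(series, horizon, period=7):
    """Forecast each future step with the value observed exactly one season earlier."""
    if len(series) < period:
        raise ValueError("need at least one full season of history")
    last_season = series[-period:]
    # Step h (0-based) reuses the observation at the same position in the last season.
    return [last_season[h % period] for h in range(horizon)]


# Hypothetical two weeks of daily targets for a single id.
history = [10, 6, 7, 9, 11, 3, 2, 12, 7, 8, 10, 12, 4, 3]
print(seasonal_naive_forecast(history, horizon=3))  # [12, 7, 8]
print(seasonal_difference(history))  # [2, 1, 1, 1, 1, 1, 1]
```

A baseline like this is a useful yardstick: a SARIMA model is only worth its extra complexity if it beats it on held-out weeks.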

How To Model Time Series Data With Linear Regression: One example is when there is an outlier: the 'best' regression line calculated according to OLS obviously does not fit the observed data well. … The basic idea is to predict future values of a time series as a weighted average of past observations, where the weights decrease exponentially with time: $\hat{y}_t = \alpha y_{t-1} + \alpha(1-\alpha) y_{t-2} + \alpha(1-\alpha)^2 y_{t-3} + \dots$, where $\alpha \in (0, 1)$ is a smoothing parameter that should be estimated.
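
The exponentially weighted average above is usually computed recursively: keeping a level $\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}$ and using the last level as the forecast reproduces the weights $\alpha, \alpha(1-\alpha), \alpha(1-\alpha)^2, \dots$. This sketch assumes the level is initialised with the first observation (one common choice) and estimates $\alpha$ by a simple grid search over one-step-ahead squared errors; the function names and the sample series are hypothetical.

```python
def ses_forecast(series, alpha):
    """One-step-ahead simple exponential smoothing forecast for a list of observations."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    level = series[0]  # assumption: initialise the level with the first observation
    for y in series[1:]:
        # level_t = alpha * y_t + (1 - alpha) * level_{t-1}
        level = alpha * y + (1 - alpha) * level
    return level


def fit_alpha(series, grid=None):
    """Choose alpha from a grid by minimising in-sample one-step-ahead squared errors."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]

    def sse(alpha):
        # Forecast y[t] from y[0..t-1] for every t >= 1 and sum the squared errors.
        return sum((series[t] - ses_forecast(series[:t], alpha)) ** 2
                   for t in range(1, len(series)))

    return min(grid, key=sse)


# Hypothetical daily targets for one id.
series = [10, 6, 7, 9, 8, 7, 9]
alpha = fit_alpha(series)
print(alpha, ses_forecast(series, alpha))
```

The grid search is deliberately naive; libraries usually estimate $\alpha$ with a numerical optimiser instead.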

1) I'm wondering whether it is correct to keep id as a feature in the training data. I know most ML problems drop the id field, but this problem is a little different because the same ids also appear in the test dataset.

As I see it, the same ids appear with different dates in both the train and test sets. So, if the id represents something related to the target, keep it. Otherwise, drop it.
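
If you keep id, one common approach is to treat it as a categorical feature rather than a number, so a linear model does not assume that id 2 is "twice" id 1. Below is a minimal one-hot encoding sketch in plain Python, assuming rows are dicts with an `id` key; the category list is fitted on the training rows and reused for the test rows so both get the same columns. The function names are hypothetical.

```python
def fit_id_categories(rows, id_key="id"):
    """Collect the distinct ids seen in the training rows."""
    return sorted({row[id_key] for row in rows})


def one_hot_ids(rows, categories, id_key="id"):
    """Replace the id column with one 0/1 indicator column per known id.

    Ids not seen during training get all-zero indicators.
    """
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != id_key}
        for cat in categories:
            new_row[f"{id_key}_{cat}"] = int(row[id_key] == cat)
        encoded.append(new_row)
    return encoded


train = [
    {"id": 1, "week_day": 1, "target": 10},
    {"id": 2, "week_day": 1, "target": 8},
]
test = [{"id": 1, "week_day": 1, "target": 15}]

categories = fit_id_categories(train)
print(one_hot_ids(train, categories))
print(one_hot_ids(test, categories))
```

If the model also has an intercept, drop one of the indicator columns to avoid perfect collinearity.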

2) I plan to drop the date field.

If you drop it, you will lose the year, month, week number, day number, and holiday flag as possible features.
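
Instead of dropping date outright, you can derive calendar features from it and drop only the raw string. This sketch uses only the standard library's `datetime`; `HOLIDAYS` is a hypothetical placeholder calendar you would replace with the holidays that matter for your users, and `date_features` is a hypothetical name.

```python
from datetime import date

# Hypothetical holiday calendar: replace with the holidays relevant to your data.
HOLIDAYS = {date(2019, 1, 1)}


def date_features(iso_date):
    """Turn an ISO date string such as '2019-01-10' into numeric calendar features."""
    d = date.fromisoformat(iso_date)
    _, iso_week, iso_weekday = d.isocalendar()
    return {
        "year": d.year,
        "month": d.month,
        "week_number": iso_week,       # ISO week of the year
        "day_of_month": d.day,
        "day_of_week": iso_weekday,    # ISO convention: 1 = Monday ... 7 = Sunday
        "is_holiday": int(d in HOLIDAYS),
    }


print(date_features("2019-01-10"))
```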

In addition to SARIMA, I would suggest trying to fit a regression model here. Regression models sometimes work well in time-series-like tasks.
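
To try the regression route without extra libraries, here is a small ordinary-least-squares sketch that solves the normal equations $(X^\top X)\beta = X^\top y$ with Gauss-Jordan elimination. It assumes the features are already numeric; the example encodes the question's training rows as an intercept, an indicator for id 2, and week_day. For more than a handful of columns, a library solver such as NumPy's `lstsq` or scikit-learn's `LinearRegression` is more robust. The function names are hypothetical.

```python
def fit_ols(X, y):
    """Ordinary least squares: solve (X^T X) b = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; drop a column")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][-1] for i in range(p)]


def predict(X, coefs):
    """Linear prediction: dot product of each feature row with the coefficients."""
    return [sum(x * c for x, c in zip(row, coefs)) for row in X]


# Question's training rows encoded as [intercept, is_id_2, week_day].
X_train = [[1, 0, 1], [1, 0, 2], [1, 0, 3], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
y_train = [10, 6, 7, 8, 5, 4]
coefs = fit_ols(X_train, y_train)
print(coefs)
print(predict([[1, 0, 1], [1, 1, 1]], coefs))
```

With only six rows this will fit the training weeks closely and says little about generalisation; it is meant to show the mechanics.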

Time Series Machine Learning Regression Framework: Time! One of the most challenging concepts in the Universe. … Building a time series forecasting pipeline to predict weekly sales transactions … For this method, we train on n data points and validate the prediction on the next n data points. Although this approach is possible, it might not be the best solution. … After confirming that the series is stationary, I can continue with the model. I make sure to split my data into a training and a testing set. The testing set is not used in the modeling process and will be used to evaluate the performance of the selected model on unseen data. To select the relevant time series model, I built the ACF and PACF …
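
The ACF mentioned above can be computed directly. This is a sketch of the usual sample autocorrelation estimator $r_k = \sum_{t=1}^{T-k}(y_t-\bar y)(y_{t+k}-\bar y) \big/ \sum_{t=1}^{T}(y_t-\bar y)^2$ in plain Python; the function name and the repeated weekly pattern in the example are hypothetical, and the PACF is omitted for brevity.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a numeric sequence."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Each lag uses the overlapping pairs (t, t + k), normalised by the lag-0 sum.
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


# Hypothetical daily series that repeats a weekly pattern three times.
weekly = [10, 6, 7, 9, 11, 3, 2] * 3
print([round(r, 2) for r in acf(weekly, max_lag=7)])
```

With daily data and weekly seasonality, a noticeably large $r_7$ is one hint that a seasonal model is worth trying.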

Your data has far too few features. You can try multiple models, such as SARIMA as suggested by Pierre, but with only these features you might struggle. I would suggest plotting a correlation matrix to see whether there is any correlation between the inputs and the output. If there isn't any, a model will have little to learn from; if there is, a model can learn that relationship and generalize.
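
The correlation matrix itself can be computed without any plotting library; the seaborn example linked below draws the same kind of matrix as a heatmap. This sketch computes pairwise Pearson correlations for the question's training columns. Pearson correlation only measures linear association, so a value near zero does not rule out a non-linear relationship. The function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of {name: values}."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Columns from the question's training data.
columns = {
    "id": [1, 1, 1, 2, 2, 2],
    "week_day": [1, 2, 3, 1, 1, 1],
    "target": [10, 6, 7, 8, 5, 4],
}
for name, row in correlation_matrix(columns).items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```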

This link can be helpful if you don't know how to plot a correlation matrix: https://seaborn.pydata.org/examples/many_pairwise_correlations.html

This link can help you make sense of a correlation matrix if you are not familiar with them: https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/

If you are unable to understand something from the links, feel free to comment.

5.6 Forecasting with regression: … $x_{1,t},\dots,x_{k,t}$ for $t = 1,\dots,T$ returned the fitted (training-sample) values of $y$. When using regression models for time series data, we need to distinguish … To obtain these, we can use one of the simple methods introduced in Section 3.1 … models to better incorporate the rich dynamics observed in time series are … For time series where the value of the response is more stable (a.k.a. stationary), this method can, surprisingly, sometimes perform better than an ML algorithm. In this case, the zig-zag pattern in the data is pronounced, leading to poor predictive power. … Multiple Linear Regression: our next approach will be to build a multiple linear regression model …
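
The "simple methods introduced in Section 3.1" appear to refer to the benchmark forecasts in Forecasting: Principles and Practice: mean, naive, seasonal naive and drift. In a regression setting they can be used to forecast predictors whose future values are not known in advance. A plain-Python sketch of three of them, with hypothetical function names, assuming a non-empty list of observations ordered in time:

```python
def mean_forecast(series, horizon):
    """Every future value equals the historical mean."""
    m = sum(series) / len(series)
    return [m] * horizon


def naive_forecast(series, horizon):
    """Every future value equals the last observation."""
    return [series[-1]] * horizon


def drift_forecast(series, horizon):
    """Extend the line from the first to the last observation."""
    if len(series) < 2:
        raise ValueError("drift needs at least two observations")
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + h * slope for h in range(1, horizon + 1)]


history = [2, 4, 6, 8]  # hypothetical
print(mean_forecast(history, 2))   # [5.0, 5.0]
print(naive_forecast(history, 2))  # [8, 8]
print(drift_forecast(history, 2))  # [10.0, 12.0]
```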

Chapter 5 Time series regression models: In this chapter we discuss regression models. The basic concept is that we forecast the time series of interest $y$ assuming that it has a linear relationship with … A little over a month ago, Rob Hyndman finished the 2nd edition of his open source book Forecasting: Principles and Practice. Take a look; it's a fantastic introduction and companion to applied time series modeling using R. It made me rediscover the tslm() function of …
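
A regression on a time trend, roughly what `tslm(y ~ trend)` from R's forecast package fits, can be sketched in a few lines of Python. This is a plain least-squares line over the time index $t = 1, \dots, T$, extrapolated to $t = T + h$; it ignores seasonal dummies and any autocorrelation in the errors, and the function names are hypothetical.

```python
def fit_linear_trend(series):
    """Least-squares fit of y_t = b0 + b1 * t for t = 1..T (requires T >= 2)."""
    T = len(series)
    ts = range(1, T + 1)
    t_mean = (T + 1) / 2
    y_mean = sum(series) / T
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, series))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    b1 = sxy / sxx
    b0 = y_mean - b1 * t_mean
    return b0, b1


def forecast_linear_trend(series, horizon):
    """Extrapolate the fitted trend to t = T + 1, ..., T + horizon."""
    b0, b1 = fit_linear_trend(series)
    T = len(series)
    return [b0 + b1 * (T + h) for h in range(1, horizon + 1)]


print(fit_linear_trend([3, 5, 7, 9]))          # (1.0, 2.0)
print(forecast_linear_trend([3, 5, 7, 9], 2))  # [11.0, 13.0]
```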

How To Backtest Machine Learning Models for Time Series: How do we know how good a given model is? In applied machine learning, we often split our data into a train and a test set: the training … These methods cannot be directly used with time series data. … (I know, there are better alternatives for panel data, like regression with fixed effects, but in my case, …) … The workflow includes preparing a data set, fitting a linear regression model, evaluating and improving the fitted model, and predicting response values for new predictor data. The example also describes how to fit and evaluate a linear regression model for tall arrays.
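
Because a random train/test split would leak future information into training, backtesting usually walks forward in time: train on everything up to a cut-off, test on the next few points, then move the cut-off. This is a minimal expanding-window sketch, assuming a single time-ordered list and a forecaster function with the hypothetical signature `forecaster(train, horizon) -> list`; it reports the mean absolute error over all test points.

```python
def expanding_window_splits(n, initial_train, horizon, step=1):
    """Yield (train_idx, test_idx) pairs; training data always precedes test data."""
    cutoff = initial_train
    while cutoff + horizon <= n:
        yield list(range(cutoff)), list(range(cutoff, cutoff + horizon))
        cutoff += step


def backtest_mae(series, forecaster, initial_train, horizon, step=1):
    """Mean absolute error of forecaster(train, horizon) over all walk-forward splits."""
    errors = []
    for train_idx, test_idx in expanding_window_splits(len(series), initial_train, horizon, step):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        predicted = forecaster(train, horizon)
        errors.extend(abs(a - p) for a, p in zip(actual, predicted))
    if not errors:
        raise ValueError("series too short for the requested initial_train and horizon")
    return sum(errors) / len(errors)


def naive(train, horizon):
    """Example forecaster: repeat the last observed value."""
    return [train[-1]] * horizon


series = [10, 6, 7, 9, 8, 7, 9, 11]  # hypothetical
print(backtest_mae(series, naive, initial_train=4, horizon=2))
```

Any of the other forecasters sketched in this thread can be passed in place of `naive` to compare them on the same splits.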

Time Series Forecasting as Supervised Learning: Discover how to prepare and visualize time series data and develop … Regression: a regression problem is when the output variable is a real value … This will give us 3 input features and one output value to predict for each training pattern. It is also a good example to show the burden on the input variables. … Train and validation data: you can specify separate train and validation sets directly in the AutoMLConfig constructor. Rolling Origin Cross Validation: for time series forecasting, Rolling Origin Cross Validation (ROCV) is used to split time series in a temporally consistent way.
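
Framing the series as supervised learning means using previous values as input features. This sketch builds `n_lags` lag features (3 by default, as in the snippet above) and one target per row. It assumes rows are dicts with `id`, `date` (ISO strings, so sorting is chronological) and `target` keys, and builds lags separately for each id so one user's history never leaks into another's. The function name is hypothetical.

```python
from collections import defaultdict


def make_lag_rows(rows, n_lags=3):
    """Return (X, y): each X row holds the previous n_lags targets of the same id."""
    history = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["id"], r["date"])):
        history[row["id"]].append(row["target"])
    X, y = [], []
    for targets in history.values():
        for t in range(n_lags, len(targets)):
            X.append(targets[t - n_lags:t])
            y.append(targets[t])
    return X, y


# The question's training rows; each id has only three days, so use two lags here.
train_rows = [
    {"id": 1, "date": "2019-01-01", "target": 10},
    {"id": 1, "date": "2019-01-02", "target": 6},
    {"id": 1, "date": "2019-01-03", "target": 7},
    {"id": 2, "date": "2019-01-01", "target": 8},
    {"id": 2, "date": "2019-01-02", "target": 5},
    {"id": 2, "date": "2019-01-03", "target": 4},
]
X, y = make_lag_rows(train_rows, n_lags=2)
print(X, y)
```

Lags here are "previous observations", not "previous calendar days"; if the dates have gaps (as between the question's train and test weeks), that difference matters.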

Comments
  • Given that I need to predict for a given id and date, should I use a multivariate time series model instead of SARIMA?
  • You can use an extension of SARIMA called SARIMAX, where the 'X' stands for exogenous regressors. Then use your id as an exogenous variable. Tell me if it works. A multivariate time series model should also work.
  • I think my problem is a little different, given that the dates in the train data reset to the beginning for each id. Should I build a model for each id?
  • I asked a separate question here: stackoverflow.com/questions/54411958/… Maybe you can have a look?
  • Thanks! I tried a regression model first, before a time series model, so now I am trying a time series model too. And yes, it makes sense that I should keep id and date.
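
On the question of one model per id: a minimal sketch of the structure, assuming rows are dicts with `id` and `target` keys. The per-id "model" here is just the mean of that id's training targets, a hypothetical placeholder you would swap for SARIMA, exponential smoothing or a regression, with the global mean as a fallback for ids not seen in training. The function names are hypothetical.

```python
from collections import defaultdict


def fit_per_id(rows):
    """Fit one placeholder mean model per id, plus a global-mean fallback."""
    targets_by_id = defaultdict(list)
    for row in rows:
        targets_by_id[row["id"]].append(row["target"])
    models = {i: sum(t) / len(t) for i, t in targets_by_id.items()}
    all_targets = [row["target"] for row in rows]
    fallback = sum(all_targets) / len(all_targets)
    return models, fallback


def predict_per_id(models, fallback, rows):
    """Use each row's own id model, or the fallback for unseen ids."""
    return [models.get(row["id"], fallback) for row in rows]


train = [
    {"id": 1, "target": 10}, {"id": 1, "target": 6}, {"id": 1, "target": 7},
    {"id": 2, "target": 8}, {"id": 2, "target": 5}, {"id": 2, "target": 4},
]
models, fallback = fit_per_id(train)
print(predict_per_id(models, fallback, [{"id": 1}, {"id": 2}, {"id": 3}]))
```

With only three observations per id, separate per-id models have very little data each; a single global model with id as a feature pools information across ids instead.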