Autoregressive (AR) models with Python examples

Autoregressive (AR) models are a class of time series models that predict future values from previous observations. They apply regression to lagged values of the series itself and rely on autocorrelation in the data to make predictions. In this post, you will learn the concepts behind AR models and see Python code examples showing how to train one for your own predictive analytics project. If you are getting started with time-series forecasting, this should be a useful read. Time-series forecasting is an important area of data science and machine learning. Here are the topics covered in the post:

  • Autoregressive (AR) models concepts with examples
  • Alternative methods to AR models
  • Python code example for AR models
  • Learning References

Autoregressive (AR) Models concepts with Examples

Autoregressive (AR) modeling is one of the techniques used for time-series analysis. AR models allow us to forecast future values from historical data. They can be used to model any series that has some degree of autocorrelation, meaning that observations at adjacent time steps are correlated. A common example is stock market prices, where the price today (t) is highly correlated with the price one day ago (t-1).
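To make the idea of autocorrelation concrete, here is a minimal sketch that measures the correlation between a series and a lagged copy of itself. The helper name `autocorr` and the two simulated series are assumptions made for illustration; they are not part of the data set used later in this post.

```python
import numpy as np

def autocorr(x, lag=1):
    """Sample correlation between x[t] and x[t-lag] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[lag:], x[:-lag])[0, 1]

rng = np.random.default_rng(0)
noise = rng.normal(size=500)        # independent draws: no link between time steps
random_walk = np.cumsum(noise)      # each value builds on the previous one

print("white noise, lag-1 autocorrelation:", round(autocorr(noise), 3))
print("random walk, lag-1 autocorrelation:", round(autocorr(random_walk), 3))
```

A value close to 0 suggests that the previous observation carries little information about the next one, while a value close to 1 suggests that an AR-style model has something to work with.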

For beginners, time-series analysis covers the class of problems where the value of the dependent (response) variable depends on values of the same variable measured in the past. The value that the response variable takes can be derived from its value on the previous day, previous week, previous month, etc. In machine learning terms, time can thus be treated as an independent variable when training a model. Such a data set is called time-series data.

Autoregressive modeling means training a regression model on past values of the response variable itself. The term combines "auto" (self) and "regressive" (regression): a linear regression of the variable on itself. In the context of time-series forecasting, an autoregressive model makes the response variable Y depend on previous values of Y at a pre-determined, constant time lag. The time lag can be daily (or 2, 3, 4… days), weekly, monthly, etc. For example, to predict the stock price at 12 pm tomorrow from the stock price today, the model regresses each day's value on the previous day's value, just as ordinary linear regression regresses a response on a predictor. The simplest such model looks like the following:

\(Y_t = \beta_0 + \beta_1*Y_{t-1} + error_t\)

In the above model, only the value at the previous time lag is used. If the time lag is weekly, \(Y_{t-1}\) represents the value of Y in the previous week. AR models that use the value of the response variable at just one time lag are called AR models of the first order, or AR (1) models. Let’s understand this with the simple example of refrigerator sales. The task is to forecast sales on a particular day in the future, and the data we have is daily sales data for the last 3 years. A first-order AR model with a time lag of 1 week will predict the sales from the sales one week earlier.
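As a rough illustration of the AR (1) equation, the sketch below simulates a series from assumed coefficients and then recovers them by regressing \(Y_t\) on \(Y_{t-1}\) with ordinary least squares. The values of `beta_0` and `beta_1` and the series length are arbitrary choices for the simulation, not estimates from any real data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed "true" parameters for the simulation (illustrative values only)
beta_0, beta_1, n = 2.0, 0.7, 2000

# Simulate Y_t = beta_0 + beta_1 * Y_{t-1} + error_t
y = np.empty(n)
y[0] = beta_0 / (1 - beta_1)        # start at the long-run mean of the process
for t in range(1, n):
    y[t] = beta_0 + beta_1 * y[t - 1] + rng.normal()

# Regress Y_t on Y_{t-1}: columns are the intercept and the lag-1 value
X = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]
(b0_hat, b1_hat), *_ = np.linalg.lstsq(X, target, rcond=None)

print(f"estimated beta_0 = {b0_hat:.2f}, estimated beta_1 = {b1_hat:.2f}")
```

The estimates should land near the assumed values, which shows that an AR (1) model is just a linear regression whose only predictor is the series shifted by one lag.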

Let’s understand the AR model concept with another example and the following diagram.

Fig 1. Time Series Forecasting

The above diagram shows residential power demand by month from 2003 to 2010. The task is to use this data to forecast the power demand in the coming months. With a time lag of 1 month, the AR (1) model, or first-order AR model, will look like the following:

\(PowerDemand_t = \beta_0 + \beta_1*PowerDemand_{t-1} + error_t\)

AR models have a parameter called p, the order of the model. It is the number of previous time lags whose values are used as predictors when training the model. In an AR model of the 2nd order, the value of the response variable at any particular time depends on its values at the last two lags. Thus, the AR (2) model will look like the following:

\(Y_t = \beta_0 + \beta_1*Y_{t-1} + \beta_2*Y_{t-2} + error_t\)

Generalizing the above for p, the AR (p) model will look like the following:

\(Y_t = \beta_0 + \beta_1*Y_{t-1} + \beta_2*Y_{t-2} + \dots + \beta_p*Y_{t-p} + error_t\)
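The general AR (p) equation can be sketched in code by stacking the p lagged copies of the series into a design matrix and solving a least-squares problem. The helper names `fit_ar` and `forecast_next` are hypothetical, and the simulated AR (2) series uses assumed coefficients. This is only one way to estimate the coefficients and may not match what a library such as statsmodels does internally.

```python
import numpy as np

def fit_ar(y, p):
    """Estimate [beta_0, beta_1, ..., beta_p] of an AR(p) model by least squares."""
    y = np.asarray(y, dtype=float)
    # Column k holds Y_{t-k} for every t from p to the end of the series
    lags = [y[p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lags)
    coefs, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coefs

def forecast_next(y, coefs):
    """One-step-ahead forecast from the last p observations."""
    p = len(coefs) - 1
    recent = np.asarray(y, dtype=float)[-1 : -p - 1 : -1]   # Y_{t-1}, ..., Y_{t-p}
    return coefs[0] + coefs[1:] @ recent

# Simulated AR(2) series with assumed coefficients (illustration only)
rng = np.random.default_rng(1)
true_coefs = [1.0, 0.5, -0.3]
y = np.zeros(3000)
for t in range(2, len(y)):
    y[t] = true_coefs[0] + true_coefs[1] * y[t - 1] + true_coefs[2] * y[t - 2] + rng.normal()

coefs = fit_ar(y, p=2)
print("estimated coefficients:", np.round(coefs, 2))
print("one-step-ahead forecast:", round(float(forecast_next(y, coefs)), 2))
```

Increasing p simply adds more lagged columns to the design matrix, so the same function covers AR (1), AR (2), and higher orders.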

Alternative Methods to AR Models

Here are some alternative time-series forecasting methods to the AR modeling technique:

  • MA (Moving average)
  • ARMA (Autoregressive moving average)
  • ARIMA (Autoregressive integrated moving average)
  • SARIMA (Seasonal autoregressive integrated moving average)
  • VAR (Vector autoregression)
  • VARMA (Vector autoregression moving average)
  • SES (Simple exponential smoothing)

We will discuss the above time-series modeling techniques in upcoming blog posts.

Python Code Example for AR Model

We will use the AutoReg class from the statsmodels.tsa.ar_model module, which is used to train a univariate autoregressive (AR) model of order p. Note that statsmodels.tsa contains model classes and functions that are useful for time series analysis. Basic models include univariate autoregressive models (AR), vector autoregressive models (VAR), and univariate autoregressive moving average models (ARMA). The following are the key steps for training the AR model:

  • Plot the time-series
  • Check the stationarity
  • Determine the parameter p or order of the AR model
  • Train the model

Here is the Python code example for the AR model trained using statsmodels.tsa.ar_model.AutoReg class.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#
# Load AutoReg class from statsmodels.tsa.ar_model module
#
from statsmodels.tsa.ar_model import AutoReg
#
# Load and plot the time-series data
#
url='https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'
df = pd.read_csv(url,sep=",")
df['Consumption'].plot()

Here is what the time-series plot looks like.

Fig 2. Time-series plot representing electricity consumption

Before training the AR model, the following need to be determined:

  • Stationarity of the time-series data: Stationarity can be checked using the adfuller function of the statsmodels.tsa.stattools module, which runs the Augmented Dickey-Fuller test. The null hypothesis of this test is that the series has a unit root, that is, it is non-stationary. If the p-value is less than 0.05, the null hypothesis is rejected and the series can be treated as stationary.
  • Order of the AR model to be trained: The order of the AR model is determined by inspecting the partial autocorrelation plot. The plot_pacf function of statsmodels.graphics.tsaplots is used to draw the plot.

The following code is used to check the stationarity and order of the AR model.

#
# Check for stationarity of the time-series data
# We look at the p-value. If the p-value is less than 0.05, the time series
# can be said to be stationary
#
from statsmodels.tsa.stattools import adfuller
#
# Run the Augmented Dickey-Fuller test
#
df_stationarityTest = adfuller(df['Consumption'], autolag='AIC')
#
# Check the p-value (second element of the returned tuple)
#
print("P-value: ", df_stationarityTest[1])
#
# The next step is to find the order of the AR model to be trained.
# For this, we draw the partial autocorrelation plot to assess
# the direct effect of past data on future data
#
from statsmodels.graphics.tsaplots import plot_pacf
pacf = plot_pacf(df['Consumption'], lags=25)

The following plot can be used to determine the order of the AR model. Note that the partial autocorrelation values are high for lags up to 8. Thus, we will train an AR model of order 8.

Fig 3. Time-series AR model – Partial Autocorrelation Plot
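To show what a partial autocorrelation plot measures, here is a minimal sketch that computes the values directly: the partial autocorrelation at lag k is taken as the coefficient on \(Y_{t-k}\) when an AR (k) model is fitted by least squares. The helper name `pacf_ols`, the simulated series, and the approximate 95% band of 1.96 divided by the square root of the sample size are assumptions for illustration. This is not necessarily the estimator that plot_pacf uses by default.

```python
import numpy as np

def pacf_ols(y, max_lag):
    """Partial autocorrelation at lags 1..max_lag via successive AR(k) regressions."""
    y = np.asarray(y, dtype=float)
    values = []
    for k in range(1, max_lag + 1):
        lags = [y[k - j : len(y) - j] for j in range(1, k + 1)]
        X = np.column_stack([np.ones(len(y) - k)] + lags)
        coefs, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        values.append(coefs[-1])            # coefficient on Y_{t-k}
    return np.array(values)

# Simulated AR(2) series with assumed coefficients (illustration only)
rng = np.random.default_rng(7)
y = np.zeros(3000)
for t in range(2, len(y)):
    y[t] = 0.6 * y[t - 1] + 0.25 * y[t - 2] + rng.normal()

pacf_values = pacf_ols(y, max_lag=6)
threshold = 1.96 / np.sqrt(len(y))          # approximate 95% band around zero
significant = [k + 1 for k, v in enumerate(pacf_values) if abs(v) > threshold]

print("partial autocorrelations:", np.round(pacf_values, 3))
print("lags outside the band:", significant)
```

For a series that really follows an AR (2) process, the values at lags 1 and 2 should stand out while later lags stay close to zero, which is the same visual cue used to pick the order from the plot.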

The next step is to train the model. Here is the code which can be used to train the model.

#
# Create training and test data
#
train_data = df['Consumption'][:len(df)-100]
test_data = df['Consumption'][len(df)-100:]
#
# Instantiate and fit the AR model with training data
#
ar_model = AutoReg(train_data, lags=8).fit()
#
# Print Summary
#
print(ar_model.summary())

Here is what the summary looks like:

Fig 4: Time-series AutoReg model summary

Once the model is trained, the final step is to make predictions and evaluate them against the test data. Here is the code that can be used to do so.

#
# Make the predictions
#
pred = ar_model.predict(start=len(train_data), end=(len(df)-1), dynamic=False)
#
# Plot the prediction vs test data
#
from matplotlib import pyplot
pyplot.plot(pred)
pyplot.plot(test_data, color='red')

This is what the plot looks like:

Fig 5. Time-series AutoReg Model Prediction Plot
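Beyond comparing the curves visually, the predictions can be scored numerically. The sketch below defines two common error metrics, root mean squared error (RMSE) and mean absolute error (MAE). The helper names `rmse` and `mae` are hypothetical, and the short arrays are made-up numbers standing in for the test data and predictions, not results from the model above.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Made-up values for illustration only: every prediction is off by exactly 1
actual = [10.0, 12.0, 11.0, 13.0]
predicted = [11.0, 11.0, 12.0, 12.0]

print("RMSE:", rmse(actual, predicted))   # 1.0
print("MAE:", mae(actual, predicted))     # 1.0
```

With the objects from this post, the equivalent calls would be rmse(test_data, pred) and mae(test_data, pred), assuming both hold the same 100 time steps in the same order.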

Learning References

Here are some good learning references for auto-regressive models:

What are autoregressive models?

Autoregressive models are powerful tools in the data scientist’s toolbox for understanding how past values of a variable can predict its future values. The examples provided here should give you a starting point for applying autoregressive modeling in your own work or research projects. If this is all new to you, reach out for help! Our team of experts is available to answer any questions about these concepts and provide training on implementing them.

Ajitesh Kumar