# Autoregressive (AR) models with Python examples

Autoregressive (AR) models are a subset of time series models that predict future values from previous observations. AR models use regression techniques and rely on autocorrelation in the data to make their predictions. This blog post explains the concepts behind AR models and provides Python code examples that demonstrate how you can implement an AR model for your own predictive analytics project. If you are starting out with time-series forecasting, which is one of the important areas of data science and machine learning, this should be a useful read.

For beginners, time-series forecasting is the process of using a model to predict future values based on previously observed values. Time-series data is a sequence of data points, typically ordered in time. Forecasting models usually make predictions at regular intervals, such as hourly, daily, or weekly. Machine learning can be used to develop time-series forecasting models. This type of model is trained on past data and can be used to make predictions about future events. Time series forecasting is a valuable tool for businesses that can help them to make decisions about future production, staffing, and inventory levels. It can also be used to predict consumer demand and trends.

One of the popular examples of time-series forecasting is cash forecasting. Cash forecasting is a time-series forecasting technique that is used to predict an organization’s future cash flow. This is important because it can help organizations make sure they have enough cash on hand to meet their obligations. Cash forecasting typically involves creating a model that projects future cash inflows and outflows based on past data. This model can then be used to make decisions about how much cash the organization should have on hand at any given time.

## Autoregressive (AR) Models concepts with Examples

Autoregressive (AR) modeling is one of the techniques used for time-series analysis. An autoregressive model is a time-series model that describes how a particular variable’s past values influence its current value. In other words, an AR model attempts to predict the next value in a series by incorporating the most recent past values and using them as input data. Autoregressive models are based on the idea that past events can help us predict future events. For example, if we know that the stock market has been going up for the past few days, we might expect it to continue going up in the future. Or, if we know that there has been a lot of rain lately, we might expect more rain in the future.

Autoregressive modeling means training a regression model on past values of the response variable itself. The term combines "auto" (self) and "regressive": a linear regression of the variable on itself. In the context of time-series forecasting, autoregressive modeling means creating a model in which the response variable Y depends on the previous values of Y at a pre-determined constant time lag. The time lag can be daily (or 2, 3, 4… days), weekly, monthly, etc. For example, when predicting the stock price at 12 pm tomorrow from the stock price today, the "regressive" part is an ordinary linear regression, and the "auto" part means that the predictor is the series' own earlier value rather than some other variable. AR models can be used to model anything that has some degree of autocorrelation, which means that there is a correlation between observations at adjacent time steps. A commonly cited use case for this type of modeling is stock market prices, where the price today (t) is highly correlated with the price one day ago (t-1).

$$Y_t = \beta_0 + \beta_1*Y_{t-1} + error_t$$

In the above model, only the value at the last time lag is used. If the time lag is weekly, $$Y_{t-1}$$ represents the value of Y in the previous week. AR models that use the value of the response variable at just one time lag are called AR models of the first order, or AR (1) models. Let’s understand this with the simple example of refrigerator sales. The ask is to forecast sales on a particular day in the future, and the data we have is the historical daily sales data for the last 3 years. An AR model of 1st order with a time lag of 1 week will use the refrigerator sales of the previous week as its predictor.
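To make the AR (1) equation concrete, here is a minimal sketch that simulates a series from it and then recovers the two coefficients with ordinary least squares. The coefficient values (2.0 and 0.6), the series length, and the random seed are illustrative assumptions, not values taken from any dataset in this post.

```python
import numpy as np

# Simulate an AR(1) series: Y_t = beta_0 + beta_1 * Y_{t-1} + error_t
# beta_0 = 2.0 and beta_1 = 0.6 are illustrative values chosen for this sketch.
rng = np.random.default_rng(0)
beta_0, beta_1, n = 2.0, 0.6, 5000
y = np.empty(n)
y[0] = beta_0 / (1 - beta_1)  # start at the long-run mean of the process
for t in range(1, n):
    y[t] = beta_0 + beta_1 * y[t - 1] + rng.normal()

# Regress Y_t on Y_{t-1}: the design matrix has an intercept column and one lag.
X = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]
(est_beta_0, est_beta_1), *_ = np.linalg.lstsq(X, target, rcond=None)

print(f"estimated beta_0 = {est_beta_0:.2f}, estimated beta_1 = {est_beta_1:.2f}")
```

With a few thousand observations the estimates should land close to the values used in the simulation, which is the sense in which an AR (1) model is simply a linear regression of the series on its own previous value.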

Let’s understand the AR model concept with another example and the following diagram.

The above diagram represents the residential power demand across different months from 2003 to 2010. The ask will be to use the data to forecast the power demand in the coming months. Taking time lag of 1 month, AR (1) model or AR model of 1st order will look like the following:

$$PowerDemand_t = \beta_0 + \beta_1*PowerDemand_{t-1} + error_t$$

AR models have a parameter called p, the order of the model. It is the number of lagged values of the response variable that are used as predictors when training the model. In an AR model of 2nd order, the value of the response variable at any particular time depends on the values at the last two lags. Thus, the AR (2) model will look like the following:

$$Y_t = \beta_0 + \beta_1*Y_{t-1} + \beta_2*Y_{t-2} + error_t$$

Generalizing the above for p, the AR (p) model will look like the following:

$$Y_t = \beta_0 + \beta_1*Y_{t-1} + \beta_2*Y_{t-2} + \dots + \beta_p*Y_{t-p} + error_t$$
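The AR (p) equation can be read as a regression whose predictors are p lagged copies of the series. The sketch below builds that design matrix by hand for a tiny made-up series. The helper name `build_lag_matrix` is hypothetical and introduced only for illustration.

```python
import numpy as np

def build_lag_matrix(y, p):
    """Return (X, target); row i of X is [1, Y_{t-1}, ..., Y_{t-p}] for t = p + i."""
    n = len(y)
    columns = [np.ones(n - p)]  # intercept column (beta_0)
    for lag in range(1, p + 1):
        # Values of the series shifted back by `lag` steps, aligned with y[p:]
        columns.append(y[p - lag : n - lag])
    return np.column_stack(columns), y[p:]

# A tiny made-up series, used only to show the alignment of lags and targets.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X, target = build_lag_matrix(y, p=2)
print(X)       # first row is [1, Y_1, Y_0] = [1, 2, 1]
print(target)  # first target is Y_2 = 3
```

Note that the first p observations have no complete set of lags, so a series of length n yields only n - p training rows.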

## Alternative Methods to AR Models

Here are some of the alternative time-series forecasting methods to the AR modeling technique:

• MA (Moving average)
• ARMA (Autoregressive moving average)
• ARIMA (Autoregressive integrated moving average)
• SARIMA (Seasonal autoregressive integrated moving average)
• VAR (Vector autoregression)
• VARMA (Vector autoregression moving average)
• SES (Simple exponential smoothing)

We will discuss the above time-series modeling techniques in upcoming blog posts.

## Python Code Example for AR Model

We will use the statsmodels.tsa package and its ar_model.AutoReg class, which is used to train a univariate autoregressive (AR) model of order p. Note that statsmodels.tsa contains model classes and functions that are useful for time series analysis. Basic models include univariate autoregressive models (AR), vector autoregressive models (VAR), and univariate autoregressive moving average models (ARMA). The following are some of the key steps that need to be done for training the AR model:

• Plot the time-series
• Check the stationarity
• Determine the parameter p or order of the AR model
• Train the model

Here is the Python code example for the AR model trained using statsmodels.tsa.ar_model.AutoReg class.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#
# Load AutoReg class from statsmodels.tsa.ar_model module
#
from statsmodels.tsa.ar_model import AutoReg
#
# Load and plot the time-series data
#
url='https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'
df = pd.read_csv(url, sep=",")
df['Consumption'].plot()


Here is what the time-series plot looks like.

Before going ahead and training the AR model, the following need to be determined:

• Stationarity of the time-series data: The stationarity of the data can be checked using the adfuller function (Augmented Dickey-Fuller test) of the statsmodels.tsa.stattools module. The null hypothesis of the test is that the series has a unit root, that is, it is non-stationary. If the p-value is less than 0.05, the null hypothesis is rejected and the series can be treated as stationary.
• Order of AR model to be trained: The order of the AR model is determined by checking the partial autocorrelation plot. The plot_pacf function of statsmodels.graphics.tsaplots is used to draw the plot.

The following code is used to check the stationarity and order of the AR model.

#
# Check for stationarity of the time-series data
# We will look for p-value. In case p-value is less than 0.05, the time series
# data can be said to have stationarity
#
from statsmodels.tsa.stattools import adfuller
#
# Run the test
#
df_stationarityTest = adfuller(df['Consumption'], autolag='AIC')
#
# Check the value of p-value
#
print("P-value: ", df_stationarityTest[1])
#
# Next step is to find the order of AR model to be trained
# for this, we will plot partial autocorrelation plot to assess
# the direct effect of past data on future data
#
from statsmodels.graphics.tsaplots import plot_pacf
pacf = plot_pacf(df['Consumption'], lags=25)


The following plot can be used to determine the order of the AR model. You may note that the partial autocorrelation values up to lag 8 are high enough. Thus, we will train an AR model of order 8.

The next step is to train the model. Here is the code which can be used to train the model.

#
# Create training and test data
#
train_data = df['Consumption'][:len(df)-100]
test_data = df['Consumption'][len(df)-100:]
#
# Instantiate and fit the AR model with training data
#
ar_model = AutoReg(train_data, lags=8).fit()
#
# Print Summary
#
print(ar_model.summary())


Here is what the summary looks like:

Once the model is trained, the final step is to make the predictions and evaluate the predictions against the test data. This is the code that can be used to do the same.

#
# Make the predictions
#
pred = ar_model.predict(start=len(train_data), end=(len(df)-1), dynamic=False)
#
# Plot the prediction vs test data
#
from matplotlib import pyplot
pyplot.plot(pred)
pyplot.plot(test_data, color='red')


This is what the plot looks like:

## Conclusion

Autoregressive models are powerful tools in the data scientist’s toolbox for understanding how a variable’s past values may predict its future values. The examples provided here should give you a starting point for implementing autoregressive modeling in your own work or research projects. If this is all new to you, reach out for help!