In this post, you will learn the concepts of **autoregressive (AR) models** with the help of **Python** code examples. If you are starting out with **time-series forecasting**, this should be a useful read. Time-series forecasting is one of the important areas of data science and machine learning. Here are the topics covered in this post:

- Autoregressive (AR) models concepts with examples
- Alternative methods to AR models
- Python code example for AR models

## Autoregressive (AR) Models concepts with Examples

**Autoregressive (AR) modeling** is one of the techniques used for **time-series** analysis. For beginners, time-series analysis covers the class of problems where the value of the **dependent variable** (response variable) **depends on values of the same variable** measured in the past. The value the response variable will take can be estimated from its value on an earlier day, week, month, and so on. In machine learning terms, the observations are indexed by **time**, and the past values of the response variable act as the **independent variables** when training a model. Such a data set is termed **time-series data**.

Autoregressive modeling means training a regression model on past values of the response variable itself. The term is made of the words **auto** and **regressive**, which together denote a linear regression of a variable on itself (**auto**). In the context of time-series forecasting, autoregressive modeling means building a model where the response variable Y depends on the **previous values of Y** at a pre-determined **constant time lag**. The time lag can be daily (or 2, 3, 4… days), weekly, monthly etc. In its simplest form, the model uses a single lagged value:

\(Y_t = \beta_0 + \beta_1*Y_{t-1} + error_t\)

In the above model, only the value at the last time lag is used. If the time lag is weekly, \(Y_{t-1}\) represents the value of Y in the previous week. AR models that use the value of the response variable at just one time lag are called **AR models of first order** or **AR (1) models**. Let’s understand this with a simple example of refrigerator sales. The task is to forecast sales on a particular day in the future, and the data we have is the daily sales history for the last 3 years. An AR model of 1st order with a time lag of 1 week will use the refrigerator sales from one week earlier.
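As a minimal sketch of what "the value at the last time lag" means in practice, the snippet below builds the lagged column for a weekly lag on daily data. The sales numbers and the column names `y_t` and `y_t_minus_1` are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for 4 weeks (made-up numbers, for illustration only)
dates = pd.date_range("2021-01-01", periods=28, freq="D")
sales = pd.Series(np.arange(100, 128), index=dates, name="sales")

# With a time lag of one week, Y_{t-1} is the value observed 7 days earlier
frame = pd.DataFrame({"y_t": sales, "y_t_minus_1": sales.shift(7)})

# The first 7 rows have no lagged value, so they are dropped before fitting
frame = frame.dropna()
print(frame.head())
```

Each remaining row pairs a day's sales with the sales from the same weekday one week earlier, which is the input/target pair an AR (1) model with a weekly lag would be trained on.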

Let’s understand the AR model concept with another example and the following diagram.

The above diagram represents the residential power demand across different months from 2003 to 2010. The task is to use this data to forecast the power demand in the coming months. Taking a time lag of 1 month, the AR (1) model, or AR model of 1st order, looks like the following:

\(PowerDemand_t = \beta_0 + \beta_1*PowerDemand_{t-1} + error_t\)

AR models have a parameter termed **p**, the order of the model. The **parameter p** is the number of previous time lags whose values are used when training the model. In an AR model of 2nd order, the value of the response variable at any particular time depends on the values at the last two lags. Thus, the **AR (2) model** looks like the following:

\(Y_t = \beta_0 + \beta_1*Y_{t-1} + \beta_2*Y_{t-2} + error_t\)

Generalizing the above for p, the **AR (p) model** looks like the following:

\(Y_t = \beta_0 + \beta_1*Y_{t-1} + \beta_2*Y_{t-2} + \ldots + \beta_p*Y_{t-p} + error_t\)
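Since an AR (p) model is a linear regression of the series on its own lagged values, its coefficients can be estimated with ordinary least squares. The following is a rough sketch under that assumption, using NumPy only; the helper name `fit_ar_least_squares` and the simulated AR (2) series are illustrative and not part of statsmodels.

```python
import numpy as np

def fit_ar_least_squares(y, p):
    """Estimate beta_0..beta_p of an AR(p) model by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    # Column k holds the series lagged by k steps, aligned with the targets y[p:]
    lagged = [y[p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lagged)
    coeffs, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coeffs

# Simulate an AR(2) series with known (made-up) coefficients
rng = np.random.default_rng(0)
n = 5000
true_beta = np.array([1.0, 0.6, -0.3])  # beta_0, beta_1, beta_2
y = np.zeros(n)
for t in range(2, n):
    y[t] = true_beta[0] + true_beta[1] * y[t - 1] + true_beta[2] * y[t - 2] + rng.normal()

beta_hat = fit_ar_least_squares(y, p=2)
print("Estimated coefficients:", beta_hat)
```

With a long enough series, the estimates should land close to the coefficients used in the simulation.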

## Alternative Methods to AR Models

Here are some alternative time-series forecasting methods to the AR modeling technique:

- MA (Moving average)
- ARMA (Autoregressive moving average)
- ARIMA (Autoregressive integrated moving average)
- SARIMA (Seasonal autoregressive integrated moving average)
- VAR (Vector autoregression)
- VARMA (Vector autoregression moving average)
- SES (Simple exponential smoothing)

We will discuss the above time-series modeling techniques in upcoming blog posts.

## Python Code Example for AR Model

We will use the statsmodels.tsa package and load its ar_model.AutoReg class, which is used to train a univariate autoregressive (AR) model of order p. Note that `statsmodels.tsa` contains model classes and functions that are useful for time series analysis. Basic models include univariate autoregressive models (AR), vector autoregressive models (VAR) and univariate autoregressive moving average models (ARMA). The following are the key steps for training the AR model:

- Plot the time-series
- Check the stationarity
- Determine the parameter p or order of the AR model
- Train the model

Here is the **Python code example** for an AR model trained using the statsmodels.tsa.ar_model.AutoReg class.

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#
# Load AutoReg class from statsmodels.tsa.ar_model module
#
from statsmodels.tsa.ar_model import AutoReg
# Load and plot the time-series data
#
url='https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'
df = pd.read_csv(url,sep=",")
df['Consumption'].plot()
```

Here is what the time-series plot looks like.

Before training the AR model, the following need to be determined:

- **Stationarity of the time-series data**: Stationarity can be checked using the adfuller function of the statsmodels.tsa.stattools module, which runs the augmented Dickey-Fuller test. The p-value of the test is used to decide whether the series is stationary. If the p-value is less than 0.05, the null hypothesis of a unit root is rejected and the series can be treated as stationary.
- **Order of AR model to be trained**: The order of the AR model is determined by checking the partial autocorrelation plot. The plot_pacf function of statsmodels.graphics.tsaplots is used to draw the plot.

The following code is used to check the stationarity and order of the AR model.

```
#
# Check for stationarity of the time-series data
# We will look at the p-value. If the p-value is less than 0.05, the time series
# data can be said to be stationary
#
from statsmodels.tsa.stattools import adfuller
#
# Run the test
#
df_stationarityTest = adfuller(df['Consumption'], autolag='AIC')
#
# Check the value of p-value
#
print("P-value: ", df_stationarityTest[1])
#
# Next step is to find the order of AR model to be trained
# for this, we will plot partial autocorrelation plot to assess
# the direct effect of past data on future data
#
from statsmodels.graphics.tsaplots import plot_pacf
pacf = plot_pacf(df['Consumption'], lags=25)
```

The following plot can be used to determine the order of the AR model. Note that the partial autocorrelation values up to lag 8 are high enough. Thus, we will train an AR model of order 8.
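Reading the order off a plot can be backed up with numbers. The sketch below approximates the partial autocorrelation at lag k as the last coefficient of a least-squares AR (k) fit and flags the lags that fall outside a rough 95% band of ±1.96/√n. The helper `pacf_by_regression` and the simulated AR (2) series are illustrative assumptions; `plot_pacf` may use a different estimation method, so its values can differ slightly.

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """Partial autocorrelation at lag k, taken as the last coefficient of an OLS AR(k) fit."""
    y = np.asarray(y, dtype=float)
    values = []
    for k in range(1, max_lag + 1):
        # Design matrix: intercept plus the series lagged by 1..k steps
        lagged = [y[k - j : len(y) - j] for j in range(1, k + 1)]
        X = np.column_stack([np.ones(len(y) - k)] + lagged)
        coeffs, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        values.append(coeffs[-1])
    return np.array(values)

# Simulated AR(2) series with made-up coefficients 0.6 and -0.3
rng = np.random.default_rng(1)
n = 3000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

pacf_values = pacf_by_regression(y, max_lag=10)
threshold = 1.96 / np.sqrt(n)  # approximate 95% significance band
significant_lags = [k + 1 for k, v in enumerate(pacf_values) if abs(v) > threshold]
print("Lags outside the band:", significant_lags)
```

For a series that truly follows an AR (2) process, lags 1 and 2 should stand out clearly, while higher lags stay near zero apart from the occasional chance crossing of the band.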

The next step is to **train the model**. Here is the code which can be used to train the model.

```
#
# Create training and test data
#
train_data = df['Consumption'][:len(df)-100]
test_data = df['Consumption'][len(df)-100:]
#
# Instantiate and fit the AR model with training data
#
ar_model = AutoReg(train_data, lags=8).fit()
#
# Print Summary
#
print(ar_model.summary())
```

Here is what the summary looks like:

Once the model is trained, the final step is to make the predictions and evaluate the predictions against the test data. This is the code which can be used to do the same.

```
#
# Make the predictions
#
pred = ar_model.predict(start=len(train_data), end=(len(df)-1), dynamic=False)
#
# Plot the prediction vs test data
#
from matplotlib import pyplot
pyplot.plot(pred)
pyplot.plot(test_data, color='red')
```

This is what the plot looks like:
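A plot gives a visual check, but a numeric error measure makes the evaluation against the test data easier to compare across models. Here is a minimal sketch of two common measures, root mean squared error (RMSE) and mean absolute error (MAE). The helper names `rmse` and `mae` and the sample numbers are assumptions for illustration only.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Made-up values standing in for the test data and the model predictions
actual_values = [10.0, 12.0, 14.0]
predicted_values = [11.0, 12.0, 12.0]

print("RMSE:", rmse(actual_values, predicted_values))
print("MAE:", mae(actual_values, predicted_values))
```

With the variables from the code above, these could be called as `rmse(test_data.values, pred.values)` and `mae(test_data.values, pred.values)`.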
