Regression analysis is a fundamental statistical technique used in many fields, from finance to social sciences. It involves modeling the relationship between a dependent variable and one or more independent variables. The **Ordinary Least Squares (OLS)** method is one of the most commonly used techniques for regression analysis.

**Ordinary least squares (OLS)** is a linear regression technique used to find the best-fitting line for a set of data points by minimizing the residuals (the differences between the observed and predicted values). It does so by estimating the coefficients of a linear regression model by minimizing the sum of the squared differences between the observed values of the dependent variable and the predicted values from the model. It is a popular method because it is easy to use and produces decent results. In this blog post, we will discuss the basics of OLS and provide some examples to help you understand how it works.

In this blog post, we will discuss the concepts and applications of the OLS method. We will explore how it works, its assumptions, and how to interpret its coefficients. We will also provide examples of how OLS can be used in different scenarios, from simple linear regression to more complex models. As data scientists, it is very important to learn the concepts of OLS before using it in the regression model.

## What’s Ordinary Least Squares (OLS) Method?

The ordinary least squares (OLS) method can be defined as a linear regression technique that is used to estimate the unknown parameters in a model. The method relies on minimizing the sum of squared residuals between the actual (observed values of the dependent variable) and predicted values from the model. The residual can be defined as the difference between the actual value and the predicted value. Another word for residual can be error. The sum of the squared differences is also known as the residual sum of squares (RSS). The OLS method minimizes the RSS by finding the values of the coefficients that result in the smallest possible RSS. The resulting line is called the regression line, which represents the best fit for the data.

In mathematical terms, this can be written as:

**Minimize **∑(**yi** – **ŷ**i)**^2**

where **y**i is the actual value, **ŷ**i is the predicted value. A linear regression model used for determining the value of the response variable, **ŷ,** can be represented as the following equation.

**y = b0 + b1x1 + b2x2 + … + bnxn + e**

where:

- y is the dependent variable
- b0 is the intercept
- b1, b2, …, bn are the coefficients of the independent variables x1, x2, …, xn
- e is the error term

The coefficients b1, b2, …, bn can also be called the **coefficients of determination**. The goal of the OLS method can be used to estimate the unknown parameters (**b1, b2, …, bn**) by minimizing the sum of squared residuals (RSS). The sum of squared residuals is also termed the sum of squared error (SSE).

This method is also known as the **least-squares method** for regression or linear regression.

### Assumptions of OLS

The OLS method relies on several assumptions to be valid. The following is the list of key assumptions:

**Linearity**: There must be linear relationship between the dependent variable and the independent variables.**Independence**: The observations must be independent of each other.**Homoscedasticity**: The variance of the residuals should be constant across all levels of the independent variables.**Normality**: The residuals / errors should be normally distributed.**No multicollinearity**: The independent variables should not be highly correlated with each other.

## Minimizing the sum of squares residuals using the calculus method

The following represents the calculus method for minimizing the sum of squares residuals to find the unknown parameters for the model y = mx + b

Take the partial derivative of the cost function, sum of squared residuals, **∑(yi – ŷi)^2 **with respect to m:

**∂/∂m (SSE) = ∑-2Xi(yi – ŷi)**

Take the partial derivative of the cost function,** ∑ (yi – ŷi)^2** with respect to b:

**∂/∂b (SSE) = ∑-2(yi – ŷi)**

Set the partial derivatives equal to zero and solve for m and b:

**∑-2Xi(yi – ŷi) = 0**

**∑-(yi – ŷi) = 0**

This results in the following two equations:

**∑yi*xi = m∑xi*xi + b*∑xi**

**∑yi = m∑xi + b*n**

where n is the number of data points. These two equations can be solved simultaneously to find the values for m and b. Let’s say that the following three points are available such as (3, 7), (4, 9), (5, 12). And, the ask is to find the best fit line.

We will apply the calculus technique and use the above formulas. We will use the following formula:

**∑-2Xi(yi – ŷi) = 0**

The following calculation will happen:

-2[3(7 – (3m + b)) + 4(9 – (4m + b)) + 5(12 – (5m + b))] = 0

=> 3*7 + 4*9 + 5*12 – (9m + 3b + 16m + 4b + 25m + 5b) = 0

=> 21 + 36 + 60 – (50m + 12b) = 0

=> **116 = 50m + 12b …. eq (1)**

Let’s use another formula to find another equation:

**∑-(yi – ŷi) = 0**

The following calculation will happen:

7 – (3m + b) + 9 – (4m + b) + 12 – (5m + b) = 0

=> **28 = 12m + 3b … eq(2)**

The above two equations can be solved and the values of m and b can be found.

## Evaluating OLS Results

OLS provides us with estimates of the coefficients of a linear regression model, but it’s important to evaluate how well the model fits the data. In this section, we will discuss different methods for evaluating OLS results such as some of the following:

- Residual analysis
- R-squared and adjusted R-squared
- F-statistics

### Residual Analysis

Residual analysis involves examining the residuals (the differences between the observed values of the dependent variable and the predicted values from the model) to assess how well the model fits the data. Ideally, the residuals should be randomly scattered around zero and have constant variance.

If the residuals exhibit a pattern (such as a U-shape or a curve), it suggests that the model may not be capturing all of the relevant information. In this case, we may need to consider adding additional variables or transforming the data.

### R-squared and Adjusted R-squared

R-squared is a measure of how much of the variation in the dependent variable is explained by the independent variables in the model. It ranges from 0 to 1, with higher values indicating a better fit.

Adjusted R-squared is similar to R-squared, but it takes into account the number of independent variables in the model. It is a more conservative estimate of the model’s fit, as it penalizes the addition of variables that do not improve the model’s performance.

### F-Statistic

The F-statistic tests the overall significance of the model by comparing the variation in the dependent variable explained by the model to the variation not explained by the model. A large F-statistic indicates that the model as a whole is significant.

## Alternative Method to OLS

While OLS is a popular method for estimating linear regression models, there are several alternative methods that can be used depending on the specific requirements of the analysis. Let’s discuss some of the popular alternative methods to OLS.

- Ridge regression
- Lasso regression
- Elastic net regression

### Ridge Regression

Ridge regression is a method that adds a penalty term to the OLS cost function to prevent overfitting in scenarios where there are many independent variables or the independent variables are highly correlated. The penalty term, known as the shrinkage parameter, reduces the magnitude of the coefficients and can help prevent the model from being too complex.

### Lasso Regression

Lasso regression is similar to ridge regression, but it adds a penalty term that can result in some of the coefficients being set to zero. This can help simplify the model and reduce the risk of overfitting.

### Elastic Net Regression

Elastic net regression is a combination of ridge and lasso regression that adds both a L1 and L2 penalty term to the OLS cost function. This method can help balance the advantages of both methods and can be particularly useful when there are many independent variables with varying degrees of importance.

## Summary

The ordinary least squares (OLS) method is a linear regression technique that is used to estimate the unknown parameters in a model. The method relies on minimizing the sum of squared residuals between the actual and predicted values. The OLS method can be used to find the best-fit line for data by minimizing the sum of squared errors or residuals between the actual and predicted values. And, the calculus method for minimizing the sum of squares residuals is take the partial derivative of the cost function with respect to the coefficients of determination, set the partial derivatives equal to zero and solve for each of the coefficients. The OLS method is also known as least squares method for regression or linear regression.

- Linear Regression Cost Function: Python Example - December 2, 2023
- KNN vs Logistic Regression: Differences, Examples - December 2, 2023
- Linear Regression vs Logistic Regression: Differences - December 1, 2023

Why sum of squared residuals are taken?