Linear regression is a foundational algorithm in machine learning and statistics, used for predicting numerical values from input data. Understanding the cost function in linear regression is crucial for grasping how these models are trained and optimized. In this blog, we will cover different aspects of the cost function used in linear regression, including how it helps in building a high-performing regression model.
In linear regression, the cost function quantifies the error between predicted values and actual data points, measuring how far off a linear model's predictions are from the observed values. The most commonly used cost function in linear regression is the Mean Squared Error (MSE), defined as:
$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Where:

- $n$ is the number of data points,
- $y_i$ is the actual value for the $i$-th data point,
- $\hat{y}_i$ is the value predicted by the model for the $i$-th data point.
Let's look at an example of how to calculate the cost function value in linear regression using Python. In the following code, we have a dataset and a simple linear model with pre-determined values for the coefficient (theta1) and bias (theta0). The cost function is calculated with the line mse = np.mean((Y - Y_pred) ** 2). This computes the average of the squared differences between the actual and predicted values, giving a single numerical value (the MSE) that represents the average error of the model and is a common metric for evaluating the performance of a regression model.
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5])  # Input features
Y = np.array([2, 4, 5, 4, 5])  # Actual output

# Assuming a simple linear model with pre-determined weights
theta0 = 0.5
theta1 = 0.9

# Predicted values
Y_pred = theta0 + theta1 * X

# Calculating MSE
mse = np.mean((Y - Y_pred) ** 2)
print("MSE:", mse)
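Running this code prints MSE: 1.3. The predictions are [1.4, 2.3, 3.2, 4.1, 5.0], the squared errors for the five points are [0.36, 2.89, 3.24, 0.01, 0.0], and their average is 1.3.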
Determining the values of the slope ($\theta_1$) and intercept ($\theta_0$) in linear regression is a key part of the model training process; these two parameters define the line that best fits the data. The process of finding these values is known as fitting the linear regression model, and it aims to minimize the cost function, typically the Mean Squared Error (MSE).
Here is the most widely used method for determining the optimal parameter values of a linear regression model by minimizing the cost function:
Gradient Descent is the most common method for finding the optimal values of $\theta_0$ and $\theta_1$ (in the case of a multiple linear regression model, $\theta_0$, $\theta_1$, $\theta_2$, …, $\theta_n$). It is an iterative optimization algorithm used to minimize the cost function. During training, Gradient Descent uses the cost function to determine how far off the model's predictions are from the actual values and in which direction the parameters should be adjusted to reduce this error. The process continues iteratively, adjusting the model parameters slightly at each step, until the cost function converges to a minimum or a set number of iterations is completed.
Here's a simplified overview of how it works for a simple linear regression model (see the code sketch after this list):

1. Initialize $\theta_0$ and $\theta_1$, e.g., to zero or small random values.
2. Compute the predictions $\hat{y}_i = \theta_0 + \theta_1 x_i$ and the resulting MSE.
3. Compute the partial derivatives of the MSE with respect to each parameter: $\frac{\partial MSE}{\partial \theta_0} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$ and $\frac{\partial MSE}{\partial \theta_1} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i) x_i$.
4. Update each parameter by subtracting the learning rate $\alpha$ times its gradient: $\theta_j := \theta_j - \alpha \frac{\partial MSE}{\partial \theta_j}$.
5. Repeat steps 2-4 until the MSE converges or a maximum number of iterations is reached.
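Here is a minimal sketch of these steps in Python, reusing the dataset from the earlier example. The learning rate (0.01) and iteration count (1000) are illustrative choices, not prescribed values:

import numpy as np

# Same sample data as above
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)

# Illustrative hyperparameters (assumptions, not prescribed values)
alpha = 0.01       # learning rate
n_iterations = 1000
n = len(X)

# Step 1: initialize the parameters
theta0, theta1 = 0.0, 0.0

for _ in range(n_iterations):
    # Step 2: compute predictions
    Y_pred = theta0 + theta1 * X
    error = Y_pred - Y
    # Step 3: partial derivatives of the MSE with respect to theta0 and theta1
    grad_theta0 = (2 / n) * np.sum(error)
    grad_theta1 = (2 / n) * np.sum(error * X)
    # Step 4: move each parameter a small step against its gradient
    theta0 -= alpha * grad_theta0
    theta1 -= alpha * grad_theta1

print("theta0:", theta0, "theta1:", theta1)
print("MSE:", np.mean((Y - (theta0 + theta1 * X)) ** 2))

With these settings the parameters approach the least-squares solution for this data (approximately $\theta_0 \approx 2.2$ and $\theta_1 \approx 0.6$), and the MSE falls to about 0.48, well below the 1.3 obtained with the hand-picked parameters earlier.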
Determining a "good" value for the cost function in linear regression, such as the Mean Squared Error (MSE), is not straightforward because it greatly depends on the context of the data and the specific problem being addressed. However, some guidelines and considerations can help in assessing whether the cost function value is acceptable:

- Scale of the target variable: MSE is expressed in squared units of the target, so its magnitude only makes sense relative to the range of values being predicted; taking the square root (RMSE) puts the error back in the target's own units.
- Comparison with a baseline: a useful model should achieve a lower MSE than a naive baseline, such as always predicting the mean of the target (see the sketch below).
- Training vs. test error: a low MSE on training data but a much higher MSE on unseen data suggests overfitting rather than genuinely good performance.
- Domain requirements: what counts as an acceptable average error is ultimately set by the application; the same RMSE may be fine in one problem and unacceptable in another.
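As an illustration of the baseline comparison, here is a small sketch reusing the sample data from the earlier examples; np.polyfit is used here to obtain a least-squares fit for comparison:

import numpy as np

# Same sample data as in the earlier examples
X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)

# Naive baseline: always predict the mean of Y.
# Its MSE equals the variance of Y.
baseline_mse = np.mean((Y - Y.mean()) ** 2)

# Least-squares fit of a line (degree-1 polynomial) for comparison
theta1, theta0 = np.polyfit(X, Y, 1)
model_mse = np.mean((Y - (theta0 + theta1 * X)) ** 2)

print("Baseline MSE:", baseline_mse)  # 1.2
print("Model MSE:", model_mse)        # ~0.48

Note that the hand-picked parameters from the first example (theta0 = 0.5, theta1 = 0.9) give an MSE of 1.3, which is actually worse than the mean baseline's 1.2, a good illustration of why the parameters need to be optimized rather than guessed.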