In this post, you will learn about the **gradient descent algorithm** with simple examples, explained as far as possible in **layman's terms**. For a data scientist, a good grasp of gradient descent is essential because it is widely used to optimise the **objective function** / **loss function** of machine learning algorithms such as regression and neural networks in order to learn their weights / parameters. The following topics are covered in this post:

- Introduction to Gradient Descent algorithm
- Different types of gradient descent
- List of top 5 YouTube videos on Gradient descent algorithm


## Introduction to Gradient Descent Algorithm

The gradient descent algorithm is an **optimization algorithm** used to minimise a function. The function to be minimised is called the **objective function**. In **machine learning**, the **objective function** is also termed the **cost function** or **loss function**, and gradient descent is used to find the values of the parameters / weights that minimise it. **The loss function, simply speaking, measures how far the predictions are from the actual values; a common choice, used in this post, is the squared difference between the two.** Minimising the objective function means searching a large or infinite parameter space for the most suitable values of **the parameters of the function**. Before getting into an example, let’s understand in detail **what gradient descent is** and **how to use it**.

### What is Gradient Descent?

**The gradient of a function at any point is the direction of steepest increase, or ascent, of the function at that point.** For illustration, look at the following diagram and identify the arrow that represents the gradient. Is it not the direction of arrow A that represents the gradient?

Based on the above, **gradient descent at any point follows the direction of steepest decrease, or descent, of the function at that point**, which is the direction opposite to the gradient.
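The claim that the gradient points towards steepest ascent, and its negative towards steepest descent, can be checked numerically. The following is a minimal sketch, assuming a hypothetical example function \(f(x, y) = x^2 + y^2\) and a hypothetical helper `change_along`; neither comes from the diagram above. It tries 360 directions around a point and compares the best and worst of them with the direction of the gradient.

```python
import math

def f(x, y):
    # Hypothetical example function: a simple bowl with its minimum at (0, 0)
    return x ** 2 + y ** 2

def gradient(x, y):
    # Partial derivatives of f with respect to x and y, worked out by hand
    return (2 * x, 2 * y)

def change_along(x, y, angle, step=1e-3):
    # Change in f after a small step of length `step` in direction `angle` (radians)
    return f(x + step * math.cos(angle), y + step * math.sin(angle)) - f(x, y)

point = (1.0, 2.0)
gx, gy = gradient(*point)
gradient_angle = math.atan2(gy, gx)

# Brute force: try one direction per degree and keep the largest increase / decrease
angles = [math.radians(d) for d in range(360)]
steepest_up = max(angles, key=lambda a: change_along(*point, a))
steepest_down = min(angles, key=lambda a: change_along(*point, a))

print("gradient direction (degrees):", math.degrees(gradient_angle))
print("steepest increase found at  :", math.degrees(steepest_up))
print("steepest decrease found at  :", math.degrees(steepest_down))
```

Because only whole degrees are tried, the brute-force directions should agree with the gradient direction, and with its opposite, to within one degree.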

### How to calculate Gradient Descent?

To **find the gradient** of a function along the x dimension, take the **derivative of the function** with respect to x, then substitute the x-coordinate of the point of interest into the derivative. Once the gradient at a point is known, the descent direction is obtained by multiplying the gradient by -1. Here are the steps for finding the minimum of a function using gradient descent:
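As a small illustration of "take the derivative, then substitute the point", here is a sketch that assumes a hypothetical function \(f(x) = (x - 3)^2\), whose derivative is \(2(x - 3)\). The function, the point `x = 5.0` and the finite-difference check are all assumptions made for this example.

```python
def f(x):
    # Hypothetical example function with its minimum at x = 3
    return (x - 3) ** 2

def df(x):
    # Derivative of f with respect to x, worked out by hand
    return 2 * (x - 3)

x = 5.0
grad = df(x)          # substitute the x-coordinate into the derivative
descent = -1 * grad   # steepest descent is the gradient multiplied by -1

# Sanity check: a central finite difference should agree with the derivative
h = 1e-5
numeric_grad = (f(x + h) - f(x - h)) / (2 * h)

print("gradient at x = 5:", grad)
print("descent direction:", descent)
print("numerical estimate:", numeric_grad)
```

The gradient is positive at this point, so the function increases to the right, and the negative descent value says to move left, towards the minimum at 3.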

- Calculate the gradient by taking the derivative of the function with respect to the parameter. If there are multiple parameters, take the partial derivative with respect to each parameter.
- Calculate the descent value for each parameter by multiplying its derivative by the learning rate (also called the descent rate or step size) and by -1.
- Update each parameter by adding the descent value to its current value. The diagram below shows the update of parameter \(\theta\) with the value of the gradient in the opposite direction, taking small steps.

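- In case of multiple parameters, every parameter is updated using its own partial derivative. For example, if the regression function is \(y = \theta_0 + \theta_1 x\) and the cost function is \(J(\theta_0, \theta_1) = \frac{1}{2N}\sum_{i=1}^{N} (y_i - (\theta_0 + \theta_1 x_i))^2\), then with learning rate \(\alpha\) the updates are \(\theta_0 := \theta_0 - \alpha \frac{\partial J}{\partial \theta_0}\) and \(\theta_1 := \theta_1 - \alpha \frac{\partial J}{\partial \theta_1}\), where \(\frac{\partial J}{\partial \theta_0} = -\frac{1}{N}\sum_{i=1}^{N} (y_i - (\theta_0 + \theta_1 x_i))\) and \(\frac{\partial J}{\partial \theta_1} = -\frac{1}{N}\sum_{i=1}^{N} (y_i - (\theta_0 + \theta_1 x_i))\, x_i\).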

- The parameters are updated repeatedly until the function reaches its minimum or converges. The diagram below illustrates this.
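The steps above can be sketched in a few lines of Python for the regression function \(y = \theta_0 + \theta_1 x\) with the squared-error cost. This is a minimal sketch, not a definitive implementation: the helper names, the toy data, the learning rate of 0.05, the iteration cap and the convergence tolerance are all assumptions chosen for illustration.

```python
def predict(theta0, theta1, x):
    # Regression function y = theta0 + theta1 * x
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    # Squared-error cost: 1 / (2N) * sum of (actual - predicted)^2
    n = len(xs)
    return sum((y - predict(theta0, theta1, x)) ** 2 for x, y in zip(xs, ys)) / (2 * n)

def gradient_descent(xs, ys, learning_rate=0.05, max_iters=10000, tol=1e-12):
    theta0, theta1 = 0.0, 0.0  # assumed starting point
    n = len(xs)
    previous_cost = cost(theta0, theta1, xs, ys)
    for _ in range(max_iters):
        # Step 1: partial derivatives of the cost with respect to each parameter
        errors = [predict(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Step 2: descent value = derivative * learning rate * -1
        descent0 = -learning_rate * grad0
        descent1 = -learning_rate * grad1
        # Step 3: add the descent value to the current parameter value
        theta0 += descent0
        theta1 += descent1
        # Stop once the cost has practically stopped changing (converged)
        current_cost = cost(theta0, theta1, xs, ys)
        if abs(previous_cost - current_cost) < tol:
            break
        previous_cost = current_cost
    return theta0, theta1

# Toy data generated from y = 1 + 2x, so the expected fit is theta0 ~ 1, theta1 ~ 2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print("theta0:", theta0, "theta1:", theta1)
```

A learning rate that is too large can make the updates overshoot and diverge, while one that is too small makes convergence slow, so in practice this value usually needs some tuning.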

## Different Types of Gradient Descent Algorithms

Gradient descent algorithms can be implemented in the following two ways:

- **Batch gradient descent**: When the weight update is calculated from all examples in the training dataset, it is called batch gradient descent.
- **Stochastic gradient descent**: When the weight update is calculated incrementally after each training example or a small group of training examples, it is called stochastic gradient descent.
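The difference between the two variants comes down to how many training examples feed each weight update. The sketch below contrasts them on a toy linear regression problem; the function names `batch_step` and `stochastic_epoch`, the data, the learning rate and the fixed random seed are all hypothetical choices made for this illustration.

```python
import random

def batch_step(theta0, theta1, xs, ys, learning_rate):
    # Batch: one update computed from ALL training examples
    n = len(xs)
    errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / n
    grad1 = sum(e * x for e, x in zip(errors, xs)) / n
    return theta0 - learning_rate * grad0, theta1 - learning_rate * grad1

def stochastic_epoch(theta0, theta1, xs, ys, learning_rate, rng):
    # Stochastic: one update after EACH training example, visited in random order
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    for i in indices:
        error = (theta0 + theta1 * xs[i]) - ys[i]
        theta0 -= learning_rate * error
        theta1 -= learning_rate * error * xs[i]
    return theta0, theta1

# Toy data generated from y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
rng = random.Random(0)  # fixed seed so the run is repeatable

batch = (0.0, 0.0)
sgd = (0.0, 0.0)
for _ in range(2000):
    batch = batch_step(*batch, xs, ys, 0.05)            # 1 update per pass
    sgd = stochastic_epoch(*sgd, xs, ys, 0.05, rng)     # 5 updates per pass

print("batch:", batch)
print("stochastic:", sgd)
```

On this noise-free data both variants should end up close to \(\theta_0 = 1\) and \(\theta_1 = 2\). On noisy real-world data the stochastic updates typically fluctuate around the minimum rather than settling exactly on it.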

The differences between batch and stochastic gradient descent will be covered in detail in a future post.

## Top 5 YouTube Videos on Gradient Descent Algorithm

Here is a list of the top 5 YouTube videos that can be viewed to get a good understanding of the gradient descent algorithm.

1. **Gradient descent, how neural networks learn**

2. **Gradient Descent, Step-by-Step**

3. **Khan Academy – Why the gradient is the direction of steepest ascent?**

4. **Gradient Descent Tutorial by Andrew Ng**

5. **Gradient Descent Algorithm by Prof. S. Sengupta IIT Kharagpur**

## Conclusions

In summary, you learned the concepts of **Gradient Descent** along with the following aspects:

- The gradient descent algorithm is an optimization algorithm used to minimise an objective function.
- In machine learning, the objective function to be minimised is termed the cost function or loss function.
- Gradient descent is used to minimise the loss function or cost function in machine learning algorithms such as linear regression and neural networks.
- Gradient descent moves in the direction opposite to the gradient. The gradient of a function at any point represents the direction of steepest ascent of the function at that point.
- Batch gradient descent updates the weights after all the training examples are processed. Stochastic gradient descent updates the weights after each training example or a small group of training examples.
- The gradient of a function at any point can be calculated as the first-order derivative of the function at that point.
- In case of multiple dimensions, the gradient of a function at any point can be calculated from the partial derivatives of the function at that point with respect to the different dimensions.
