Gradient Descent Explained Simply with Examples


In this post, you will learn about the gradient descent algorithm with the help of simple examples. The explanation is kept in layman's terms as far as possible. For a data scientist, it is of utmost importance to get a good grasp of the gradient descent algorithm, as it is widely used to optimise the objective function / loss function of various machine learning algorithms such as regression and neural networks in order to learn the weights / parameters. The following related topics are covered in this post:

  • Introduction to Gradient Descent algorithm
  • Different types of gradient descent
  • List of top 5 YouTube videos on the gradient descent algorithm

Introduction to Gradient Descent Algorithm

The gradient descent algorithm is an optimization algorithm that is used to minimise a function. The function to be minimised is called the objective function. In machine learning, the objective function is also termed the cost function or loss function. It is the loss function that is optimized (minimised), and gradient descent is used to find the most optimal values of the parameters / weights that minimise the loss function. The loss function, simply speaking, is a measure of the difference between actual values and predictions, for example the mean of the squared differences. In order to minimise the objective function, the most optimal values of the parameters of the function are searched for in a large or infinite parameter space. Before getting into an example of gradient descent, let's understand in detail what gradient descent is and how to use it.

What is Gradient Descent?

The gradient of a function at any point is the direction of steepest increase, or ascent, of the function at that point. For illustration purposes, look at the following diagram and identify the direction that represents the gradient. Is it not the direction of arrow A?

Fig 1. Gradient – Steepest Ascent (Arrow A)

Based on the above, the gradient descent of a function at any point thus represents the direction of steepest decrease, or descent, of the function at that point.
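As a quick worked example, consider \(f(x) = x^2\). Its derivative is \(f'(x) = 2x\), so at \(x = 3\) the gradient is \(6\), meaning the function increases fastest in the positive \(x\) direction. Gradient descent therefore moves in the negative \(x\) direction, towards the minimum at \(x = 0\).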

How to calculate Gradient Descent?

In order to find the gradient of a function with respect to the x dimension, take the derivative of the function with respect to x, then substitute the x-coordinate of the point of interest for x in the derivative. Once the gradient of the function at a point is calculated, the gradient descent direction can be obtained by multiplying the gradient by -1. Here are the steps for finding the minimum of a function using gradient descent:

  • Calculate the gradient by taking the derivative of the function with respect to the specific parameter. In case there are multiple parameters, take the partial derivatives of the function with respect to the different parameters.
  • Calculate the descent value for each parameter by multiplying the value of the derivative by the learning rate (also called the descent rate or step size) and by -1.
  • Update the value of the parameter by adding the descent value to the existing value of the parameter. The diagram below represents the update of the parameter \(\theta\) with the value of the gradient in the opposite direction while taking small steps.
Fig 2. Update the parameter value with the gradient descent value at each point
  • In case of multiple parameters, the values of the different parameters would need to be updated as given below. For example, if the regression function is \(y = \theta_0 + \theta_1 x\), the cost function is \(\frac{1}{2N}\sum_{i=1}^{N} \left(y_i - (\theta_0 + \theta_1 x_i)\right)^2\).
Fig 3. Find the derivative or gradient of the function with respect to the different parameters at point x
  • The parameters will need to be updated until the function minimises or converges; a code sketch of this loop is given after this list. The diagram below represents the same aspect.
Fig 4. Update the parameter value with the gradient descent value
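
To make the above steps concrete, here is a minimal Python sketch of gradient descent for the linear regression cost function \(\frac{1}{2N}\sum_{i=1}^{N} \left(y_i - (\theta_0 + \theta_1 x_i)\right)^2\). The function name, learning rate, iteration count, and synthetic data are illustrative assumptions rather than part of the original post.

```python
import numpy as np

def gradient_descent(x, y, lr=0.02, n_iters=5000):
    """Minimise J = 1/(2N) * sum((y_i - (theta0 + theta1 * x_i))^2)."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        error = (theta0 + theta1 * x) - y   # predictions minus actual values
        grad0 = error.mean()                # dJ / d(theta0)
        grad1 = (error * x).mean()          # dJ / d(theta1)
        # Step opposite to the gradient: descent value = -lr * gradient
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1

# Illustrative usage: recover y = 2 + 3x from noisy synthetic data
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2 + 3 * x + rng.normal(0, 0.5, 100)
print(gradient_descent(x, y))  # should print values close to (2, 3)
```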

Different Types of Gradient Descent Algorithms

Gradient descent algorithms could be implemented in the following two different ways:

  • Batch gradient descent: When the weight update is calculated based on all the examples in the training dataset, it is called batch gradient descent.
  • Stochastic gradient descent: When the weight update is calculated incrementally after each training example, or after a small group of training examples, it is called stochastic gradient descent.

The details of the difference between batch and stochastic gradient descent will be provided in a future post.
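
In the meantime, here is a minimal sketch contrasting the two update schemes for the squared-error loss used earlier, assuming a design matrix X whose rows are training examples; the function names are illustrative, not a standard API.

```python
import numpy as np

def batch_step(theta, X, y, lr):
    """One batch gradient descent update: gradient averaged over ALL examples."""
    grad = X.T @ (X @ theta - y) / len(y)
    return theta - lr * grad

def stochastic_epoch(theta, X, y, lr):
    """One stochastic pass: update the weights after EACH training example."""
    for i in np.random.permutation(len(y)):
        grad = X[i] * (X[i] @ theta - y[i])  # gradient from a single example
        theta = theta - lr * grad
    return theta
```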

Top 5 YouTube Videos on the Gradient Descent Algorithm

Here is a list of the top 5 YouTube videos that can be viewed to get a good understanding of the gradient descent algorithm.

  1. Gradient descent, how neural networks learn
  2. Gradient Descent, Step-by-Step by Josh Starmer
  3. Why the gradient is the direction of steepest ascent, by Khan Academy
  4. Gradient Descent by Andrew Ng
  5. Gradient Descent Algorithm by Prof. S. Sengupta, IIT Kharagpur

Conclusions

To summarise, you learned about the concept of gradient descent along with the following aspects:

  • The gradient descent algorithm is an optimization algorithm that is used to minimise the objective function.
  • In the case of machine learning, the objective function that needs to be minimised is termed the cost function or loss function.
  • Gradient descent is used to minimise the loss function or cost function in machine learning algorithms such as linear regression and neural networks.
  • Gradient descent represents the direction opposite to the gradient. The gradient of a function at any point represents the direction of steepest ascent of the function at that point.
  • Batch gradient descent updates the weights after all the training examples are processed. Stochastic gradient descent updates the weights after each training example or a small group of training examples.
  • The gradient of a function at any point can be calculated as the first-order derivative of that function at that point.
  • In the case of multiple dimensions, the gradient of the function at any point can be calculated as the partial derivatives of the function at that point with respect to the different dimensions.