In this post, you will learn how to find the elbow point using an SSE (inertia) plot, with Python code. You may also want to check out my blog post on K-means clustering explained with a Python example. This post covers what the elbow method is, how to draw the SSE / inertia plot, and how to read the elbow point from it.
The elbow method is one of the most popular methods for selecting the optimal number of clusters in the K-means algorithm. It works by fitting the model for a range of values of K, drawing a line plot of SSE (sum of squared errors) against the number of clusters, and finding the "elbow point": the point after which the SSE or inertia decreases in a roughly linear fashion, so that adding more clusters brings only small further reductions. Here is a sample elbow point. The later sections show how to draw the line plot and find the elbow point.
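To make the SSE / inertia value concrete, here is a minimal sketch that recomputes it by hand for a single K-means fit and compares it with the value scikit-learn reports in `inertia_`. The choice of the Iris data and of K = 3 is only for illustration, and the variable names are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the four Iris feature columns as a NumPy array
X = load_iris().data

# Fit K-means once for an illustrative K = 3
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# For every sample, look up the centroid of the cluster it was assigned to
assigned_centroids = km.cluster_centers_[km.labels_]

# SSE = sum of squared Euclidean distances between samples and their centroids
manual_sse = float(np.sum((X - assigned_centroids) ** 2))

print("Manual SSE:      ", manual_sse)
print("KMeans inertia_: ", km.inertia_)
```

The two printed numbers should agree up to floating-point error, which is why the terms SSE and inertia are used interchangeably in this post.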
To find the elbow point, you first need to draw the SSE or inertia plot. This section presents a custom Python function, drawSSEPlotForKMeans, which creates the SSE (sum of squared errors) or inertia plot, with the SSE value on the Y-axis and the number of clusters on the X-axis. SSE is also called within-cluster SSE. Pay attention to the following parameters of drawSSEPlotForKMeans: df is the pandas DataFrame holding the data, column_indices is the list of positional indices of the columns to cluster on, and n_clusters is the largest number of clusters to try. The remaining parameters are passed through to KMeans.
def drawSSEPlotForKMeans(df, column_indices, n_clusters=8, max_iter=300, tol=1e-04, init='k-means++', n_init=10, algorithm='lloyd'):
    # Note: recent scikit-learn versions no longer accept algorithm='auto'; 'lloyd' is the current default
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    inertia_values = []
    # Fit K-means for K = 1 .. n_clusters and record the SSE (inertia) of each fit
    for i in range(1, n_clusters+1):
        km = KMeans(n_clusters=i, max_iter=max_iter, tol=tol, init=init, n_init=n_init, random_state=1, algorithm=algorithm)
        km.fit(df.iloc[:, column_indices])
        inertia_values.append(km.inertia_)
    # Plot SSE / inertia against the number of clusters
    fig, ax = plt.subplots(figsize=(8, 6))
    plt.plot(range(1, n_clusters+1), inertia_values, color='red')
    plt.xlabel('No. of Clusters', fontsize=15)
    plt.ylabel('SSE / Inertia', fontsize=15)
    plt.title('SSE / Inertia vs No. Of Clusters', fontsize=15)
    plt.grid()
    plt.show()
The following code prepares the data needed to invoke the function above and draw the SSE / inertia plot. The scikit-learn Iris dataset is used for illustration.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X)
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
df['target'] = y
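Here is how the function is invoked to draw the SSE / inertia plot. The list [0, 1, 2, 3] holds the positional indices of the four feature columns (sepal_length, sepal_width, petal_length, petal_width), so the target column is left out of the clustering: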
drawSSEPlotForKMeans(df, [0, 1, 2, 3])
Here is what the inertia / SSE plot looks like:
The elbow point is the point in the SSE / inertia plot after which the SSE or inertia decreases in a roughly linear manner. In Fig. 2, you may note that this happens at no. of clusters = 3.
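The elbow can also be located numerically rather than by eye. The sketch below uses one simple heuristic that is not part of the original function: it draws a straight line (chord) from the first to the last point of the SSE curve and picks the K whose SSE lies farthest below that line. The function name `find_elbow_k` and the SSE values in `example_sse` are hypothetical and made up for illustration; they are not results from the Iris data.

```python
import numpy as np

def find_elbow_k(sse_values, k_start=1):
    """Return the K whose SSE lies farthest below the chord joining
    the first and last points of the SSE curve (a simple heuristic)."""
    sse = np.asarray(sse_values, dtype=float)
    ks = np.arange(k_start, k_start + len(sse))
    # Straight line through the first and last points of the curve
    slope = (sse[-1] - sse[0]) / (ks[-1] - ks[0])
    chord = sse[0] + slope * (ks - ks[0])
    # Vertical gap between the chord and the curve at each K
    gaps = chord - sse
    # The largest gap marks the sharpest bend in the curve
    return int(ks[np.argmax(gaps)])

# Hypothetical SSE values for K = 1..8 with a clear bend at K = 3
example_sse = [1000, 600, 200, 180, 160, 140, 120, 100]
elbow_k = find_elbow_k(example_sse)
print("Elbow at K =", elbow_k)  # prints: Elbow at K = 3
```

A heuristic like this will not always agree with the value you would pick visually, especially when the very first drop in SSE dominates the curve, so treat its output as a starting point and confirm it against the plot.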
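In summary, finding the elbow point with the elbow method involves fitting K-means for a range of values of K, drawing the SSE / inertia plot against the number of clusters, and picking the point after which the SSE decreases in a roughly linear manner.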