In this post, you will learn how to find the elbow point using an SSE (inertia) plot, with Python code. You may also want to check out my blog on K-means clustering explained with Python example. This post covers what the elbow method is, how to draw the SSE / inertia plot, and how to read the elbow point from that plot.
The elbow method is one of the most popular methods for selecting the number of clusters, K, in the K-means algorithm. The model is fitted for a range of values of K, and the SSE (sum of squared errors) is drawn as a line plot against the number of clusters. The “elbow point” is the point after which the SSE or inertia starts decreasing in a roughly linear fashion, that is, where adding another cluster no longer reduces the SSE by much. Here is a sample elbow point. The later sections illustrate how to draw the line plot and find the elbow point.
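To make the SSE concrete, here is a minimal sketch that recomputes it by hand for one value of K and compares it with the inertia_ attribute that scikit-learn reports. The choice of the Iris data, K = 3, and the variable names are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Fit K-means once for a single, illustrative value of K
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# Centroid assigned to each sample, looked up through its cluster label
assigned_centroids = km.cluster_centers_[km.labels_]

# SSE: sum over all samples of the squared distance to the assigned centroid
manual_sse = ((X - assigned_centroids) ** 2).sum()

print(manual_sse, km.inertia_)
```

The two printed numbers should agree up to floating-point error, which is why the terms SSE and inertia are used interchangeably in this post.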
To find the elbow point, you first need to draw the SSE or inertia plot. This section shows a custom Python function, drawSSEPlotForKMeans, which creates the plot with the SSE (sum of squared errors) value on the Y-axis and the number of clusters on the X-axis. SSE is also called within-cluster SSE. Pay attention to the following parameters of drawSSEPlotForKMeans: df is the pandas DataFrame holding the data, column_indices is the list of positions of the columns to cluster on, and n_clusters is the largest value of K to try. The remaining parameters are passed through to KMeans.
# algorithm='lloyd' replaces 'auto', which newer scikit-learn versions no longer accept (needs scikit-learn 1.1+)
def drawSSEPlotForKMeans(df, column_indices, n_clusters=8, max_iter=300, tol=1e-04, init='k-means++', n_init=10, algorithm='lloyd'):
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    # Fit K-means for K = 1..n_clusters and record the SSE (inertia) of each fit
    inertia_values = []
    for i in range(1, n_clusters+1):
        km = KMeans(n_clusters=i, max_iter=max_iter, tol=tol, init=init, n_init=n_init, random_state=1, algorithm=algorithm)
        km.fit(df.iloc[:, column_indices])
        inertia_values.append(km.inertia_)
    # Plot SSE / inertia against the number of clusters
    fig, ax = plt.subplots(figsize=(8, 6))
    plt.plot(range(1, n_clusters+1), inertia_values, color='red')
    plt.xlabel('No. of Clusters', fontsize=15)
    plt.ylabel('SSE / Inertia', fontsize=15)
    plt.title('SSE / Inertia vs No. Of Clusters', fontsize=15)
    plt.grid()
    plt.show()
The following shows how the above function can be invoked to draw the SSE or inertia plot. The Iris dataset bundled with scikit-learn is used for illustration.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X)
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
df['target'] = y
Here is how the function is invoked to draw the SSE / inertia plot:
drawSSEPlotForKMeans(df, [0, 1, 2, 3])
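The second argument, [0, 1, 2, 3], is a list of column positions. As a minimal sketch, assuming the same DataFrame as built in this post (the variable name features is only for illustration), this is what that selection returns:

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
df['target'] = iris.target

# Positions 0..3 pick the four measurement columns; position 4 is 'target'
features = df.iloc[:, [0, 1, 2, 3]]

print(list(features.columns))
print(features.shape)
```

The target column is left out of the selection, so the known class labels do not influence the clustering.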
Here is what the inertia / SSE plot looks like:
The elbow point is the point in the SSE / inertia plot where the SSE or inertia starts decreasing in a linear manner. In fig 2, note that the SSE starts decreasing in a linear manner at no. of clusters = 3.
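Reading the elbow off a plot is somewhat subjective. As a rough aid, and not a replacement for looking at the plot, the sketch below prints how much the SSE drops each time one more cluster is added; once the drops become small and similar in size, the curve has entered its roughly linear part. The range of K and the variable names are assumptions for illustration, and n_init and random_state mirror the values used in the function in this post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
ks = list(range(1, 9))

# SSE / inertia for each candidate K
inertia_values = [
    KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_
    for k in ks
]

# Reduction in SSE gained by going from K-1 to K clusters
drops = -np.diff(inertia_values)

for k, drop in zip(ks[1:], drops):
    print(f"K={k}: SSE drops by {drop:.1f}")
```

Automatic elbow heuristics can disagree with each other and with a visual reading, so treat the printed drops as supporting evidence only.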
In summary, the elbow method selects the number of clusters for K-means by drawing the SSE / inertia plot against the number of clusters and picking the elbow point, the point after which the SSE decreases in a roughly linear manner.
View Comments
What a great article, sir. This is what I was looking for.
Thank you Shubham
Hello
Thanks for the explanation, but I have a question.
What does [0, 1, 2, 3] refer to in the function call?
drawSSEPlotForKMeans(df, [0, 1, 2, 3])