In this post, you will learn how to find the elbow point using an SSE (inertia) plot, with Python code. You may also want to check out my blog post on K-means clustering explained with a Python example. This post covers what the elbow method is, how to draw the SSE / inertia plot, and how to read the elbow point from that plot.
The elbow method is one of the most popular methods for selecting the number of clusters in the K-means algorithm. The model is fitted for a range of values of K, and a line plot of SSE (sum of squared errors, also called inertia) against the number of clusters is drawn. The “elbow point” is the point after which the SSE stops falling sharply and starts decreasing in a roughly linear fashion. Here is a sample elbow point. The later sections show how to draw the line plot and find the elbow point.
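To make the SSE definition concrete, here is a minimal sketch that computes it by hand and compares it with the `inertia_` attribute that scikit-learn reports. The choice of K = 3 and the variable names are only for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Fit K-means once; K=3 is an arbitrary choice for this illustration.
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# SSE: squared Euclidean distance from each sample to the centroid
# of the cluster it was assigned to, summed over all samples.
assigned_centroids = km.cluster_centers_[km.labels_]
sse = float(np.sum((X - assigned_centroids) ** 2))

print(f"manual SSE:  {sse:.4f}")
print(f"km.inertia_: {km.inertia_:.4f}")
```

The two printed values should agree up to floating-point rounding, which is why the terms SSE and inertia are used interchangeably in this post.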
To find the elbow point, you first need to draw the SSE / inertia plot. This section shows a custom Python function, drawSSEPlotForKMeans, which creates that plot with the SSE value on the Y-axis and the number of clusters on the X-axis. SSE is also called within-cluster SSE. Pay attention to the following parameters of drawSSEPlotForKMeans: df is the pandas DataFrame holding the data, column_indices is the list of column positions to use as clustering features, and n_clusters is the largest value of K to try (the function fits K = 1 up to n_clusters). The remaining parameters (max_iter, tol, init, n_init, algorithm) are passed through to KMeans.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Note: algorithm='auto' is no longer accepted by recent scikit-learn releases; 'lloyd' is the current default.
def drawSSEPlotForKMeans(df, column_indices, n_clusters=8, max_iter=300, tol=1e-04, init='k-means++', n_init=10, algorithm='lloyd'):
    inertia_values = []
    # Fit one K-means model for each K from 1 to n_clusters and record its SSE
    for i in range(1, n_clusters+1):
        km = KMeans(n_clusters=i, max_iter=max_iter, tol=tol, init=init, n_init=n_init, random_state=1, algorithm=algorithm)
        km.fit(df.iloc[:, column_indices])
        inertia_values.append(km.inertia_)
    # Plot SSE (Y-axis) against the number of clusters (X-axis)
    plt.figure(figsize=(8, 6))
    plt.plot(range(1, n_clusters+1), inertia_values, color='red')
    plt.xlabel('No. of Clusters', fontsize=15)
    plt.ylabel('SSE / Inertia', fontsize=15)
    plt.title('SSE / Inertia vs No. Of Clusters', fontsize=15)
    plt.grid()
    plt.show()
The following shows how the function above can be invoked to draw the SSE / inertia plot. The scikit-learn Iris dataset is used for illustration.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X)
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
df['target'] = y
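The plotting function selects its features by column position using `df.iloc`. The sketch below rebuilds the same DataFrame and shows which columns a list of positions picks out. The `features` and `petal_only` names are hypothetical and used only for this illustration.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
df['target'] = iris.target

# Positions 0 to 3 are the four measurement columns; position 4 is the label,
# which should be left out of clustering.
features = df.iloc[:, [0, 1, 2, 3]]
print(list(features.columns))

# A shorter list selects a subset, e.g. only the two petal measurements.
petal_only = df.iloc[:, [2, 3]]
print(list(petal_only.columns))
```

Passing `[0, 1, 2, 3]` therefore means "cluster on all four measurements and ignore the target column".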
Here is how the function is called to draw the SSE / inertia plot. The list [0, 1, 2, 3] gives the positions of the four feature columns in df:
drawSSEPlotForKMeans(df, [0, 1, 2, 3])
Here is what the inertia / SSE plot looks like:
The elbow point is the point in the SSE / inertia plot after which the SSE starts decreasing in a roughly linear manner. In Fig 2, this happens at no. of clusters = 3, so the elbow point is K = 3.
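When the bend is hard to judge by eye, a numeric view of the same curve can help. The sketch below is not part of the original function: it recomputes the inertia values for the Iris data and prints how much the SSE falls each time one more cluster is added. The K range of 1 to 8 and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
ks = list(range(1, 9))

# One fitted model per K, keeping only the SSE of each.
inertia_values = [
    KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_
    for k in ks
]

# drops[i] is how much the SSE fell when going from ks[i] to ks[i + 1] clusters.
drops = -np.diff(inertia_values)

for k, drop in zip(ks[1:], drops):
    share = drop / inertia_values[0]
    print(f"K={k}: SSE fell by {drop:.2f} ({share:.1%} of the K=1 SSE)")
```

Large drops followed by a run of small, similar drops correspond to the bend in the plot. The table is an aid for reading the curve and does not replace the visual check; automated rules for picking the elbow can disagree with what the eye sees.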
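In summary, finding the elbow point with the elbow method means fitting K-means for a range of K values, drawing the SSE / inertia plot against the number of clusters, and choosing the K after which the SSE starts decreasing in a roughly linear manner.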