Hierarchical clustering is a type of unsupervised machine learning algorithm that stands out for its approach to grouping data points. Unlike counterparts such as k-means, it does not require a predetermined number of clusters. This alone makes it a valuable method for exploratory data analysis, where the true structure of the data is often hidden and waiting to be discovered. But the capabilities of hierarchical clustering go beyond flexibility: it builds a tree-like structure, the dendrogram, that offers insight into the relationships and similarities within the data. It is more than just clustering; it is about understanding the story your data wants to tell.
In this blog, we'll explore the key features that set hierarchical clustering apart, delve into its two types (agglomerative and divisive), and discuss its wide array of applications. You will learn the concepts of hierarchical clustering with the help of Python code examples. As a data scientist or machine learning enthusiast, you will want a solid grasp of these concepts.
Hierarchical clustering is a type of unsupervised machine learning algorithm used to group similar objects into clusters. Unlike many other clustering methods, such as k-means, hierarchical clustering does not require you to specify the number of clusters beforehand. Instead, it builds a hierarchy of clusters, either by successively merging smaller clusters into larger ones (agglomerative approach) or by splitting larger clusters into smaller ones (divisive approach).
The following are some of the unique features of hierarchical clustering: it does not require the number of clusters to be specified beforehand, it produces a dendrogram that can be cut at different levels to yield different numbers of clusters, it comes in two variants (agglomerative and divisive), and it can be driven by different distance / linkage methods.
Hierarchical clustering can be classified into the following two types:
In agglomerative clustering, cluster formation starts with individual points: each point is initially treated as its own cluster. If there are N data points, there are N clusters to begin with. The distance between every pair of clusters is then computed, and the two closest clusters are merged into one, leaving (N – 1) clusters. In the next step, the pairwise distances are computed again and the two closest clusters are merged, leaving (N – 2) clusters. The same process is repeated until all data points are merged into a single cluster, i.e., the root cluster. A minimal sketch of these merge steps is shown below.
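To make the merge sequence concrete, here is a minimal sketch using SciPy; the six 2-D points are made up purely for illustration. Each row of the linkage matrix records which two clusters were merged at that step and at what distance.

import numpy as np
from scipy.cluster.hierarchy import linkage

# Six illustrative 2-D points (hypothetical data, not from this post)
X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0],
              [5.1, 5.2], [9.0, 1.0], [9.2, 1.1]])

# Each row of Z is one merge step:
# [cluster_i, cluster_j, merge distance, size of the new cluster]
Z = linkage(X, method='ward')
print(Z)

With N = 6 points, Z has exactly N – 1 = 5 rows, matching the N, N – 1, N – 2, … merge sequence described above.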
This is also called bottom-up hierarchical clustering, because the process starts with individual data points and moves upward until one cluster, the root cluster, is formed. In the diagram below, note how the clusters are formed starting from the leaf nodes and moving upward.
In the above diagram, the right-hand side of the picture is what is called a dendrogram. In the beginning, all the members (letters A – G) are in the leaf nodes.
Each node of the dendrogram represents a subset of the points. Cutting the dendrogram at different levels yields different numbers of clusters. Thus, cutting the dendrogram after the first level gives four clusters (J/K, I/H, B/C and E/F), whereas cutting it one level up gives five clusters: J/K, I/H, A & B/C, D, E/F. Let's understand the formation of clusters by slicing the dendrogram at a specific level using the following diagram.
In the above dendrogram diagram, slicing vertically with the red line results in the creation of four clusters, shown with different color codes.
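The same slicing can also be done programmatically with SciPy's fcluster, which cuts the linkage tree either at a chosen number of clusters or at a chosen distance. A minimal sketch, reusing the illustrative points from the earlier snippet:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Same illustrative points as in the earlier sketch (hypothetical data)
X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0],
              [5.1, 5.2], [9.0, 1.0], [9.2, 1.1]])
Z = linkage(X, method='ward')

# Cut the tree so that exactly 3 clusters remain ...
labels_by_count = fcluster(Z, t=3, criterion='maxclust')

# ... or cut it at a distance threshold (the horizontal slice through the dendrogram)
labels_by_distance = fcluster(Z, t=2.0, criterion='distance')

print(labels_by_count)
print(labels_by_distance)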
The agglomerative hierarchical clustering algorithm differs based on the distance (linkage) method used to merge clusters. The following are common distance methods used to create clusters:
The clustering method uses one of the above distance calculation methods together with a distance matrix such as the following to determine which clusters to merge. Note how the distance between points D & F is the smallest, and thus D & F can be merged into one cluster, as the sketch below illustrates.
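A distance matrix like the one described above can be built with SciPy, and its smallest off-diagonal entry identifies the first pair of points to merge. The coordinates below are made up so that D and F end up closest, mirroring the example in the text:

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical coordinates for points A..F
points = pd.DataFrame(
    [[0.0, 0.0], [1.0, 4.0], [4.0, 1.0], [5.0, 5.0], [8.0, 8.0], [5.2, 5.1]],
    index=list('ABCDEF'))

# Full symmetric Euclidean distance matrix
dist = pd.DataFrame(squareform(pdist(points)),
                    index=points.index, columns=points.index)
print(dist.round(2))

# Smallest non-zero distance -> first pair of clusters to merge
closest_pair = dist.replace(0, np.inf).stack().idxmin()
print(closest_pair)   # with these made-up coordinates: ('D', 'F')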
In divisive hierarchical clustering, cluster formation starts with all points in a single cluster. Applying K-means clustering recursively (typically with K = 2) splits the clusters again and again, eventually resulting in a set of clusters that each contain an individual point. The following describes the divisive hierarchical clustering algorithm:
Let's understand the algorithm using the diagram shown below:
In the diagram shown above, the clustering starts with the root cluster containing the points A, B, C, D, E, F. The first split results in two clusters, A and BCDEF. Applying clustering on BCDEF results in another two clusters, BC and DEF. This process continues until every cluster contains an individual point.
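The recursive bisection idea can be sketched in a few lines: repeatedly pick a cluster and split it into two with K-means until every cluster contains a single point (or reaches a desired minimum size). This is an illustrative implementation of the approach described above, not a library routine:

import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, min_size=1):
    """Recursively bisect clusters with 2-means until each has at most min_size points."""
    clusters = [np.arange(len(X))]   # start with one root cluster holding all indices
    final = []
    while clusters:
        idx = clusters.pop()
        if len(idx) <= min_size:
            final.append(idx)
            continue
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return final

# Illustrative data: six made-up 2-D points
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [9.0, 0.0], [9.1, 0.2]])
for cluster_indices in divisive_clustering(X):
    print(cluster_indices)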
Here is the Python Sklearn code which demonstrates agglomerative clustering. Pay particular attention to the part that plots the dendrogram; the dendrogram is used to decide on the number of clusters based on the distance at which a horizontal line cuts the tree. In this example, the number of clusters finally chosen is 2.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.decomposition import PCA
import scipy.cluster.hierarchy as hc
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
#
# Load the CSV file
#
df = pd.read_csv("/Users/apple/Downloads/cc_general.csv")
#
# Drop the customer id column
#
df = df.drop('CUST_ID', axis=1)
#
# Fill the missing values by forward-filling
#
df = df.ffill()
#
# Scale the data and normalize
#
sc = StandardScaler()
df_scaled = sc.fit_transform(df)
df_normalized = normalize(df_scaled)
#
# Reduce the dimensionality of the data to 3 features
#
pca = PCA(n_components=3)
df_pca = pca.fit_transform(df_normalized)
df_pca = pd.DataFrame(df_pca, columns=['P1', 'P2', 'P3'])
#
# Create the dendrogram plot
#
plt.figure(figsize=(8, 8))
plt.title('Visualising the data')
dendrogram = hc.dendrogram(hc.linkage(df_pca, method='ward'))
plt.show()
The following dendrogram plot is created.
Based on the above dendrogram, let's select different numbers of clusters and create plots by slicing the dendrogram at different levels. The picture below represents slicing of the dendrogram at four different levels, which yields different numbers of clusters.
At level 1, note that there will be 5 clusters. Here is the code and related plot.
#
# Create the clusters using agglomerative hierarchical clustering
#
agc = AgglomerativeClustering(n_clusters=5)
plt.figure(figsize=(8, 8))
plt.scatter(df_pca['P1'], df_pca['P2'], c=agc.fit_predict(df_pca), cmap='rainbow')
plt.title("Agglomerative Hierarchical Clusters - Scatter Plot", fontsize=18)
plt.show()
At level 2, note that there will be 4 clusters. Here is the code and related plot.
#
# Create the clusters using agglomerative hierarchical clustering
#
agc = AgglomerativeClustering(n_clusters=4)
plt.figure(figsize=(8, 8))
plt.scatter(df_pca['P1'], df_pca['P2'], c=agc.fit_predict(df_pca), cmap='rainbow')
plt.title("Agglomerative Hierarchical Clusters - Scatter Plot", fontsize=18)
plt.show()
At level 3, note that there will be 3 clusters. Here is the code and related plot.
#
# Create the clusters using agglomerative hierarchical clustering
#
agc = AgglomerativeClustering(n_clusters=3)
plt.figure(figsize=(8, 8))
plt.scatter(df_pca['P1'], df_pca['P2'], c=agc.fit_predict(df_pca), cmap='rainbow')
plt.title("Agglomerative Hierarchical Clusters - Scatter Plot", fontsize=18)
plt.show()
At level 4, note that there will be 2 clusters. Here is the code and related plot.
#
# Create the clusters using agglomerative hierarchical clustering
#
agc = AgglomerativeClustering(n_clusters=2)
plt.figure(figsize=(8, 8))
plt.scatter(df_pca['P1'], df_pca['P2'], c=agc.fit_predict(df_pca), cmap='rainbow')
plt.title("Agglomerative Hierarchical Clusters - Scatter Plot", fontsize=18)
plt.show()
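Instead of fixing n_clusters by hand for each slice, one can also let Scikit-learn cut the tree at a distance directly via the distance_threshold parameter (available in recent versions, with n_clusters=None). The threshold value below is purely illustrative and would normally be read off the dendrogram:

#
# Cut the hierarchy at a distance instead of a fixed number of clusters
# (the threshold value 1.5 is illustrative; pick it from the dendrogram)
#
agc = AgglomerativeClustering(n_clusters=None, distance_threshold=1.5)
labels = agc.fit_predict(df_pca)
print("Number of clusters found:", agc.n_clusters_)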
Here is the summary of what you learned in this post about hierarchical clustering: it is an unsupervised technique that does not require the number of clusters to be specified up front; it comes in two flavors, agglomerative (bottom-up) and divisive (top-down); the dendrogram built by the algorithm can be sliced at different levels to obtain different numbers of clusters; and Scikit-learn's AgglomerativeClustering, together with a SciPy dendrogram plot, can be used to build and visualize the clusters.