In this post, you will learn about the scatter plot matrix, which is also known as a pair plot. Later in the post, you will find a Python code example that creates a scatter plot matrix / pair plot using the seaborn package.
A scatter plot matrix is a matrix (or grid) of scatter plots in which each scatter plot is drawn for a different pair of variables. In other words, a scatter plot matrix shows the bivariate (pairwise) relationships between variables, laid out in grid form. Here is a sample scatter plot matrix created using the Iris dataset from sklearn.
A scatter plot matrix is also referred to as a pair plot because it consists of scatter plots of variables taken in pairs. In the above matrix of scatter plots, pay attention to the following:
Here is another representation of a pair plot comprising three different variables.
A scatter plot matrix can be used when you would like to assess the following:
The pairwise relationships can be analysed at several stages of the machine learning model pipeline, including the following:
In this section, the usage of the seaborn package’s pairplot function is demonstrated. By default, pairplot creates a grid of Axes such that each numeric variable in the data is shared across the y-axes of a single row and the x-axes of a single column. Here is sample code that creates a pair plot:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
#
# Load iris dataset
#
iris = datasets.load_iris()
#
# Create dataframe using IRIS dataset
#
df = pd.DataFrame(iris.data)
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
df['class'] = iris.target
#
# Create pairplot of all the variables with hue set to class
#
sns.pairplot(df, hue='class')
plt.show()
Pay attention to the hue parameter, which is passed a categorical variable and maps the data points to different colors. It is also possible to show a subset of variables, or to plot different variables on the rows and columns. The vars parameter plots only a subset of variables, as shown in the code below. The plots in fig 1 and fig 2 represent the use of a subset of variables for the pair plot.
sns.pairplot(df, hue='class', vars=['sepal_length', 'sepal_width', 'petal_length'])
plt.show()
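Plotting different variables on the rows and columns can be done with the x_vars and y_vars parameters of pairplot. The following is a minimal sketch, assuming the same Iris dataframe as above; the choice of columns and the name rect_grid are only illustrative. Here sepal_length is placed on the single row and plotted against each of the other three measurements.

```python
import pandas as pd
import seaborn as sns
from sklearn import datasets

# Rebuild the same Iris dataframe used in the post
iris = datasets.load_iris()
df = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
df['class'] = iris.target

# x_vars sets the columns of the grid, y_vars sets the rows
rect_grid = sns.pairplot(
    df,
    hue='class',
    x_vars=['sepal_width', 'petal_length', 'petal_width'],
    y_vars=['sepal_length'],
)

# One row (y_vars) by three columns (x_vars)
print(rect_grid.axes.shape)
```

Call plt.show() afterwards to display the figure. Unlike vars, which produces a square grid over one list of variables, x_vars and y_vars allow a rectangular grid, which can be handy when only the relationship of one variable with the others is of interest.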
Here are some key learnings from this post: