Python – Creating Scatter Plot with IRIS Dataset

In this blog post, we will learn how to create a scatter plot with the IRIS dataset using Python. The IRIS dataset is a small, well-known collection of flower measurements that is often used to demonstrate statistical and machine learning models. It contains 150 observations of four variables: Petal Length, Petal Width, Sepal Length, and Sepal Width. As data scientists, it is important for us to be able to visualize the data that we are working with. Scatter plots are a great way to do this because they show the relationship between two variables. In this post, we plot and explore how Petal Length and Sepal Length are related across different kinds of flowers. To create the scatter plot in Python, we use the Matplotlib library. Let’s get started and learn!

What is the IRIS dataset?

IRIS is a multivariate dataset introduced by Ronald Fisher in his 1936 paper, “The use of multiple measurements in taxonomic problems”, as an example of linear discriminant analysis. It is sometimes called Anderson’s IRIS data set because Edgar Anderson gathered the data to quantify the morphologic variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula “all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus”.

The IRIS dataset is a collection of 150 records of Iris flowers. Each record includes four attributes (features): the petal length and width, and the sepal length and width, all measured in centimeters. The usual task with this dataset is to predict the type of Iris flower from the given features. There are three types of Iris flowers in the dataset, represented by 50 records each: Iris setosa, Iris virginica, and Iris versicolor. The IRIS dataset is a popular choice for machine learning because it is small and easy to work with, but still provides enough data to produce meaningful results.

The following Python code can be used to see the details of the IRIS dataset.

from sklearn.datasets import load_iris

# load the dataset bundled with scikit-learn and print its description
iris = load_iris()
print(iris.DESCR)
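The figures quoted above (150 records, four features, 50 records per species) can be checked directly. The following is a minimal sketch, assuming the copy of the dataset that ships with scikit-learn; the variable names `n_records`, `n_features` and `counts` are illustrative only.

```python
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()

# Shape of the feature matrix: (number of records, number of features)
n_records, n_features = iris.data.shape
print(n_records, n_features)

# Feature and species names as stored in the scikit-learn copy of the dataset
print(iris.feature_names)
print([str(name) for name in iris.target_names])

# Number of records per species, keyed by species name
counts = Counter(str(iris.target_names[t]) for t in iris.target)
print(dict(counts))
```

Note that scikit-learn stores the species as the integer labels 0, 1 and 2 in `iris.target`, with the matching names in `iris.target_names`.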


Creating Scatter Plot with IRIS dataset

IRIS is perhaps the best known dataset in the machine learning literature. Fisher’s paper was published in 1936, the same year Alan Turing described the abstract computing device now known as the Turing machine, and well before electronic computers were available. Nevertheless, IRIS has remained a popular test case for many statistical classification techniques, including methods such as support vector machines. This is largely because it is very easy to visualize what is happening in a 2-dimensional or even 3-dimensional space. With just 4 features, you can plot any pair of them on a graph and get a feel for which classifications will be easy and which will be difficult.
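One way to get a first feel for which classes are easy to separate, before plotting anything, is to compare the range of a single feature per species. The sketch below does this for petal length. It assumes the scikit-learn copy of the dataset, in which column index 2 holds petal length; the names `petal_length` and `ranges` are illustrative.

```python
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]  # column 2 is petal length (cm)

# Minimum and maximum petal length for each species
ranges = {}
for label, name in enumerate(iris.target_names):
    values = petal_length[iris.target == label]
    ranges[str(name)] = (float(values.min()), float(values.max()))

for name, (low, high) in ranges.items():
    print(f"{name:<12} {low:.1f} to {high:.1f} cm")
```

The setosa range does not overlap with the other two, whereas the versicolor and virginica ranges do overlap, which is why that pair is the harder one to classify.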

A scatter plot is a graph in which the data points are plotted on a coordinate grid and the pattern of the resulting points reveals important information about the data set. The data points may be randomly distributed, or they may form a distinct pattern. Scatter plots are useful for identifying trends, relationships, and outliers in data sets. They can also be used to compare two or more data sets. They are most informative when there are enough points for a pattern to emerge, since patterns can be difficult to see with very small data sets. The following code can be used to create the scatter plot using the IRIS dataset:

# Jupyter magic: remove the next line when running as a plain script
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# load the data and build a DataFrame with the four features plus the target
iris = load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])

# select setosa and versicolor (the first 100 rows);
# the target column holds numeric labels: 0 = setosa, 1 = versicolor
y = df.iloc[0:100, 4].values
y = np.where(y == 0, 0, 1)

# extract sepal length and petal length
X = df.iloc[0:100, [0, 2]].values

# plot data
plt.scatter(X[:50, 0], X[:50, 1],
            color='blue', marker='o', label='Setosa')
plt.scatter(X[50:100, 0], X[50:100, 1],
            color='green', marker='s', label='Versicolor')

plt.xlabel('Sepal length [cm]')
plt.ylabel('Petal length [cm]')
plt.legend(loc='upper left')

# plt.savefig('images/02_06.png', dpi=300)
plt.show()
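The plot above covers only two of the three species. As a possible extension, the sketch below draws all three on the same axes by looping over the class labels instead of slicing rows by position. It assumes the scikit-learn copy of the dataset; the colors and markers in `styles` are arbitrary choices made for this illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, [0, 2]]  # sepal length and petal length (cm)
y = iris.target           # 0 = setosa, 1 = versicolor, 2 = virginica

# One (color, marker) pair per species; illustrative choices only
styles = [('blue', 'o'), ('green', 's'), ('red', '^')]

fig, ax = plt.subplots()
for label, (color, marker) in enumerate(styles):
    mask = y == label  # boolean mask selecting the rows of one species
    ax.scatter(X[mask, 0], X[mask, 1],
               color=color, marker=marker,
               label=str(iris.target_names[label]).capitalize())

ax.set_xlabel('Sepal length [cm]')
ax.set_ylabel('Petal length [cm]')
ax.legend(loc='upper left')
```

In a Jupyter notebook with inline plotting the figure is displayed automatically; in a plain script, add a call to `plt.show()` at the end. Selecting rows with a boolean mask avoids relying on the records being stored in blocks of 50.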


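Executing the above Python code produces a scatter plot of sepal length against petal length in which the Setosa and Versicolor points form two clearly separated groups.

The IRIS data can also be read directly from the UCI Machine Learning Repository into a pandas DataFrame:

import pandas as pd

# IRIS data (UCI Machine Learning Repository)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

# Create a dataframe; the file has no header row, so header=None is assumed here.
# In this file the species are stored as strings such as 'Iris-setosa'.
df = pd.read_csv(url, header=None)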