# PCA vs LDA Differences, Plots, Examples

Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most popular dimensionality reduction techniques. Both methods are used to reduce the number of features in a dataset while retaining as much information as possible. But how do they differ, and when should you use one method over the other? As data scientists, it is important to get a good understanding around this concept as it is used in building machine learning models. Keep reading to find out with the help of Python code & examples.

## How does PCA work?

Principal Component Analysis (PCA) works by identifying the directions (components) that maximize the variance in a dataset. In other words, it seeks to find the linear combination of features that captures as much variance as possible. The first component is the one that captures the maximum variance, the second component is orthogonal to the first and captures the remaining variance, and so on. PCA is a useful technique for dimensionality reduction when your data has linear relationships between features – that is, when you can express one feature as a function of the other(s). In such cases, you can use PCA to compress your data while retaining most of the information content by choosing just the right number of features (components).

Here’s an example to help illustrate how PCA works. We will use IRIS dataset. In the code below, the IRIS dataset is transformed into 2 components and scatter plot is created representing all the three classes such as Setosa, Versicolour and Virginica.

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

from sklearn import datasets
#
#
#
# Create a dataframe from IRIS dataset
#
df = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
df["class"] = iris.target
#
# Create PCA transformed dataset with dimensionality
# reduced to 2; n_components = 2
#
pca = PCA(n_components=2)
X_pca = pca.fit(df.iloc[:, 0:4]).transform(df.iloc[:, 0:4])
#
# Create plot from transformed dataset
#
plt.figure(figsize=(8,6))

plt.scatter(X_pca[0:50,0], X_pca[0:50,1], color='green', marker='o', label='Setosa')
plt.scatter(X_pca[50:100,0], X_pca[50:100,1], color='blue', marker='s', label='Versicolour')
plt.scatter(X_pca[100:150,0], X_pca[100:150,1], color='red', marker='+', label='Virginica')

plt.title("PCA components plot for IRIS Dataset", fontsize=14)
plt.legend()
plt.show()


The above plot of data points after PCA was used for dimensionality reduction to 2 components shows a great separation between three different classes. Without the PCA, the plots such as below would represent the fact that classes ain’t separated clearly. This showcases the advantage of why PCA can be used for dimensionality reduction and a model trained with the transformed data will perform better than the original data.

#
# Create plot from original IRIS dataset
#
plt.figure(figsize=(8,6))

plt.scatter(df.iloc[0:50,0], df.iloc[0:50,1], color='green', marker='o', label='Setosa')
plt.scatter(df.iloc[50:100,0], df.iloc[50:100,1], color='blue', marker='s', label='Versicolour')
plt.scatter(df.iloc[100:150,0], df.iloc[100:150,1], color='red', marker='+', label='Virginica')

plt.title("Scatter plot without PCA transformation", fontsize=14)
plt.legend()
plt.show()


## How does LDA work?

Linear discriminant analysis (LDA) is another linear transformation technique that is used for dimensionality reduction. Unlike PCA, however, LDA is a supervised learning method, which means it takes class labels into account when finding directions of maximum variance. This makes LDA particularly well-suited for classification tasks where you want to maximize class separability.

As with PCA, LDA assumes that your data is centered around the origin and that your features are uncorrelated with one another. You can center and decorrelate your data using scikit-learn’s StandardScaler and LinearDiscriminantAnalysis classes, respectively. Once your data has been cleaned and transformed, you can fit an LDA model to it using scikit-learn’s fit_transform() method. This will return a projected version of your data that has been reduced to the desired number of dimensions while maximizing class separability. In the code below, the IRIS dataset is transformed into 2 components and scatter plot is created representing all the three classes such as Setosa, Versicolour and Virginica.

import pandas as pd
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import matplotlib.pyplot as plt

from sklearn import datasets
#
#
#
# Create a dataframe from IRIS dataset
#
df = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
df["class"] = iris.target
#
# Create LDA transformed dataset with dimensionality
# reduced to 2; n_components = 2
#
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit(df.iloc[:, 0:4], df.iloc[:, -1]).transform(df.iloc[:, 0:4])
#
# Create plot from transformed dataset
#
plt.figure(figsize=(8,6))

plt.scatter(X_lda[0:50,0], X_lda[0:50,1], color='green', marker='o', label='Setosa')
plt.scatter(X_lda[50:100,0], X_lda[50:100,1], color='blue', marker='s', label='Versicolour')
plt.scatter(X_lda[100:150,0], X_lda[100:150,1], color='red', marker='+', label='Virginica')

plt.title("LDA components plot for IRIS Dataset", fontsize=14)
plt.legend()
plt.show()


As like PCA transformation, LDA transformation also results in clear separation of IRIS dataset classes which would not have been possible with scatter plot on original dataset.

## LDA vs PCA: When to use which method?

PCA is an unsupervised learning algorithm while LDA is a supervised learning algorithm. This means that PCA finds directions of maximum variance regardless of class labels while LDA finds directions of maximum class separability.

So now that you know how each method works, when should you use PCA vs LDA for dimensionality reduction? In general, you should use LDA when your goal is classification – that is, when you have labels for your data points and want to predict which label new points will have based on their feature values . On the other hand, if you don’t have labels for your data or if your goal is simply to find patterns in your data (not classification), then PCA will likely work better .

That said, there are some situations where LDA may outperform PCA even when you’re not doing classification . For example , imagine that your data has 100 features but only 10% of those features are actually informative (the rest are noise). If you run PCA on this dataset, it will identify all 100 components since its goal is simply to maximize variance . However , because only 10% of those components are actually informative, 90% of them will be useless. If you were to run LDA on this same dataset, it would only identify 10 components since its goal capturing class separability would be better served by discarding noisy features. Thus, if noise dominates your dataset then LDA may give better results even if your goal isn’t classification! Because LDA makes stronger assumptions about the structure of your data, it will often perform better than PCA when your dataset satisfies those assumptions but worse when it doesn’t.

## Conclusion

So which technique should you use? That depends on what kind of data you’re working with and what your goals are. If you’re working with labeled data and your goal is to find a low-dimensional representation that maximizes class separability, then LDA is probably your best bet. However, if your goal is simply to find a low-dimensional representation that retains as much information as possible or if you’re working with unlabeled data, then PCA might be a better choice.

In general, it’s always worth trying both techniques on your dataset and seeing which one gives you the best results!