Support Vector Machines (SVMs) are a powerful and versatile machine learning algorithm that has gained widespread popularity among data scientists in recent years. SVMs are widely used for classification, regression, and outlier detection (one-class SVM), and have proven to be highly effective in solving complex problems in various fields, including computer vision (image classification, object detection, etc.), natural language processing (sentiment analysis, text classification, etc.), and bioinformatics (gene expression analysis, protein classification, disease diagnosis, etc.).
In this post, you will learn about the concepts of Support Vector Machine (SVM) with the help of Python code example for building a machine learning classification model. We will work with Python Sklearn package for building the model. As data scientists, it is important to get a good grasp on SVM algorithm and related aspects.
Support vector machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression tasks. At times, SVM for classification is termed as support vector classification (SVC) and SVM for regression is termed as support vector regression (SVR). In this post, we will learn about SVM classifier.
The main idea behind SVM classifier is to find a hyperplane that maximally separates the data points of different classes. In other words, we are looking for the largest margin between the data points belonging to two classes. The hyperplane is selected such that it maximizes the distance between the hyperplane and the closest data points from each class, which are called support vectors. These points have a direct impact on the position and orientation of the hyperplane. The rationale behind having decision boundaries with large margins is that they tend to have a lower generalization error, whereas models with small margins are more prone to overfitting. Hence, SVM classifier is also termed as maximum margin classifier, meaning that it finds the line or hyperplane that has the largest distance to the nearest training data points (support vectors) of any class. The following picture represents the same.
Given labeled training data (for supervised learning), the SVM classification algorithm outputs an optimal hyperplane which categorizes new data examples into different classes. This hyperplane is then used to make predictions on new data points. Let’s take an example to understand Support Vector Machine concepts in a better manner. Say you have been asked to predict whether a customer will churn or not and you have all their past transaction records as well as demographic information. After exploring the data, you’ve found that there’s not much difference between the average transaction amount of customers who churned and those who didn’t. You also found that most of the customers who churned live relatively far from the city center. Based on these findings, you decided to use Support Vector Machine classification algorithm to build your prediction model. Model trained using SVM classification algorithm will be able to classify the customers as high risk (churned) or otherwise.
There are some key concepts that are important to understand when working with SVMs. First, the data points that are closest to the hyperplane are called support vectors. Second, when the data is not linearly separable, SVMs use a technique called kernel trick, which maps the original features into a higher-dimensional feature space where the data is linearly separable. In this new feature space, we can draw a hyperplane that separates the two classes. Third, when working with Python SKlearn SVC algorithm, there are three hyperparameters that results in different SVM models (hypothesis): C, gamma and kernel function.
As an example, let’s say we have a dataset with two features (x1 and x2) and two classes (0 and 1). We can visualize this data by plotting it in a two-dimensional space, with each point colored according to its class label. Look at the diagram below.
In the above case, we can see that there are different straight lines that can perfectly separate the two classes. However, we can still find a decision boundary that does a pretty good job. This boundary is generated by Support Vector Machine algorithm. Using SVM algorithm, as mentioned above, training the model represents finding the hyperplane (dashed line in the picture below) which separates the data belonging to two different classes by maximum or largest margin. And, the points closest to this hyperplane are called support vectors. Note this in the diagram given below.
Support vector machines are a powerful tool for classification, but like any machine learning algorithm, they require careful tuning of their hyperparameters in order to achieve optimal performance. The most important hyperparameters are the kernel function and the regularization parameters such as C and gamma. The kernel function determines how data points are transformed into higher dimensional space, and the regularization parameter controls the trade-off between model complexity and overfitting. In addition, the Support Vector Machine also has a number of other important hyperparameters that can be adjusted to improve performance, including the maximum number of iterations, the tolerance for error, and the learning rate. By carefully tuning these hyperparameters, it is possible to achieve significantly better performance from a Support Vector Machine. Here are related post on tuning hyperparameters for building an optimal SVM model for classification:
The following steps will be covered for training the model using SVM while using Python code:
First and foremost we will load appropriate Sklearn modules and classes.
# Basic packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Sklearn modules & classes
from sklearn.linear_model import Perceptron, LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
from sklearn import metrics
Lets get started with loading the data set and creating the training and test split from the data set. Pay attention to the stratification aspect used when creating the training and test split. The train_test_split class of sklearn.model_selection is used for creating training and test split.
# Load the data set; In this example, the breast cancer dataset is loaded.
bc = datasets.load_breast_cancer()
X = bc.data
y = bc.target
# Create training and test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
Next step is to perform feature scaling. The reason for doing feature scaling is to make sure that data for different features are in the same range. The StandardScaler class of sklearn.preprocessing is used.
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
Next step is to instantiate a SVC (Support Vector Classifier) and fit the model. The SVC class of sklearn.svm module is used.
# Instantiate the Support Vector Classifier (SVC)
svc = SVC(C=1.0, random_state=1, kernel='linear')
# Fit the model
svc.fit(X_train_std, y_train)
Finally, it is time to measure the model performance. Here is the code for doing the same:
# Make the predictions
y_predict = svc.predict(X_test_std)
# Measure the performance
print("Accuracy score %.3f" %metrics.accuracy_score(y_test, y_predict))
The performance of the model will turn out to be 0.953.
In conclusion, Support Vector Machines (SVMs) are powerful learning models that can be used for both classification and regression tasks. SVMs work by mapping data points into higher dimensional feature space, which allows them to capture nonlinear relationships between features and perform complex classifications and regressions. Python provides several libraries for using SVMs efficiently such as scikit-learn and pySVM, making it easy to get started with SVM implementation in Python. By following the example in this guide, you now know more about what SVMs are capable of, how they work, and how to use them in practice.
In recent years, artificial intelligence (AI) has evolved to include more sophisticated and capable agents,…
Adaptive learning helps in tailoring learning experiences to fit the unique needs of each student.…
With the increasing demand for more powerful machine learning (ML) systems that can handle diverse…
Anxiety is a common mental health condition that affects millions of people around the world.…
In machine learning, confounder features or variables can significantly affect the accuracy and validity of…
Last updated: 26 Sept, 2024 Credit card fraud detection is a major concern for credit…
View Comments
So informative and clearly explained with code.
while predicting for an unseen data-point, do we also need to perform standardization of it before feeding it to the model?
Yes, if you have standardized (or normalized) your training data before training your models, you should also standardize any unseen data-points before making predictions. This is because the model has been trained on data with a certain scale and distribution, so it expects inputs with the same scale and distribution when making predictions.