In this post, you will learn about a simple technique, feature scaling, that can improve the performance of machine learning models. The models are trained using the Perceptron (single-layer neural network) classifier.
First and foremost, let's quickly understand what feature scaling is and why it is needed.
What Is Feature Scaling and Why Is It Needed?
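Feature scaling is a technique for bringing the independent features of a dataset onto a comparable scale, for example a fixed range, or zero mean and unit variance. It is applied when the features vary widely in magnitude, units, or range. Models such as the Perceptron, whose weight updates are proportional to the feature values, are sensitive to such differences: features with large values dominate the learning.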
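To see why differences in magnitude matter for a Perceptron, here is a minimal sketch in plain NumPy of a single weight update under the classic (Rosenblatt) perceptron learning rule. The feature values, learning rate, and labels are hypothetical. scikit-learn's Perceptron is implemented differently internally (via stochastic gradient descent), but its updates are also proportional to the feature values.

```python
import numpy as np

# Hypothetical sample with two features on very different scales
# (e.g. one length in metres, another in millimetres).
x = np.array([1.5, 1500.0])
w = np.zeros(2)   # initial weights
eta = 0.1         # learning rate

# Suppose the sample is misclassified: true label +1, predicted -1.
y_true, y_pred = 1, -1

# Classic perceptron rule: w <- w + eta * (y_true - y_pred) * x
delta_w = eta * (y_true - y_pred) * x
w = w + delta_w

# The second weight moves 1000 times further than the first,
# purely because its feature has larger values.
print(delta_w)
print(delta_w[1] / delta_w[0])
```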
In this post, we will use standardization for feature scaling, which rescales each feature to zero mean and unit standard deviation. We will use the StandardScaler class from the sklearn.preprocessing module.
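In formula form, standardization replaces each value x of a feature with its z-score, where μ and σ are the mean and standard deviation of that feature as estimated from the training data:

```latex
z = \frac{x - \mu}{\sigma}
```

After this transformation, each feature of the training data has mean 0 and standard deviation 1.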
Train a Perceptron Model without Feature Scaling
Here is the code for training a model without feature scaling. First, load the dataset and create the feature matrix and the label vector. This post uses the Iris dataset. In the code below, X is the feature matrix with two features, sepal length (column 0) and petal length (column 2), and Y holds the class labels.
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
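A quick sanity check on the data loaded above can confirm which columns were selected and how many samples and classes there are. This is a small sketch; the printing code is an addition, not part of the original walkthrough.

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]   # columns 0 and 2 of the four Iris features
Y = iris.target            # class labels

selected = [iris.feature_names[i] for i in (0, 2)]
classes = sorted(set(Y.tolist()))

print(X.shape)    # (number of samples, number of selected features)
print(selected)   # names of the two selected columns
print(classes)    # distinct class labels
```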
The next step is to create the training and test split. The train_test_split function from the sklearn.model_selection module can be used for this. Here, 30% of the samples are held out for testing. Note that stratification is not used.
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
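Because stratification is not used, the class proportions in the test set can differ from those in the full dataset. The following sketch prints the split sizes and the per-class counts in the test set. For comparison, it also shows the `stratify=Y` option of train_test_split, which is not used in this post.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target

# Split used in this post: no stratification
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
print(len(X_train), len(X_test))
print("Test class counts:", np.bincount(Y_test))

# Alternative (not used in this post): stratify on the labels so that
# each class keeps the same share in the training and test sets
_, _, _, Y_test_strat = train_test_split(
    X, Y, test_size=0.3, random_state=1, stratify=Y
)
print("Stratified test class counts:", np.bincount(Y_test_strat))
```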
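The next step is to create an instance of the Perceptron classifier and train it on the training features X_train and labels Y_train. The code below uses the Perceptron class from the sklearn.linear_model module. The eta0 parameter is the learning rate, and random_state fixes the shuffling of the training data so that the results are reproducible.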
from sklearn.linear_model import Perceptron

prcptrn = Perceptron(eta0=0.1, random_state=1)
prcptrn.fit(X_train, Y_train)
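With three classes, scikit-learn's Perceptron trains one binary perceptron per class (one-vs-rest), so the fitted model holds one weight vector and one bias per class. A short sketch that inspects the fitted attributes, using the same data and settings as above:

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

prcptrn = Perceptron(eta0=0.1, random_state=1)
prcptrn.fit(X_train, Y_train)

print(prcptrn.coef_.shape)       # (n_classes, n_features): one weight vector per class
print(prcptrn.intercept_.shape)  # (n_classes,): one bias per class
print(prcptrn.n_iter_)           # passes over the data (max over the binary fits)
```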
The next step is to measure the model accuracy. This can be done with the accuracy_score function from the sklearn.metrics module or by calling the score method on the Perceptron instance.
from sklearn.metrics import accuracy_score

Y_predict = prcptrn.predict(X_test)
print("Misclassified examples %d" % (Y_test != Y_predict).sum())
print("Accuracy Score %.3f" % accuracy_score(Y_test, Y_predict))
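The two ways of measuring accuracy mentioned above give the same number: the fraction of test samples predicted correctly, which also equals one minus the misclassified fraction. A sketch that checks this on the same split:

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

prcptrn = Perceptron(eta0=0.1, random_state=1)
prcptrn.fit(X_train, Y_train)
Y_predict = prcptrn.predict(X_test)

acc_metric = accuracy_score(Y_test, Y_predict)   # function from sklearn.metrics
acc_method = prcptrn.score(X_test, Y_test)       # the classifier's own score method
n_misclassified = int((Y_test != Y_predict).sum())

print("accuracy_score: %.3f  score(): %.3f" % (acc_metric, acc_method))
print("1 - misclassified / n_test: %.3f" % (1 - n_misclassified / len(Y_test)))
```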
The accuracy score is 0.578, with 19 misclassified examples.
Train a Perceptron Model with Feature Scaling
Feature scaling is done with the following code, which runs right after creating the training and test split.
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
The code above uses the StandardScaler class from the sklearn.preprocessing module. Its fit method estimates the sample mean and standard deviation of each feature from the training data only. The transform method then standardizes the features of both the training and the test data using those estimated parameters (mean and standard deviation). Fitting the scaler on the training data alone keeps information from the test set out of the preprocessing step.
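To make the fit/transform steps concrete, the following sketch reproduces StandardScaler by hand with NumPy. It uses the population standard deviation (`ddof=0`), which is what StandardScaler uses. The standardized training data ends up with mean 0 and standard deviation 1 per feature. The test data is transformed with the training statistics, so its mean and standard deviation are only approximately 0 and 1.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

sc = StandardScaler()
sc.fit(X_train)                  # learns mean_ and scale_ from X_train only
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# The same transformation by hand: z = (x - mean) / std
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)      # population std (ddof=0)
X_test_manual = (X_test - mu) / sigma

print(np.allclose(sc.mean_, mu), np.allclose(sc.scale_, sigma))
print(np.allclose(X_test_std, X_test_manual))
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```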
The next step is to train a Perceptron model with the same settings on the standardized data and measure its accuracy:
from sklearn.metrics import accuracy_score

prcptrnFS = Perceptron(eta0=0.1, random_state=1)
prcptrnFS.fit(X_train_std, Y_train)
Y_predict_std = prcptrnFS.predict(X_test_std)
print("Misclassified examples %d" % (Y_test != Y_predict_std).sum())
print("Accuracy Score %.3f" % accuracy_score(Y_test, Y_predict_std))
The accuracy score is 0.978, with only 1 misclassified example.
The accuracy score increased by about 40 percentage points, from 0.578 to 0.978.
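The reported scores can be checked with a little arithmetic. The test set has 45 samples (30% of the 150 Iris samples), and the misclassification counts are those reported above. The sketch also separates the absolute gain (percentage points) from the relative gain.

```python
n_test = 45                          # 30% of the 150 Iris samples
acc_raw = (n_test - 19) / n_test     # 19 misclassified without scaling
acc_std = (n_test - 1) / n_test      # 1 misclassified with scaling

abs_gain = acc_std - acc_raw         # difference in accuracy
rel_gain = abs_gain / acc_raw        # improvement relative to the unscaled model

print("without scaling: %.3f" % acc_raw)   # 0.578
print("with scaling:    %.3f" % acc_std)   # 0.978
print("gain: %.1f percentage points (%.0f%% relative)" % (100 * abs_gain, 100 * rel_gain))
# gain: 40.0 percentage points (69% relative)
```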
Thus, it is recommended to perform feature scaling before training models, such as the Perceptron, that are sensitive to the magnitude of the features.
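One way to follow this recommendation without repeating the scaling steps by hand is scikit-learn's Pipeline. The sketch below is an addition, not part of the original code. It bundles StandardScaler and a Perceptron with the same settings used above. The pipeline fits the scaler on the training data only and applies the same transformation automatically at prediction time. The sketch also checks that the pipeline's predictions match those of the manual steps.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

# Scaler and classifier chained into one estimator
model = make_pipeline(StandardScaler(), Perceptron(eta0=0.1, random_state=1))
model.fit(X_train, Y_train)
Y_predict_pipe = model.predict(X_test)

# The manual steps from this post, for comparison
sc = StandardScaler().fit(X_train)
prcptrnFS = Perceptron(eta0=0.1, random_state=1).fit(sc.transform(X_train), Y_train)
Y_predict_std = prcptrnFS.predict(sc.transform(X_test))

print("Accuracy Score %.3f" % accuracy_score(Y_test, Y_predict_pipe))
print("Same predictions as manual steps:", np.array_equal(Y_predict_pipe, Y_predict_std))
```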