Python – Improve Model Performance using Feature Scaling


In this post, you will learn about a simple technique, feature scaling, that can improve the performance of machine learning models. The models are trained using the Perceptron classifier (a single-layer neural network).

First, let's quickly understand what feature scaling is and why it is needed.

What Is Feature Scaling and Why Is It Needed?

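Feature scaling is a technique for bringing the independent features in the data onto a common scale, for example a fixed range such as [0, 1], or zero mean and unit variance. It is applied when the features vary widely in magnitude, units, or range. Learning algorithms such as the perceptron update their weights in proportion to the feature values, so features with larger values dominate the updates, and training can be slow or end with a poor model.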
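To make "fixed range" concrete, here is a minimal sketch on hypothetical toy data (two made-up features in metres and grams). It uses scikit-learn's MinMaxScaler, i.e. min-max normalization, which is a different scaling technique from the standardization used in the rest of this post; it is shown only to illustrate the idea of mapping every feature into the same range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: column 0 is a length in metres, column 1 a weight in grams,
# so the two features differ in magnitude by roughly three orders.
X_toy = np.array([[1.2, 3500.0],
                  [1.5, 5200.0],
                  [1.8, 6100.0],
                  [2.1, 4300.0]])

# Min-max normalization rescales each column independently to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X_toy)
print("per-feature min:", X_minmax.min(axis=0))
print("per-feature max:", X_minmax.max(axis=0))
```

After scaling, both columns span exactly [0, 1], so neither feature dominates simply because of its units.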

In this post, we use standardization for feature scaling: each feature is rescaled to have zero mean and unit standard deviation. We use the StandardScaler class from the sklearn.preprocessing module.
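For reference, standardization replaces each value x_ij (example i, feature j) with its z-score, where μ_j and σ_j are the mean and standard deviation of feature j over the n training examples. As far as the scikit-learn documentation describes it, StandardScaler uses the biased (population) estimator for σ_j, i.e. it divides by n rather than n - 1.

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
```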

Train a Perceptron Model without Feature Scaling

Here is the code for training a model without feature scaling. First, let's load the dataset and separate it into features and labels. This post uses the Iris dataset. In the code below, X holds two features, sepal length and petal length, and Y holds the class labels.

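from sklearn import datasets
iris = datasets.load_iris()
# Use two features: sepal length (column 0) and petal length (column 2)
X = iris.data[:, [0, 2]]
# Class labels: 0 = setosa, 1 = versicolor, 2 = virginica
Y = iris.target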
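Before training, it can help to look at the data. The sketch below (an optional check, not part of the original walkthrough) prints the shape of X, the names of the two selected features, their per-feature mean and standard deviation, and the number of examples per class. Both features are in centimetres, but neither is centred at zero and petal length is spread more widely than sepal length.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]  # sepal length and petal length, in cm
Y = iris.target

print(X.shape)  # 150 examples, 2 features
print(iris.feature_names[0], "|", iris.feature_names[2])
print("mean:", X.mean(axis=0).round(3))
print("std: ", X.std(axis=0).round(3))
print("class counts:", np.bincount(Y))
```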

The next step is to split the data into training and test sets. The sklearn.model_selection module provides the train_test_split function for this. Here, 30% of the examples (45 of 150) are held out for testing, and random_state=1 makes the split reproducible. Note that stratification is not used.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
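The post deliberately does not stratify the split. For comparison, here is a sketch of the stratified variant using the stratify parameter of train_test_split, which keeps the class proportions of Y in both sets. The variable names with the _s suffix are hypothetical and chosen only to avoid clashing with the split used in the rest of the post.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target

# stratify=Y preserves the 1:1:1 class ratio of Iris in both the training and the test set.
X_train_s, X_test_s, Y_train_s, Y_test_s = train_test_split(
    X, Y, test_size=0.3, random_state=1, stratify=Y)

print("test size:", len(Y_test_s))
print("training class counts:", np.bincount(Y_train_s))
print("test class counts:", np.bincount(Y_test_s))
```

With 150 balanced examples and test_size=0.3, the stratified test set contains 15 examples of each class.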

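The next step is to create an instance of the Perceptron classifier and train it on X_train and Y_train (the training features and labels). The code below uses the Perceptron class from the sklearn.linear_model module; eta0 is the learning rate (the constant by which the weight updates are multiplied), and random_state fixes the shuffling of the training data so that results are reproducible.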

from sklearn.linear_model import Perceptron

prcptrn = Perceptron(eta0=0.1, random_state=1)
prcptrn.fit(X_train, Y_train)
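scikit-learn's Perceptron shares its implementation with SGDClassifier and handles the three Iris classes with a one-versus-all scheme, so its internals are more involved than the textbook rule. To see why feature scale matters, here is a simplified sketch of the classic Rosenblatt perceptron for two classes with labels in {-1, +1}. It is an illustration, not scikit-learn's code; the function names train_perceptron and predict and the toy data are hypothetical.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, n_epochs=20):
    """Classic Rosenblatt perceptron for labels y in {-1, +1} (illustrative sketch)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        errors = 0
        for xi, target in zip(X, y):
            prediction = 1 if xi @ w + b >= 0 else -1
            if prediction != target:
                # The weight change is proportional to the raw feature values,
                # so a feature with a larger magnitude receives larger updates.
                w += eta * target * xi
                b += eta * target
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Hypothetical linearly separable toy data.
X_toy = np.array([[2.0, 1.0], [3.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y_toy = np.array([1, 1, -1, -1])
w, b = train_perceptron(X_toy, y_toy)
print(predict(X_toy, w, b))
```

Because each update adds eta * target * xi to the weights, a feature measured in large units would swamp the others; standardizing the features puts them on an equal footing for these updates.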

The next step is to measure the model's accuracy on the test set. This can be done with the accuracy_score function from the sklearn.metrics module, or by calling the score method of the Perceptron instance.

from sklearn.metrics import accuracy_score
Y_predict = prcptrn.predict(X_test)
print("Misclassified examples %d" %(Y_test != Y_predict).sum())
print("Accuracy Score %.3f" %accuracy_score(Y_test, Y_predict))
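The two ways of measuring accuracy mentioned above give the same number: for classifiers, score returns the mean accuracy on the given data. The self-contained sketch below repeats the training steps from this section and compares both values.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

prcptrn = Perceptron(eta0=0.1, random_state=1)
prcptrn.fit(X_train, Y_train)

# accuracy_score compares true and predicted labels; score() predicts and compares in one call.
acc_function = accuracy_score(Y_test, prcptrn.predict(X_test))
acc_method = prcptrn.score(X_test, Y_test)
print("accuracy_score: %.3f, score(): %.3f" % (acc_function, acc_method))
```

The exact accuracy may differ from the value reported below depending on the scikit-learn version, since Perceptron's default stopping criteria have changed across releases.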

The accuracy score is 0.578, with 19 misclassified examples.

Train a Perceptron Model with Feature Scaling

Feature scaling is done with the following code. This step comes right after creating the training and test split.

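from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
# Estimate each feature's mean and standard deviation from the training data only
sc.fit(X_train)
# Standardize both sets using the training-set statistics
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)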

The code above uses the StandardScaler class from the sklearn.preprocessing module. Its fit method estimates the sample mean and standard deviation of each feature from the training data. The transform method then standardizes the features using those estimated parameters (mean and standard deviation). The test data is transformed with the parameters estimated from the training data, so no information from the test set leaks into the model.
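To check what fit and transform actually do, the sketch below inspects the fitted mean_ and scale_ attributes of StandardScaler and compares transform with a manual (x - mean) / std computation on the same training split. The variable name manual is only for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

sc = StandardScaler()
sc.fit(X_train)

# fit() stores the per-feature training mean and standard deviation (population std, ddof=0).
print("mean_: ", sc.mean_)
print("scale_:", sc.scale_)

# transform() computes (x - mean_) / scale_ feature by feature.
X_train_std = sc.transform(X_train)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
print("matches manual computation:", np.allclose(X_train_std, manual))

# The standardized training features have mean ~0 and standard deviation ~1.
print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
```

The standardized test features will generally not have exactly zero mean and unit variance, because they are scaled with the training statistics.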

The next step is to train a Perceptron model on the standardized data and measure its accuracy:

prcptrnFS = Perceptron(eta0=0.1, random_state=1)
prcptrnFS.fit(X_train_std, Y_train)

Y_predict_std = prcptrnFS.predict(X_test_std)
print("Misclassified examples %d" %(Y_test != Y_predict_std).sum())

from sklearn.metrics import accuracy_score
print("Accuracy Score %0.3f" % accuracy_score(Y_test, Y_predict_std))

The accuracy score is 0.978, with only 1 misclassified example.

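The accuracy score increased from 0.578 to 0.978, i.e. by 40 percentage points (a relative improvement of about 69%), and the number of misclassified test examples dropped from 19 to 1.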

Thus, it is recommended to perform feature scaling before training models such as the perceptron, which are sensitive to the scale of the input features.
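One convenient way to follow this recommendation is to bundle the scaler and the classifier into a scikit-learn pipeline, so that the scaler is always fitted on the training data only and then reused for the test data. The sketch below is an alternative packaging of the same steps, not the approach used in the post; it uses make_pipeline from sklearn.pipeline.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)

# fit() first fits StandardScaler on X_train, then trains the Perceptron
# on the standardized training data.
model = make_pipeline(StandardScaler(), Perceptron(eta0=0.1, random_state=1))
model.fit(X_train, Y_train)

# score() applies the already-fitted scaler to X_test before classifying.
print("Accuracy Score %.3f" % model.score(X_test, Y_test))
```

Because the pipeline performs exactly the same fit and transform calls as the manual code, it should give the same accuracy as the step-by-step version with the same random_state.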

Ajitesh Kumar
