In this post, you will learn about feature scaling, a simple technique that can improve machine learning models, with Python code examples. The models will be trained using the Perceptron (single-layer neural network) classifier.
First and foremost, let's quickly understand what feature scaling is and why one needs it.
What is Feature Scaling and Why does one need it?
Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization or standardization. Feature scaling is generally performed during the data pre-processing stage, before training models using machine learning algorithms. The goal is to transform the data so that each feature is in the same range (e.g. between 0 and 1). This ensures that no single feature dominates the others, and makes training and tuning quicker and more effective. Feature scaling can be accomplished using a variety of methods, including min-max scaling, z-score standardization, and decimal scaling. Which method you choose will depend on your data and your machine learning algorithm. For example, min-max scaling is typically used with neural networks, while z-score standardization is more common with linear regression models.
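Decimal scaling is mentioned above but not detailed later in the post, so here is a minimal sketch with made-up values: each value is divided by 10^j, where j is the smallest integer such that the maximum absolute scaled value falls below 1.

```python
import numpy as np

def decimal_scale(x):
    """Scale values by a power of 10 so that max(|x_scaled|) < 1."""
    x = np.asarray(x, dtype=float)
    # smallest j such that max(|x|) / 10**j < 1
    j = int(np.ceil(np.log10(np.abs(x).max() + 1e-12)))
    return x / (10 ** j)

salaries = np.array([45000, 72000, 980000])  # made-up salary values
print(decimal_scale(salaries))  # [0.045 0.072 0.98]
```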
Consider a dataset with two features, age and salary. Age is usually distributed between 0 and 80 years, while salary is usually distributed between 0 and 1 million dollars. If we apply a machine learning algorithm to this dataset without feature scaling, the algorithm will give more weight to the salary feature since it has a much larger range. However, by rescaling both features to the range 0-1, we can give both features equal weight and improve the performance of our machine learning algorithm.
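To make the age/salary example concrete, here is a small sketch using made-up numbers and scikit-learn's MinMaxScaler, which rescales each column to the range [0, 1]:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: column 0 is age (years), column 1 is salary (dollars)
X = np.array([[25, 40000],
              [40, 120000],
              [80, 1000000]], dtype=float)

scaler = MinMaxScaler()           # rescales each column to [0, 1]
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```

After scaling, both columns span exactly [0, 1], so neither feature dominates purely because of its units.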
Feature scaling is performed when the dataset contains features that vary widely in magnitude, units, and range. The following are the details of the different kinds of scaling mentioned above:
- Min-max scaling: Min-max scaling, also known as feature scaling, is a method used to standardize data before feeding it into a machine learning algorithm. The goal of min-max scaling is to ensure that all features are on a similar scale, which makes training the algorithm more efficient. For example, imagine we are training a machine learning model to predict house prices. If one of the features is the size of the house in square feet, we would want to make sure that this value is scaled appropriately before feeding it into the model. Otherwise, the model may place too much importance on this feature and produce inaccurate predictions. Min-max scaling can be used to achieve this goal by transforming all values so that they fall within a specific range (e.g., [0,1] or [-1,1]). The following is the formula for min-max scaling:
x_scaled = (x1 - x1_min) / (x1_max - x1_min)
- Z-score normalization or standardization: Z-score normalization, also known as Z-score standardization or mean-variance scaling, is a method of feature scaling that aims to rescale features so that they have a mean of zero and a standard deviation of one. This process can be useful for machine learning models that require features to be on the same scale in order to produce accurate results. For example, Z-score normalization is often used when training neural networks. Z-score normalization can be applied to data sets with any distribution; however, it is most effective when the data is Normally distributed. When Z-score normalization is applied to data that is not Normally distributed, it may compress some of the data points and expand others, which can impact the accuracy of machine learning models. Z-score normalization addresses the problem of outliers without requiring prior knowledge of what the reasonable range is by linearly scaling the input using the mean and standard deviation estimated over the training dataset. The following represents the formula for Z-score normalization. The same is implemented in StandardScaler whose usage is shown later in this post.
x_scaled = (x1 - x1_mean) / x1_stddev
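The z-score formula can be checked with a few lines of NumPy; the sample values here are made up for illustration. Applying the same data to StandardScaler (used later in this post) gives an identical result:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([2.0, 4.0, 6.0, 8.0])  # made-up sample values

# z-score standardization: subtract the mean, divide by the standard deviation
x_scaled = (x - x.mean()) / x.std()
print(x_scaled)

# StandardScaler applies the same formula column-wise
x_sklearn = StandardScaler().fit_transform(x.reshape(-1, 1)).ravel()
print(np.allclose(x_scaled, x_sklearn))  # True
```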
The picture below represents the formula for both standardization and min-max scaling.
In this post, we will learn to use the Standardization (also known as z-score normalization) technique for feature scaling. We will use the StandardScaler from sklearn.preprocessing package.
Train a Perceptron Model without Feature Scaling
Here is the code for training a model without feature scaling. First and foremost, let's load the data and create the dataset comprising features and labels. In this post, the Iris dataset has been used. In the code below, X is created as the training data whose features are sepal length and petal length.
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
The next step is to create the training and test split. The sklearn.model_selection module provides the train_test_split function, which can be used for creating the training / test split. Note that stratification is not used.
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1)
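As noted, the split above does not use stratification. To keep the class proportions the same in the training and test sets, one could pass the stratify parameter of train_test_split; a self-contained sketch:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target

# stratify=Y keeps the class distribution of Y in both splits
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=1, stratify=Y)

print(np.bincount(Y_train))  # 35 examples per class
print(np.bincount(Y_test))   # 15 examples per class
```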
The next step is to create an instance of the Perceptron classifier and train the model using the X_train data and Y_train labels. The code below uses the Perceptron class of the sklearn.linear_model module.
from sklearn.linear_model import Perceptron

prcptrn = Perceptron(eta0=0.1, random_state=1)
prcptrn.fit(X_train, Y_train)
The next step is to measure the model accuracy. This can be done using the accuracy_score function of the sklearn.metrics module or by calling the score method on the Perceptron instance.
from sklearn.metrics import accuracy_score

Y_predict = prcptrn.predict(X_test)
print("Misclassified examples %d" % (Y_test != Y_predict).sum())
print("Accuracy Score %.3f" % accuracy_score(Y_test, Y_predict))
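As mentioned above, calling the score method on the fitted estimator gives the same value as accuracy_score (for classifiers, score is defined as mean accuracy). A self-contained check, repeating the steps so far:

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, iris.target, test_size=0.3, random_state=1)

prcptrn = Perceptron(eta0=0.1, random_state=1)
prcptrn.fit(X_train, Y_train)

# accuracy via the metrics function and via the estimator's own score method
acc_via_function = accuracy_score(Y_test, prcptrn.predict(X_test))
acc_via_method = prcptrn.score(X_test, Y_test)
print(acc_via_function, acc_via_method)
```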
The accuracy score comes out to be 0.578, with the number of misclassified examples being 19.
Train a Perceptron Model with Feature Scaling
Feature scaling is done with the help of the following code. This step comes just after creating the training and test split.
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
The above code uses the StandardScaler class of the sklearn.preprocessing module. The fit method of StandardScaler estimates the sample mean and standard deviation of each feature using the training data. The transform method then computes the standardized values of the features using those estimated parameters (mean & standard deviation).
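One can verify this behaviour directly: after fitting on the training data only, each standardized training feature has mean ~0 and standard deviation ~1, while the test set is transformed with the training statistics (so its mean and std are only approximately 0 and 1). A self-contained sketch:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
X_train, X_test, _, _ = train_test_split(
    X, iris.target, test_size=0.3, random_state=1)

sc = StandardScaler()
sc.fit(X_train)                    # estimates per-feature mean and std from X_train only
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)  # same training statistics applied to the test set

print(X_train_std.mean(axis=0))    # ~[0, 0]
print(X_train_std.std(axis=0))     # ~[1, 1]
print(sc.mean_)                    # per-feature means learned from the training data
```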
The next step is to train a Perceptron model and measure the accuracy:
from sklearn.metrics import accuracy_score

prcptrnFS = Perceptron(eta0=0.1, random_state=1)
prcptrnFS.fit(X_train_std, Y_train)
Y_predict_std = prcptrnFS.predict(X_test_std)
print("Misclassified examples %d" % (Y_test != Y_predict_std).sum())
print("Accuracy Score %0.3f" % accuracy_score(Y_test, Y_predict_std))
The accuracy score comes out to be 0.978 with the number of misclassified examples as 1.
You can note that the accuracy score increased by almost 40 percentage points (from 0.578 to 0.978).
Thus, it is recommended to perform feature scaling before training the model.