
Data scaling is an essential part of data analysis, especially when working with machine learning algorithms. Scaling helps to standardize the range of features and ensure that each feature (continuous variable) contributes equally to the analysis. Two popular scaling tools in Python are scikit-learn's MinMaxScaler and StandardScaler.
In this blog, we will learn about the concepts behind these scaling techniques and the differences between them with the help of Python code examples, highlight their advantages and disadvantages, and provide guidance on when to use one over the other. Note that these are classes provided by the sklearn.preprocessing module and are used for feature scaling. As a data scientist, you will need to learn these concepts in order to train machine learning models with algorithms that require features to be on the same scale.
Differences between MinMaxScaler and StandardScaler
Both MinMaxScaler and StandardScaler scale the data, but they use different methods to achieve this. MinMaxScaler scales the data to a fixed range, by default between 0 and 1. StandardScaler, on the other hand, rescales the data to have a mean of 0 and a standard deviation of 1 (zero mean and unit variance). The choice between MinMaxScaler and StandardScaler depends on the data distribution, the nature of the analysis, and the algorithm being used.
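To make the difference concrete, here is a minimal sketch that applies both scalers to the same single column. It reuses the salary values from the sample data frame introduced below; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One numeric feature as a 2-D column (scikit-learn expects shape (n_samples, n_features))
salary = np.array([[120000], [80000], [210000], [50000], [70000]], dtype=float)

minmax = MinMaxScaler().fit_transform(salary)      # maps the min to 0 and the max to 1
standard = StandardScaler().fit_transform(salary)  # subtracts the mean, divides by the std

print(minmax.ravel())    # every value lies in [0, 1]
print(standard.ravel())  # values are centered on 0
print(standard.mean(), standard.std())  # approximately 0 and 1
```

Note that MinMaxScaler keeps the relative spacing of the values inside [0, 1], while StandardScaler output is not bounded: it can contain negative values and values above 1.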
Here is the sample Pandas data frame which will be used later in this post for illustration purposes:
import pandas as pd
import numpy as np
arr = np.array([['M', 81.4, 82.2, 44, 6.1, 120000, 'no'],
['M', 75.2, 86.2, 40, 5.9, 80000, 'no'],
['F', 80.0, 83.2, 34, 5.4, 210000, 'yes'],
['F', 85.4, 72.2, 46, 5.6, 50000, 'yes'],
['M', 68.4, 87.2, 28, 5.11, 70000, 'no']])
#
# Create Pandas DataFrame
#
df = pd.DataFrame(arr)
df.columns = ['gender', 'hsc_p', 'ssc_p', 'age', 'height', 'salary', 'suffer_from_disease']
#
# Convert the string data type to int and float appropriately
#
df[['age', 'salary']] = df[['age', 'salary']].astype(int)
df[['ssc_p', 'hsc_p', 'height']] = df[['ssc_p', 'hsc_p', 'height']].astype(float)
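Here is what the data frame looks like:
   gender  hsc_p  ssc_p  age  height  salary  suffer_from_disease
0       M   81.4   82.2   44    6.10  120000                   no
1       M   75.2   86.2   40    5.90   80000                   no
2       F   80.0   83.2   34    5.40  210000                  yes
3       F   85.4   72.2   46    5.60   50000                  yes
4       M   68.4   87.2   28    5.11   70000                   no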
Why is Feature Scaling needed?
Feature scaling transforms the values of different numerical features so that they fall within a similar range. It is used to prevent supervised learning models from becoming biased toward features that simply have larger numeric values. For example, if you train a regularized linear model (such as ridge regression) or a distance-based model without scaling the features, features measured on large scales can have a much higher impact than others, giving those variables an undue advantage and hurting prediction performance. The remaining features are put at a disadvantage while the model is trained. This is why it is important to use scaling so that feature values are on a comparable footing.
Feature scaling puts all features on the same scale, which helps avoid problems such as the following:
- Loss of accuracy
- Higher computational cost when feature values vary widely across different orders of magnitude
For example, in the data set used in this post, pay attention to the values of salary, age, and height. Salary ranges from 50000 to 210000, age typically lies between 1 and 100, and height between 4 ft and 7 ft. When such a data set is fed to algorithms such as gradient descent optimization or k-nearest neighbors, the features with larger values dominate the weight updates or the distance computations. This results in sub-optimal models. This is where feature scaling comes into the picture: transforming the features into a similar range helps machine learning algorithms behave better and produce better models.
Scaling the data to a common range can help to alleviate this problem. In this case, we can use MinMaxScaler or StandardScaler to scale the data so that features such as salary, age, and height contribute equally to the analysis. Scaling prevents any single feature from dominating merely because of its units, which often lets the model make more accurate predictions.
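The effect on distance-based algorithms can be checked directly. The minimal sketch below assumes plain Euclidean distance (the default metric in k-nearest neighbors) and measures how much of the squared distance between the first two rows of the sample data comes from salary, before and after min-max scaling. The helper salary_share is hypothetical and exists only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# age, height, salary for the five rows of the sample data frame
X = np.array([[44, 6.1, 120000],
              [40, 5.9, 80000],
              [34, 5.4, 210000],
              [46, 5.6, 50000],
              [28, 5.11, 70000]], dtype=float)

def salary_share(a, b):
    """Hypothetical helper: fraction of the squared Euclidean distance
    between rows a and b contributed by salary (the last column)."""
    sq = (a - b) ** 2
    return sq[-1] / sq.sum()

raw_share = salary_share(X[0], X[1])

X_scaled = MinMaxScaler().fit_transform(X)
scaled_share = salary_share(X_scaled[0], X_scaled[1])

print(f"salary share of distance, raw:    {raw_share:.6f}")
print(f"salary share of distance, scaled: {scaled_share:.6f}")
```

On the raw data, salary accounts for practically the entire distance, so age and height are ignored; after scaling, all three features contribute.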
In addition to improving the performance of machine learning models, feature scaling can also help to speed up the convergence of optimization algorithms such as gradient descent. By scaling the data to a common range, the optimization algorithm can converge more quickly and efficiently.
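One way to see why is through conditioning. For least-squares linear regression, the convergence speed of plain gradient descent depends on the condition number of X^T X (the ratio of its largest to smallest eigenvalue): the larger the ratio, the slower the convergence. The sketch below assumes a least-squares objective without an intercept and uses only the age and salary columns of the sample data; it compares that condition number before and after standardization.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# age and salary columns from the sample data frame
X = np.array([[44, 120000],
              [40, 80000],
              [34, 210000],
              [46, 50000],
              [28, 70000]], dtype=float)

# For least squares the Hessian is proportional to X^T X; a large condition
# number means gradient descent needs many more iterations to converge.
cond_raw = np.linalg.cond(X.T @ X)

X_std = StandardScaler().fit_transform(X)
cond_std = np.linalg.cond(X_std.T @ X_std)

print(f"condition number, raw:          {cond_raw:.3e}")
print(f"condition number, standardized: {cond_std:.3f}")
```

The raw problem is ill-conditioned by several orders of magnitude, while the standardized one is close to the ideal value of 1.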
Feature scaling is not important for scale-invariant algorithms such as decision trees and random forests. Their splits depend only on the ordering of each feature's values, so rescaling a feature does not change the performance of models trained with these algorithms.
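This scale invariance can be checked empirically. The sketch below assumes scikit-learn's DecisionTreeClassifier with a fixed random_state and uses the Iris data with the same train/test split as later in this post. It trains one tree on raw features and one on min-max scaled features, then compares the feature chosen at each node and the predictions on the training rows.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, _, y_train, _ = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Fit the scaler on the training split only
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# Identical tree settings on raw and on min-max scaled features
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_train_scaled, y_train)

# Both trees split on the same feature at every node ...
same_structure = np.array_equal(tree_raw.tree_.feature, tree_scaled.tree_.feature)
# ... and make the same predictions on the training rows
same_predictions = np.array_equal(tree_raw.predict(X_train),
                                  tree_scaled.predict(X_train_scaled))
print(same_structure, same_predictions)
```

Only the split thresholds differ, and they differ by exactly the same min-max transformation that was applied to the features.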
Normalization vs Standardization
The two common approaches to bringing different features onto the same scale are normalization and standardization. In scikit-learn, normalization (in the min-max sense used in this post) is implemented by MinMaxScaler, and standardization is implemented by StandardScaler.
What is Normalization?
Normalization refers to rescaling the features to the range [0, 1], which is a special case of min-max scaling. To normalize the data, min-max scaling can be applied to one or more feature columns. Normalization is useful when the data needs to lie within a bounded interval. Here is the formula for normalizing a value x^(i) of a feature with min-max scaling, where x_min and x_max are the smallest and largest values of that feature:

$$x^{(i)}_{\text{norm}} = \frac{x^{(i)} - x_{\min}}{x_{\max} - x_{\min}}$$
Here is what a Python function for normalizing one or more columns could look like:
def normalize(values):
    # Min-max scaling: the column's minimum becomes 0 and its maximum becomes 1
    return (values - values.min()) / (values.max() - values.min())
To apply normalization to one or more feature columns of the data set used in this post, you could use the following Python code. DataFrame.apply calls the normalize function shown above on each selected column, so several columns are normalized at once.
cols = ['hsc_p', 'ssc_p', 'age', 'height', 'salary']
#
# Normalize the feature columns
#
df[cols] = df[cols].apply(normalize)
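As a sanity check, the hand-written normalize function and scikit-learn's MinMaxScaler should agree. The sketch below rebuilds only the numeric columns of the sample data frame. For salary the arithmetic is easy to follow: the min is 50000 and the max is 210000, so 120000 maps to (120000 - 50000) / 160000 = 0.4375.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def normalize(values):
    # Min-max scaling of one column: (x - min) / (max - min)
    return (values - values.min()) / (values.max() - values.min())

# Numeric columns of the sample data frame used in this post
df = pd.DataFrame({
    'hsc_p': [81.4, 75.2, 80.0, 85.4, 68.4],
    'ssc_p': [82.2, 86.2, 83.2, 72.2, 87.2],
    'age': [44, 40, 34, 46, 28],
    'height': [6.1, 5.9, 5.4, 5.6, 5.11],
    'salary': [120000, 80000, 210000, 50000, 70000],
})
cols = ['hsc_p', 'ssc_p', 'age', 'height', 'salary']

manual = df[cols].apply(normalize)
sklearn_result = MinMaxScaler().fit_transform(df[cols])

print(manual['salary'].tolist())  # [0.4375, 0.1875, 1.0, 0.0, 0.125]
print(np.allclose(manual.to_numpy(), sklearn_result))  # True
```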
What is Standardization?
Standardization centers the feature columns at mean 0 with a standard deviation of 1, so that the feature columns have the same parameters as a standard normal distribution (the shape of the distribution itself does not change). Unlike min-max scaling, which squeezes the data into a limited range of values, standardization maintains useful information about outliers and makes the algorithm less sensitive to them. Here is the formula for standardization, where μ_x is the feature's mean and σ_x its standard deviation:

$$x^{(i)}_{\text{std}} = \frac{x^{(i)} - \mu_x}{\sigma_x}$$
Here is what a Python function for standardizing one or more columns could look like:
def standardize(values):
    # Center at mean 0, scale to std 1; ddof=0 matches sklearn's StandardScaler
    return (values - values.mean()) / values.std(ddof=0)
To apply standardization to one or more feature columns of the data set used in this post, you could use the following Python code. DataFrame.apply calls the standardize function on each selected column, so several columns are standardized at once.
cols = ['hsc_p', 'ssc_p', 'age', 'height', 'salary']
#
# Standardize the feature columns. Recreate df first; otherwise this standardizes the already-normalized values.
#
df[cols] = df[cols].apply(standardize)
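One detail worth checking: pandas' Series.std() uses the sample standard deviation (ddof=1) by default, whereas StandardScaler divides by the population standard deviation (ddof=0), so the two results differ noticeably on small data sets. The sketch below uses the salary values from the sample data to show this.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

salary = pd.Series([120000, 80000, 210000, 50000, 70000], dtype=float)

# pandas defaults to the sample standard deviation (ddof=1) ...
z_sample = (salary - salary.mean()) / salary.std()
# ... while StandardScaler uses the population standard deviation (ddof=0)
z_population = (salary - salary.mean()) / salary.std(ddof=0)

z_sklearn = StandardScaler().fit_transform(salary.to_frame()).ravel()

print(np.allclose(z_population, z_sklearn))  # True
print(np.allclose(z_sample, z_sklearn))      # False
```

With only five rows, the ddof=1 version is smaller than StandardScaler's output by a factor of sqrt(4/5), about 0.89.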
MinMaxScaler for Normalization
MinMaxScaler is a class from sklearn.preprocessing which is used for normalization. Here is the sample code:
from sklearn.preprocessing import MinMaxScaler
mmscaler = MinMaxScaler()
cols = ['hsc_p', 'ssc_p', 'age', 'height', 'salary']
df[cols] = mmscaler.fit_transform(df[cols])
When normalizing training and test data, MinMaxScaler should be fit on the training data only, and the same fitted scaler should then be used to transform both the training and the test data. The following code demonstrates this, where X holds the features and y the corresponding labels. The Iris data set is used for illustration purposes.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
mmscaler = MinMaxScaler()
# Fit on the training data only, then reuse the fitted scaler for the test data
X_train_norm = mmscaler.fit_transform(X_train)
X_test_norm = mmscaler.transform(X_test)
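Because the scaler learns the minimum and maximum from the training data only, transformed test values are not guaranteed to stay within [0, 1]. The minimal sketch below uses small made-up arrays to show this, along with the clip option (available since scikit-learn 0.24), which bounds transformed values to feature_range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.0], [3.0], [5.0]])
X_test = np.array([[0.0], [7.0]])  # made-up values outside the training range

mmscaler = MinMaxScaler().fit(X_train)  # learns min=1 and max=5 from the training data
print(mmscaler.data_min_, mmscaler.data_max_)
print(mmscaler.transform(X_test).ravel())  # [-0.25  1.5 ]: outside [0, 1]

# clip=True bounds transformed held-out values to feature_range
clipped = MinMaxScaler(clip=True).fit(X_train).transform(X_test).ravel()
print(clipped)
```

Whether to clip is a modeling choice: clipping hides the fact that a test value lies outside the training range, which may or may not be desirable.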
StandardScaler for Standardization
StandardScaler is a class from sklearn.preprocessing which is used for standardization. Here is the sample code:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
cols = ['hsc_p', 'ssc_p', 'age', 'height', 'salary']
df[cols] = sc.fit_transform(df[cols])
When standardizing training and test data, StandardScaler should likewise be fit on the training data only, and the same fitted scaler should then be used to transform both the training and the test data. The following code demonstrates this. The Iris data set is used for illustration purposes.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
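A convenient way to make sure the scaler only ever sees the training data is to chain it with the model in a scikit-learn Pipeline. The sketch below assumes a KNeighborsClassifier with n_neighbors=5 purely as an example model; any estimator could take its place.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1, stratify=iris.target)

# The pipeline fits StandardScaler on X_train only and reuses its mean/std for X_test
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

scaler = model.named_steps['standardscaler']
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True: statistics come from the training split
print(f"test accuracy: {accuracy:.3f}")
```

The same pattern works with MinMaxScaler, and it also keeps scaling correct inside cross-validation, where each fold must be scaled using only its own training portion.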
When to use MinMaxScaler or StandardScaler?
MinMaxScaler is useful when the data has a bounded range or when the distribution is not Gaussian. For example, in image processing, pixel values typically lie in the range 0-255. Scaling these values with MinMaxScaler keeps them within a fixed range so that they contribute equally to the analysis. Similarly, for non-Gaussian distributions such as a power-law distribution, MinMaxScaler can be used to scale the values to between 0 and 1, although a few very large values will compress most of the others toward 0.
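The sketch below illustrates both cases with hypothetical values: grayscale intensities for one pixel position across three images, and a made-up heavy-tailed feature. Keep in mind that MinMaxScaler uses the observed min and max; it matches dividing by 255 only when the data actually contains 0 and 255.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical grayscale intensities for one pixel position across three images
pixels = np.array([[0], [64], [255]], dtype=float)
pixels_scaled = MinMaxScaler().fit_transform(pixels).ravel()
print(pixels_scaled)              # uses the observed min (0) and max (255)
print((pixels / 255.0).ravel())   # same result here because 0 and 255 both occur

# Hypothetical heavy-tailed feature: one large value squeezes the rest toward 0
heavy = np.array([[1], [2], [3], [5], [1000]], dtype=float)
heavy_scaled = MinMaxScaler().fit_transform(heavy).ravel()
print(heavy_scaled.round(4))
```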
StandardScaler is useful when the data has a roughly Gaussian distribution or when the algorithm expects standardized features. For example, in regularized linear models such as ridge or lasso regression, and in models trained with gradient descent, standardizing the features ensures that they contribute equally to the analysis. Similarly, for clustering algorithms such as KMeans, which rely on Euclidean distances, StandardScaler keeps any single feature from dominating the analysis.
Conclusion
Here are the key takeaways:
- Feature scaling transforms the values of features into a similar range so that machine learning algorithms behave better and produce better models.
- Feature scaling is not required for algorithms such as random forests or decision trees.
- Standardization and normalization are the two most common techniques for feature scaling.
- Normalization transforms the feature values to fall within a bounded interval, typically [0, 1], based on the feature's min and max.
- Standardization transforms the feature values to have a mean of 0 and a standard deviation of 1.
- Standardization maintains useful information about outliers and makes the algorithm less sensitive to them, in contrast to min-max scaling.
- MinMaxScaler class of sklearn.preprocessing is used for normalization of features.
- StandardScaler class of sklearn.preprocessing is used for standardization of features.