AdaBoost Algorithm Explained with Python Example


In this post, you will learn about the boosting technique and the AdaBoost algorithm with the help of a Python example, along with the concept of boosting in general. Boosting classifiers are a class of ensemble-based machine learning algorithms that help reduce variance. As a data scientist, it is important to learn both bagging and boosting techniques for solving classification problems. Check my post on bagging – Bagging Classifier explained with Python example – to learn more about the bagging technique. The following are some of the topics covered in this post:

  • What is Boosting and the AdaBoost Algorithm?
  • AdaBoost algorithm Python example

What is Boosting and the AdaBoost Algorithm?

Like bagging, boosting is an ensemble method that makes use of a sampling technique to create an ensemble classifier. In the boosting technique, the training data is resampled and combined in an adaptive manner such that the resampling weights are increased for those data points which get misclassified more often. In other words, a new training sample is drawn in each round while assigning higher weights to the misclassified data points. Boosting is found to be more effective at variance reduction than bagging, and this reduction comes from the adaptive resampling. The weight-update step at the heart of this process is illustrated with a small numeric sketch below.
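
To make the adaptive reweighting concrete, here is a small numeric sketch of a single boosting round using the classic two-class AdaBoost update (the variable names are purely illustrative and not part of any library API):

import numpy as np
#
# Five training points, all starting with equal weight
#
weights = np.full(5, 0.2)
#
# Suppose the current weak learner misclassifies only the last point
#
misclassified = np.array([False, False, False, False, True])
#
# Weighted error of the weak learner and its vote (alpha)
#
error = weights[misclassified].sum()        # 0.2
alpha = 0.5 * np.log((1 - error) / error)   # ~0.693
#
# Increase the weight of the misclassified point, decrease the rest, renormalize
#
weights *= np.exp(np.where(misclassified, alpha, -alpha))
weights /= weights.sum()
print(weights)   # [0.125 0.125 0.125 0.125 0.5]

The misclassified point ends up carrying as much weight as the other four points combined, which is exactly why the next classifier concentrates on it.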

Like bagging, the boosting technique is very effective for classifiers that have high variance, such as decision tree classifiers. For stable classifiers built using algorithms such as K-NN (K-nearest neighbours) or linear discriminant analysis (LDA), which have low variance, bagging or boosting may not have much impact.

What is Adaptive Boosting?

Adaptive boosting (also called AdaBoost) is one of the most commonly used implementations of the boosting ensemble method. Adaptive boosting combines (boosts) weak learners to form a strong learner. Here is a diagram illustrating the boosting classification technique:

Fig 1. Understanding Boosting Classification (Source: Python Machine Learning, 3rd edition, by Dr. Sebastian Raschka)

Using the above diagram, let's understand how the adaptive boosting classifier works by ensembling three classifiers (classifier 1, 2 and 3).

  • Classifier 1 (clf1) classifies the points (pic 1). Some points (the blue points) get misclassified.
  • The misclassified points are given higher weights and the weights of the correctly classified points are reduced. A new classifier (clf2) is trained on a new training dataset in which the misclassified points carry higher weights and the correctly classified points carry lower weights (thus, adaptive resampling). Look at pic 2, where the misclassified points are given higher weight, represented by larger circles, while the correctly classified points shrink in size. The new classifier (clf2) again misclassifies a few points (3 blue points).
  • The points misclassified by clf2 get higher weights (larger circles) and the correctly classified points get lower weights (reduced size). A third classifier (clf3) is trained on a new training dataset with the weight of each data point updated. Take a look at pic 3.
  • Finally, the ensemble adaptive boosting classifier (clf4) is constructed by combining the three classifiers, each fitted on a different training dataset created as a result of adaptive resampling. Take a look at pic 4, which represents the adaptive boosting (AdaBoost) classifier created by ensembling the three classifiers. A minimal from-scratch sketch of this loop is shown right after this list.
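
To tie the steps above together, here is a minimal from-scratch sketch of the boosting loop using decision stumps on the breast cancer dataset (a simplified two-class, discrete-AdaBoost variant for illustration only; names such as stumps and alphas are local variables, not part of scikit-learn's API):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
#
# Load the data and map the labels to {-1, +1} for the weight-update rule
#
X, y = load_breast_cancer(return_X_y=True)
y_signed = np.where(y == 1, 1, -1)
#
# Start with uniform weights and boost for a few rounds
#
n_rounds = 10
weights = np.full(len(y), 1 / len(y))
stumps, alphas = [], []
for _ in range(n_rounds):
    # 1. Fit a decision stump on the current weights
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)
    # 2. Weighted error and the stump's vote (alpha)
    error = weights[pred != y_signed].sum()
    alpha = 0.5 * np.log((1 - error) / error)
    # 3. Increase weights of misclassified points, decrease the rest, renormalize
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()
    stumps.append(stump)
    alphas.append(alpha)
#
# Final prediction: weighted vote of all stumps
#
agg = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print('Training accuracy of the hand-rolled ensemble: %.3f'
      % np.mean(np.sign(agg) == y_signed))

The real AdaBoostClassifier in scikit-learn adds multi-class support, learning-rate shrinkage and early stopping on top of this basic idea.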

AdaBoost Algorithm Python Example

An AdaBoost classifier is an ensemble meta-estimator created from multiple versions of a classifier trained using a base estimator. The first version of the classifier is trained on the original dataset. The later versions are trained on the same dataset, but the weights of incorrectly classified instances are adjusted so that subsequent classifiers focus more on the difficult cases.

In this section, sklearn.ensemble AdaBoostClassifier is used to illustrate the AdaBoost classifier. Two models are fit for illustration purposes: one using DecisionTreeClassifier and the other using AdaBoostClassifier with DecisionTreeClassifier as the base estimator. You will see that the ensemble model trained using AdaBoostClassifier has higher accuracy and better generalization performance (its test accuracy is greater than its training accuracy).

Model fit using DecisionTreeClassifier

Here is the Python code for the model fit using sklearn.tree DecisionTreeClassifier. The tree is a decision stump, with max_depth set to 1.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import AdaBoostClassifier
#
# Load the breast cancer dataset
#
bc = datasets.load_breast_cancer()
X = bc.data
y = bc.target
#
# Create training and test split
#
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1, stratify=y)
#
# Pipeline Estimator
#
pipeline = make_pipeline(StandardScaler(), 
                        DecisionTreeClassifier(criterion='entropy', max_depth=1, random_state=1))
#
# Fit the model
#
pipeline.fit(X_train, y_train)
#
# Model scores on test and training data
#
print('Model test Score: %.3f, ' %pipeline.score(X_test, y_test), 
      'Model training Score: %.3f' %pipeline.score(X_train, y_train))
Fig 2. Model fit using DecisionTreeClassifier

The accuracy of the model comes out to be 91.6% for the test dataset and 91.5% for the training dataset. This is a good model with decent generalization performance. In the following section, we will see how the model performance looks for a model trained using AdaBoostClassifier.

Model fit using AdaBoostClassifier

Here is the code for the model fit using sklearn.ensemble AdaBoostClassifier. Pay attention to the following:

  • The base estimator (base_estimator) is set to DecisionTreeClassifier.
  • The number of estimators (n_estimators) is set to 100.
  • The algorithm is set to SAMME (the other option is SAMME.R).
  • The learning rate is set to 0.0005.
#
# Standardize the dataset
#
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
#
# Creating a decision tree classifier instance
#
dtree = DecisionTreeClassifier(criterion='entropy', max_depth=1, random_state=1)
#
# Instantiate the AdaBoost classifier
#
adbclassifier = AdaBoostClassifier(base_estimator=dtree, 
                                   n_estimators=100, 
                                   learning_rate=0.0005,
                                   algorithm = 'SAMME',
                                   random_state=1)
#
# Fit the AdaBoost classifier
#
adbclassifier.fit(X_train_std, y_train)
#
# Model scores on test and training data
#
print('Model test Score: %.3f, ' %adbclassifier.score(X_test_std, y_test), 
      'Model training Score: %.3f' %adbclassifier.score(X_train_std, y_train))
Fig 3. Fit the model using AdaBoostClassifier

The accuracy of the model comes out to be 93.7% for the test dataset and 92.3% for the training dataset. This is a better model, with better generalization performance than the plain DecisionTreeClassifier.
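
If you want to see how the ensemble's accuracy evolves as weak learners are added, AdaBoostClassifier exposes a staged_score method that yields the score after each boosting iteration. Here is a short sketch, assuming the adbclassifier fitted above is still in scope:

#
# Inspect how test accuracy changes as more estimators are added
#
staged_scores = list(adbclassifier.staged_score(X_test_std, y_test))
print('Test score after the first estimator: %.3f' % staged_scores[0])
print('Test score after all %d estimators: %.3f' % (len(staged_scores), staged_scores[-1]))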

Conclusions

In this post, you learned the following concepts in relation to boosting and the AdaBoost algorithm:

  • Boosting is an adaptive resampling technique used to train a sequence of classifiers on modified training datasets, created by assigning appropriate weights to misclassified and correctly classified data points.
  • AdaBoostClassifier makes use of the boosting technique, assigning higher weights to misclassified data points and lower weights to correctly classified ones before sampling again to train the next classifier.
  • Boosting classification helps reduce variance.
  • A boosting classifier suits unstable classifiers well, i.e., classifiers trained using algorithms such as DecisionTreeClassifier that result in high variance.
  • The boosting technique is found to reduce variance more than the bagging classifier.