In this post, you will learn about the key differences between the AdaBoost classifier and the Random Forest algorithm. As a data scientist, you must have a good understanding of the differences between Random Forest and the AdaBoost machine learning algorithm. Both algorithms can be used for regression as well as classification problems.
Both Random Forest and AdaBoost are based on creating a forest of trees, which is why they are called ensemble learning algorithms. A random forest is built from a collection of decision trees that use different variables or features, and it relies on the bagging technique to sample the training data. In AdaBoost, the forest is built from what are called decision stumps. A decision stump is simply a decision tree with one node and two leaves. AdaBoost can therefore be said to make its decisions using a collection of decision stumps.
Models trained with either Random Forest or AdaBoost make predictions that generalize well to the larger population; both are less susceptible to overfitting (high variance).
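As a concrete starting point, here is a minimal sketch, assuming scikit-learn is available, that trains both ensembles on the same synthetic dataset. The dataset parameters and estimator counts are illustrative choices, not values from this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative sizes).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random forest: many full-size trees trained on bagged samples.
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# AdaBoost: many sequentially trained decision stumps.
ada = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("Random forest accuracy:", rf.score(X_test, y_test))
print("AdaBoost accuracy:", ada.score(X_test, y_test))
```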
Differences between AdaBoost and Random Forest
Here are the key differences between the AdaBoost and Random Forest algorithms:
- Data sampling (bagging vs boosting): In a random forest, the training data is sampled using the bagging technique. Bagging decreases the variance of predictions by drawing samples with replacement (bootstrap samples) from the original dataset to produce multiple training sets. In AdaBoost, the data used to train each subsequent decision stump (a tree with one node and two leaves) assigns higher weights to the samples that were misclassified by the previous stump. Because misclassified samples carry higher weights, they are drawn repeatedly into the new training sample.
- Decision trees vs decision stumps: A random forest uses multiple full-size decision trees, or trees of varying depth, and each tree uses multiple variables to make the final classification of a data point. AdaBoost, on the other hand, uses what are called decision stumps: decision trees with one node and two leaves. AdaBoost combines multiple decision stumps, each built on just one variable or feature. This is unlike a random forest, whose trees use multiple variables to reach a final classification decision. Here are the diagrams representing the decision trees used in a random forest versus the decision stumps used in AdaBoost.
- Equal weights vs variable weights: In a random forest, the decision made by each tree carries equal weight; in other words, every tree has an equal say in the final decision. In AdaBoost, some decision stumps have a higher say (weight) in the final decision than others.
- Tree ordering: In a random forest, each decision tree is built independently of the others, so the order in which the trees are created does not matter at all. In the forest of stumps built by AdaBoost, however, the order in which the stumps are created is important: the errors made by the first stump influence how the second stump is built, the errors of the second stump influence the third, and so on.