In this post, you will learn about the hold-out method used in the process of training machine learning models.
When evaluating machine learning (ML) models, several questions arise. Is the model the best one available from the algorithm's hypothesis space in terms of generalization error on unseen / future data? Has the model been trained and tested using the most appropriate method? Out of the available models, which one should be selected? These questions are addressed using what is called the hold-out method.
Instead of using the entire dataset for training, different sets called the validation set and test set are separated or set aside (hence the name hold-out) from the entire dataset, and the model is trained only on what is termed the training dataset.
What is Hold-out method for training ML models?
The hold-out method for training machine learning models is the process of splitting the data into different splits and using one split for training the model and the other splits for validating and testing it. The hold-out method is used for both model evaluation and model selection.
When the entire dataset is used for training models with different algorithms, the problem of evaluating the models and selecting the most optimal one remains. The primary question is which model, out of all the models, has the lowest generalization error. In other words, which model makes better predictions on future or unseen data than all the other models? This is where the need arises for a mechanism wherein the model is trained on one dataset and tested on another. This is where the hold-out method comes into the picture.
Hold-out method for Model Evaluation
The hold-out method for model evaluation is the mechanism of splitting the dataset into training and test datasets and evaluating model performance in order to arrive at the most optimal model. The following represents the hold-out method for model evaluation.
In the diagram above, note that the dataset is split into two parts. One split is set aside, or held out, for training the model. The other split is held out for testing or evaluating the model. The split percentage is decided based on the volume of data available for training. Generally, a 70-30% split is used, where 70% of the dataset is used for training and 30% for testing the model.
This technique is well suited when the goal is to compare models based on their accuracy on the test dataset and select the best one. However, there is always a possibility that using this technique results in the model fitting well to the test dataset: the models are tuned to improve accuracy on the test dataset on the assumption that the test dataset represents the population. The test error thus becomes an optimistically biased estimate of the generalization error. The final model then fails to generalize well to unseen or future data, as it has been trained to fit (or overfit) the test data.
The following is the process of using hold-out method for model evaluation:
- Split the dataset into two parts (a 70-30% split is common, though the percentage can vary)
- Train the model on the training dataset; while training the model, some fixed set of hyperparameters is selected.
- Test or evaluate the model on the held-out test dataset.
- Train the final model on the entire dataset to get a model which generalizes better on unseen or future data.
Note that this process is used for model evaluation based on splitting the dataset into training and test datasets and using a fixed set of hyperparameters. There is another technique of splitting the data into three sets and using these three sets for model selection or hyperparameter tuning. We will look at that technique in the next section.
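The evaluation steps above can be sketched in a few lines of scikit-learn code. The dataset (breast cancer), the algorithm (logistic regression) and the hyperparameter value (C=1.0) below are illustrative assumptions, not part of the method itself; any dataset, algorithm and fixed hyperparameter setting would follow the same pattern.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative dataset; any tabular dataset works the same way
X, y = load_breast_cancer(return_X_y=True)

# Step 1: 70-30 train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Step 2: train with a fixed set of hyperparameters (C=1.0 is an assumption)
model = LogisticRegression(C=1.0, max_iter=5000)
model.fit(X_train, y_train)

# Step 3: evaluate on the held-out test dataset
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 4: retrain the final model on the entire dataset
final_model = LogisticRegression(C=1.0, max_iter=5000).fit(X, y)
```

Note that the final model is refit on all the data only after the test-set estimate has been recorded, so the estimate is not contaminated by the refit.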
Hold-out method for Model Selection
The hold-out method can also be used for model selection or hyperparameter tuning. As a matter of fact, the model selection process is at times referred to as hyperparameter tuning. In the hold-out method for model selection, the dataset is split into three different sets: training, validation and test datasets.
The following process represents hold-out method for model selection:
- Split the dataset into three parts: a training dataset, a validation dataset and a test dataset.
- Train different models using different machine learning algorithms. For example, train a classification model using logistic regression, random forest and XGBoost.
- For the models trained with different algorithms, tune the hyperparameters to come up with different candidate models: for each of the algorithms mentioned in step 2, vary the hyperparameter settings to produce multiple models.
- Test the performance of each of these models (belonging to each of the algorithms) on the validation dataset.
- Select the most optimal model out of the models tested on the validation dataset. The most optimal model will have the most optimal hyperparameter settings for a specific algorithm. Going by the above example, let's say the model trained with XGBoost and the most optimal hyperparameters gets selected.
- Test the performance of the most optimal model on the test dataset.
The above can be understood using the following diagram. Note the three different splits of the original dataset. The process of training, tuning and evaluation is repeated multiple times and the most optimal model is selected. The final model is evaluated on the test dataset.
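The selection steps above can be sketched as follows. Two successive calls to train_test_split produce the three-way split; the candidate algorithms (logistic regression and a decision tree, standing in for the XGBoost example to keep the sketch dependency-free) and their hyperparameter grids are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Two successive splits give a 60-20-20 train / validation / test partition
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42)

# Candidate models: two algorithms, several hyperparameter settings each
# (the specific algorithms and grids are assumptions for illustration)
candidates = [LogisticRegression(C=c, max_iter=5000) for c in (0.1, 1.0, 10.0)]
candidates += [DecisionTreeClassifier(max_depth=d, random_state=42)
               for d in (3, 5, None)]

# Fit each candidate on the training set and pick the one with the best
# accuracy on the validation set
best = max(candidates, key=lambda m: accuracy_score(
    y_val, m.fit(X_train, y_train).predict(X_val)))

# Estimate generalization error of the winner on the untouched test set
print("Selected:", type(best).__name__)
print("Test accuracy:", accuracy_score(y_test, best.predict(X_test)))
```

The key point the sketch illustrates is that the test set plays no role in choosing among the candidates; it is consulted only once, for the final estimate.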
Python Code for Training / Test Split
Here is the Python code which can be used to create the training and test split from the original dataset. In the code given below, the Sklearn diabetes dataset is used to demonstrate how the train_test_split method from sklearn.model_selection can be used to split the dataset into training and test datasets. (Older examples often used the Boston housing dataset via load_boston, but it was removed from scikit-learn in version 1.2.) Note that the test size is specified using the parameter test_size.
from sklearn import datasets
from sklearn.model_selection import train_test_split
#
# Load the diabetes dataset (load_boston was removed in scikit-learn 1.2)
#
dataset = datasets.load_diabetes()
#
# Create the training and test split
#
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, random_state=42, test_size=0.3)