# Hold-out Method for Training Machine Learning Models

The hold-out method for training machine learning models is a technique that involves splitting the data into different sets: one set for training, and other sets for validation and testing. It is used to check how well a machine learning model will perform on new data. In this post, you will learn about the hold-out method used during the process of training a machine learning model. Do check out my post, What is machine learning? Concepts & examples, for a detailed understanding of the basics of machine learning. Also, check out a related post, What is data science?

When evaluating machine learning (ML) models, several questions arise. Is the model the best one available from its hypothesis space in terms of generalization error on unseen or future data? Was the model trained and tested using the most appropriate method? Out of the available models, which one should be selected? The hold-out method helps answer these questions.

Instead of using the entire dataset for training, separate sets called the validation set and the test set are set aside (held out, hence the name) from the dataset, and the model is trained only on the remaining data, which is termed the training dataset.

## What is the Hold-out method for training ML models?

The hold-out method for training a machine learning model is the process of splitting the data into separate sets, using one set for training the model and the other sets for validating and testing it. The hold-out method is used for both model evaluation and model selection.

When the entire dataset is used for training models with different algorithms, the problem of evaluating those models and selecting the best one remains. The primary task is to find out which of the models has the lowest generalization error. In other words, which model makes better predictions on future or unseen data than all the others? Answering this requires a mechanism in which the model is trained on one dataset and tested on another. This is where the hold-out method comes into the picture.

### Hold-out method for Model Evaluation

The hold-out method for model evaluation is the mechanism of splitting the dataset into training and test datasets. The model is trained on the training set and then evaluated on the test set to estimate how well it generalizes. This approach is often used when the dataset is small and there is not enough data to split into three sets (training, validation, and testing). It has the advantage of being simple to implement, but it can be sensitive to how the data is divided into the two sets. If the split is not random, the results may be biased. Overall, the hold-out method for model evaluation is a good starting point for training machine learning models, but it should be used with caution. The following represents the hold-out method for model evaluation.

In the above diagram, you may note that the dataset is split into two parts. One part is used for training the model. The other part is set aside, or held out, for testing or evaluating the model. The split percentage is decided based on the volume of data available. A common choice is a 70-30 split, where 70% of the dataset is used for training and 30% is used for testing the model.

This technique is well suited when the goal is to compare models by their accuracy on the test dataset and select the best one. However, when the test dataset is used repeatedly to compare and choose models, the selected model can end up fitting the test dataset well. In effect, the models are being tuned to improve accuracy on the test dataset, on the assumption that the test dataset represents the population. The test error then becomes an optimistically biased estimate of the generalization error, which is not desired: the final model may fail to generalize to unseen or future data because it has been chosen to fit (or overfit) the test data.

The following is the process of using the hold-out method for model evaluation:

• Split the dataset into two parts (for example, a 70-30 split; the percentages will vary with the amount of data available)
• Train the model on the training dataset, using a fixed set of hyperparameters
• Test or evaluate the model on the held-out test dataset
• Train the final model on the entire dataset to get a model which can generalize better on the unseen or future dataset.
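The steps above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not a definitive implementation: the synthetic dataset from `make_regression`, the choice of `Ridge` as the model, and the fixed value `alpha=1.0` are all assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption, used only for illustration)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

# Step 1: hold out 30% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Step 2: train with a fixed set of hyperparameters (alpha is not tuned here)
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# Step 3: estimate the generalization error on the held-out test set
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {test_mse:.2f}")

# Step 4: refit the same configuration on the entire dataset
final_model = Ridge(alpha=1.0).fit(X, y)
```

Note that the test MSE is measured on the model trained on 70% of the data; the final model trained on all the data has no held-out set left to score it on.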

Note that this process is used for model evaluation: it splits the dataset into training and test datasets and uses a fixed set of hyperparameters. There is another technique that splits the data into three sets and uses them for model selection or hyperparameter tuning. We will look at that technique in the next section.

### Hold-out method for Model Selection

The hold-out method can also be used for model selection or hyperparameter tuning. In fact, the model selection process is at times referred to as hyperparameter tuning. In the hold-out method for model selection, the dataset is split into three different sets: training, validation, and test datasets. When splitting the data into three sets, it is important to ensure that the training, validation, and test datasets are each representative of the entire dataset. Otherwise, the model may perform poorly on unseen data.
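One common way to keep the three sets representative in a classification problem is stratified splitting, which preserves the class proportions in every set. The sketch below is an assumption-laden illustration: the toy imbalanced labels and the 60/20/20 proportions are made up for the example, and it relies on calling `train_test_split` twice with its `stratify` parameter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced data (an assumption): 90 samples of class 0, 10 of class 1
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# First carve out the test set (20%), keeping the class proportions
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Then split the remainder into training (60% overall) and validation (20% overall)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

# Each set should keep roughly the original 10% share of class 1
for name, labels in [("train", y_train), ("validation", y_val), ("test", y_test)]:
    print(name, len(labels), labels.mean())
```

Without `stratify`, a purely random split of a small, imbalanced dataset could leave one of the sets with very few examples of the minority class.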

The following process represents the hold-out method for model selection:

1. Split the dataset into three parts: training dataset, validation dataset, and test dataset.
2. Train different models using different machine learning algorithms. For example, train the classification model using logistic regression, random forest, XGBoost, etc.
3. For each of the algorithms mentioned in step 2, vary the hyperparameter settings and train on the training dataset to come up with multiple models.
4. Test the performance of each of these models (across all of the algorithms) on the validation dataset.
5. Select the best model out of the models tested on the validation dataset. This model will have the best hyperparameter settings for its algorithm. Going by the above example, let’s say the model trained with XGBoost with the best hyperparameters gets selected.
6. Test the performance of the selected model on the test dataset.

The above can be understood using the following diagram. Note the three different splits of the original dataset. The process of training, tuning, and evaluation is repeated multiple times, and the most optimal model is selected. The final model is evaluated on the test dataset.
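The six steps can be sketched as follows. This is a simplified illustration under stated assumptions: the data comes from `make_classification`, only logistic regression and random forest are used (XGBoost is left out to avoid an extra dependency), and the candidate hyperparameter values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic classification data (an assumption, used only for illustration)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Step 1: split into 60% training, 20% validation, 20% test
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

# Steps 2-3: different algorithms, each with different hyperparameter settings
candidates = {
    "logreg_C=0.1": LogisticRegression(C=0.1, max_iter=1000),
    "logreg_C=1.0": LogisticRegression(C=1.0, max_iter=1000),
    "rf_depth=3": RandomForestClassifier(max_depth=3, random_state=42),
    "rf_depth=None": RandomForestClassifier(max_depth=None, random_state=42),
}

# Step 4: train each candidate and score it on the validation set
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = model.score(X_val, y_val)

# Step 5: select the candidate with the highest validation accuracy
best_name = max(val_scores, key=val_scores.get)
best_model = candidates[best_name]

# Step 6: evaluate the selected model once on the untouched test set
test_accuracy = best_model.score(X_test, y_test)
print(best_name, val_scores[best_name], test_accuracy)
```

The test set is touched only once, after the selection is finished, so its accuracy is not inflated by the selection process the way the validation accuracy is.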

## Python Code for Training / Test Split

Here is the Python code that can be used to create the training and test split from the original dataset. In the code given below, the Sklearn Boston housing dataset is used to demonstrate how the train_test_split method from sklearn.model_selection can be used to split the dataset into training and test datasets. Note that the test size is set using the parameter test_size, and that the Boston housing loader (load_boston) was removed in scikit-learn 1.2, so the code needs an older version of the library.

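```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the Boston housing dataset
# (load_boston was removed in scikit-learn 1.2; this requires an older version)
bhp = datasets.load_boston()

# Create the training and test split: 70% training, 30% test
X_train, X_test, y_train, y_test = train_test_split(
    bhp.data, bhp.target, random_state=42, test_size=0.3
)
```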
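
The Boston housing loader is no longer shipped with recent scikit-learn releases (it was removed in version 1.2), so the split above may not run as written. As a hedged alternative, here is the same 70-30 split on the bundled diabetes dataset. The dataset is a swapped-in substitute chosen for this sketch, not the one used in the original example.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load a dataset bundled with scikit-learn
# (assumption: diabetes is used here only as a stand-in for Boston housing)
data = datasets.load_diabetes()

# Create the training and test split: 70% training, 30% test
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42, test_size=0.3
)

# Check the sizes of the resulting splits
print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the training and test sets on every run.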


## Different types of Hold-out methods

Building on the fundamental techniques discussed in the previous sections, there are variations of the hold-out method that repeat the split several times. They give a more reliable estimate of model performance and so help in detecting overfitting or underfitting of the model. The following is a list of some of them:

• K-fold cross-validation hold-out method: In the K-fold cross-validation method, the following steps are followed:
  • The dataset is divided into K parts (folds) of roughly equal size.
  • The machine learning model is trained on K-1 folds and then tested on the remaining fold.
  • This process is repeated K times, so that each fold is used exactly once as the held-out test data, and the K results are averaged into a single performance measure. Check out the details in my post, K-fold cross validation – Python examples
• Leave-one-out cross-validation method: In the leave-one-out cross-validation method, one observation is left out and the machine learning model is trained using the rest of the data. The model then predicts the left-out observation. This process is repeated once for every observation, until the entire dataset is covered, and the results are averaged into a single performance measure.
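
Both methods can be tried with scikit-learn's `KFold`, `LeaveOneOut`, and `cross_val_score`. The following is a minimal sketch under assumptions: the synthetic dataset, the linear regression model, and the choice of K = 5 are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Small synthetic dataset (an assumption, used only for illustration)
X, y = make_regression(n_samples=60, n_features=5, noise=5.0, random_state=42)
model = LinearRegression()

# K-fold: 5 folds, each used exactly once as the held-out test data
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
kfold_scores = cross_val_score(
    model, X, y, cv=kfold, scoring="neg_mean_squared_error"
)

# Leave-one-out: one model per observation, so 60 fits for 60 samples
loo_scores = cross_val_score(
    model, X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error"
)

# scikit-learn reports negated MSE, so flip the sign before averaging
print("K-fold mean MSE:", -kfold_scores.mean())
print("Leave-one-out mean MSE:", -loo_scores.mean())
```

Leave-one-out is the special case of K-fold where K equals the number of observations, which is why it becomes expensive on large datasets.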

Hold-out methods are machine learning techniques that help detect overfitting or underfitting by evaluating a model on data it was not trained on. Cross-validation is one of the most popular variants: the model is first trained on a portion of the data and then tested on what is left, and the process is repeated over different portions. Leave-one-out cross-validation is another technique, which leaves one observation out as a test case while training on the rest of the data. If you would like to learn more, please send your queries.