In this post, you will learn about a machine learning model tuning technique called randomized search, which is used to find a well-performing combination of hyperparameters for a model. The concept is illustrated with a Python scikit-learn (sklearn) code example. As a data scientist, it is worth learning model tuning techniques like this one in order to come up with better-performing models.
This post covers what randomized search is and how to use the sklearn RandomizedSearchCV class, with a worked Python example.
Randomized search is another technique for sampling different combinations of hyperparameters in order to find the set of parameters that gives the model its best performance / score. Like grid search, randomized search is one of the most widely used strategies for hyperparameter optimization. Unlike grid search, which evaluates every combination in the grid, randomized search evaluates only a fixed number of sampled combinations. As a result it is usually much faster: model training is computationally less intensive and takes less time.
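To make the cost difference concrete, here is a minimal sketch that counts how many candidates each strategy would evaluate. The grid values and the budget of 10 samples are illustrative assumptions, not values taken from this post; ParameterGrid and ParameterSampler are the sklearn.model_selection helpers for enumerating a grid and sampling from distributions.

```python
from scipy import stats
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical grid: 6 values of C x 6 values of gamma
grid = ParameterGrid({
    'svc__C': [0.01, 0.1, 1, 10, 100, 1000],
    'svc__gamma': [0.0001, 0.001, 0.01, 0.1, 1, 10],
})

# Randomized search draws a fixed number of candidates (n_iter) from distributions
sampler = ParameterSampler(
    {'svc__C': stats.expon(scale=100), 'svc__gamma': stats.expon(scale=0.1)},
    n_iter=10,
    random_state=1,
)
samples = list(sampler)

n_grid = len(grid)
n_random = len(samples)
print('Grid search candidates:', n_grid)
print('Randomized search candidates:', n_random)
```

With 10-fold cross-validation, each candidate costs 10 model fits, so the number of candidates directly drives the total training time. The randomized search budget stays the same no matter how many hyperparameters are added, while the grid grows multiplicatively.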
Randomized search has been found to be more efficient for hyperparameter optimization than grid search. Grid search experiments allocate too many trials to the exploration of dimensions that do not matter and suffer from poor coverage in the dimensions that are important. For more details, read the paper Random Search for Hyper-Parameter Optimization (Bergstra and Bengio, 2012).
In this post, randomized search is illustrated using the RandomizedSearchCV class from sklearn.model_selection together with the SVC class from the sklearn.svm package.
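The coverage argument can be illustrated with a small sketch that uses only the Python standard library. It assumes a made-up setting with two hyperparameters where only the first one matters, and a budget of 9 trials. The values and the seed are arbitrary choices for illustration.

```python
import itertools
import random

# Budget of 9 trials over two hyperparameters; assume only the first one matters
grid_values = [0.1, 0.5, 0.9]
grid_trials = list(itertools.product(grid_values, grid_values))  # 3 x 3 grid

rng = random.Random(1)
random_trials = [(rng.random(), rng.random()) for _ in range(9)]

# Count how many distinct values of the important (first) dimension were tried
grid_distinct = len({important for important, _ in grid_trials})
random_distinct = len({important for important, _ in random_trials})

print('Distinct values tried by grid search:', grid_distinct)
print('Distinct values tried by random search:', random_distinct)
```

The grid spends its 9 trials on only 3 distinct values of the important hyperparameter, because it repeats each of them for every value of the unimportant one. Random sampling tries a new value of the important hyperparameter on every trial.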
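In this section, you will learn how to use the RandomizedSearchCV class for fitting and scoring the model. Pay attention to the following in the code below: the parameter distributions for C and gamma are created with scipy.stats.expon, so values are sampled from continuous distributions rather than picked from a fixed list; the estimator is a pipeline that standardizes the features with StandardScaler before fitting SVC; and each sampled combination is evaluated using 10-fold cross-validation with accuracy as the scoring metric.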
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
import scipy as sc
import scipy.stats  # explicitly load the stats submodule so that sc.stats is available
#
# Load the Sklearn breast cancer dataset
#
bc = datasets.load_breast_cancer()
X = bc.data
y = bc.target
#
# Create training and test split
#
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
#
# Create the pipeline estimator
#
pipeline = make_pipeline(StandardScaler(), SVC(random_state=1))
#
# Create parameter distribution using scipy.stats module
#
param_distributions = [{'svc__C': sc.stats.expon(scale=100),
                        'svc__gamma': sc.stats.expon(scale=.1),
                        'svc__kernel': ['rbf']},
                       {'svc__C': sc.stats.expon(scale=100),
                        'svc__kernel': ['linear']}]
#
# Create an instance of RandomizedSearchCV
#
rs = RandomizedSearchCV(estimator=pipeline, param_distributions=param_distributions,
                        cv=10, scoring='accuracy', refit=True, n_jobs=1,
                        random_state=1)
#
# Fit the RandomizedSearchCV estimator
#
rs.fit(X_train, y_train)
#
# Score the refitted best model on the test set
#
print('Test Accuracy: %0.3f' % rs.score(X_test, y_test))
The best parameters and the best cross-validation score can be printed using the following code:
#
# Print best parameters
#
print(rs.best_params_)
#
# Print the best score
#
print(rs.best_score_)
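Beyond the single best result, the fitted search object also exposes the score of every sampled candidate through its cv_results_ attribute. The following is a minimal, self-contained sketch of inspecting it with pandas. It reruns a smaller search (n_iter=5, cv=3, on the full dataset) purely to keep the example quick; those settings and the variable names search, results and columns are assumptions for illustration, not the configuration used in the code above.

```python
import pandas as pd
from scipy import stats
from sklearn import datasets
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_breast_cancer(return_X_y=True)
pipeline = make_pipeline(StandardScaler(), SVC(random_state=1))

# Smaller search than the main example, chosen only to keep this sketch fast
search = RandomizedSearchCV(
    estimator=pipeline,
    param_distributions={'svc__C': stats.expon(scale=100),
                         'svc__gamma': stats.expon(scale=0.1),
                         'svc__kernel': ['rbf']},
    n_iter=5, cv=3, scoring='accuracy', random_state=1,
)
search.fit(X, y)

# cv_results_ is a dict of arrays; as a DataFrame it has one row per sampled candidate
results = pd.DataFrame(search.cv_results_)
columns = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[columns].sort_values('rank_test_score'))
```

Looking at the whole table, rather than only best_params_, helps to judge whether the top candidates are clearly better than the rest or whether the differences fall within the spread across folds.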
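Here are some of the key takeaways from this post on randomized search: randomized search tunes hyperparameters by evaluating a fixed number of combinations sampled from specified distributions, rather than every combination as grid search does; this generally makes it faster and more efficient than grid search; and in sklearn it is available as the RandomizedSearchCV class in sklearn.model_selection, which can take parameter distributions created with scipy.stats.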