Tag Archives: sklearn
Linear Regression Python Examples

In this post, you will learn about the concepts of linear regression along with Python Sklearn examples for training linear regression models. Linear regression belongs to the class of parametric models and is used to train supervised machine learning models. Introduction to Linear Regression Linear regression is a machine learning algorithm used to predict the value of continuous response variables. The predictive analytics problems that are solved using linear regression models are called supervised learning problems because they require that the values of the response/target variables be present and used for training the models. Also, recall that “continuous” means that the response variable is numerical in nature and can take infinitely many different values. …
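As a quick illustration of the idea, here is a minimal sketch of training a linear regression model with Sklearn; the diabetes dataset and the train/test split settings are assumptions chosen for the example, not taken from the post.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Illustrative data: a built-in regression dataset with a continuous target
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)          # learn coefficients for the continuous target
print(model.score(X_test, y_test))   # R^2 on held-out data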
K-Means Clustering Concepts & Python Example

Clustering is a popular unsupervised machine learning technique used in data analysis to group similar data points together. The K-Means clustering algorithm is one of the most commonly used clustering algorithms due to its simplicity, efficiency, and effectiveness on a wide range of datasets. In K-Means clustering, the goal is to divide a given dataset into K clusters, where each data point belongs to the cluster with the nearest mean value. The algorithm works by iteratively updating the cluster centroids until convergence is achieved. In this post, you will learn about K-Means clustering concepts by fitting a K-Means model using the Python Sklearn KMeans clustering implementation. You will …
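A minimal sketch of fitting a K-Means model with Sklearn's KMeans is shown below; the synthetic blobs dataset and K = 3 are illustrative assumptions.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 groups, purely for illustration
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)        # assign each point to its nearest centroid
print(kmeans.cluster_centers_)        # final centroids after iterative updates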
Sklearn Algorithms Cheat Sheet with Examples

The Sklearn library, short for Scikit-learn, is one of the most popular and widely-used libraries for machine learning in Python. It offers a comprehensive set of tools for data analysis, preprocessing, model selection, and evaluation. As a beginner data scientist, it can be overwhelming to navigate the various algorithms and functions within Sklearn. This is where the Sklearn Algorithms Cheat Sheet comes in handy. This cheat sheet provides a quick reference guide for beginners to easily understand and select the appropriate algorithm for their specific task. In this cheat sheet, I have compiled a list of common supervised and unsupervised learning algorithms, along with their Sklearn classes and example use …
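As a rough sketch of what the cheat sheet builds on, the snippet below shows the uniform fit/predict estimator API shared by Sklearn algorithm classes; LogisticRegression and KMeans are just two illustrative entries, not the cheat sheet's full list.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)       # supervised: needs labels y
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)  # unsupervised: labels not used
print(clf.predict(X[:5]), km.labels_[:5])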
KMeans Silhouette Score Python Example

If you’re building machine learning models for solving different prediction problems, you’ve probably heard of clustering. Clustering is a popular unsupervised learning technique used to group data points with similar features into distinct clusters. One of the most widely used clustering algorithms is KMeans, which is popular due to its simplicity and efficiency. However, one major challenge in clustering is determining the optimal number of clusters that should be used to group the data points. This is where the Silhouette Score comes into play, as it helps us measure the quality of clustering and determine the optimal number of clusters. Silhouette score helps us get further clarity for the following …
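Here is a minimal sketch of computing the Silhouette Score for a KMeans fit; the synthetic dataset and K = 3 are assumptions made for the example.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(silhouette_score(X, labels))   # closer to 1 means well-separated, compact clusters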
Lasso Regression Explained with Python Example

Lasso regression, also known as L1 regularization, is a linear regression method that uses regularization to prevent overfitting and improve model performance. It works by adding a penalty term to the cost function that encourages the model to select only the most important features and set the coefficients of less important features to zero. This makes Lasso regression a popular method for feature selection and high-dimensional data analysis. In this post, you will learn concepts, advantages and limitations of Lasso regression along with Python Sklearn examples. The other two similar forms of regularized linear regression are Ridge regression and Elasticnet regression which will be discussed in future posts. What’s Lasso Regression? …
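A minimal sketch of Lasso regression with Sklearn is shown below; the diabetes dataset, the feature scaling step, and alpha = 0.1 are illustrative assumptions.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scale features before applying the L1 penalty

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)                      # some coefficients shrink exactly to zero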
Dealing with Class Imbalance in Python: Techniques

As data scientists, we are tasked with building machine learning (ML) models that can accurately predict outcomes based on input data. However, one of the biggest challenges in building ML models is dealing with class imbalance. Class imbalance occurs when the distribution of classes in your dataset is uneven, with one class significantly outnumbering one or more other classes. Class imbalance is a common problem in many industries, including fraud detection, medical diagnosis, and customer churn prediction, to name a few. Handling class imbalance correctly is crucial for data scientists, as it can have a significant impact on the performance of machine learning models. Failure to address class imbalance …
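One common remedy, sketched below under illustrative assumptions (a synthetic imbalanced dataset and class weighting via class_weight="balanced"), is to reweight the classes during training; resampling techniques are another option not shown here.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic dataset where one class makes up only ~5% of the samples
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" upweights the minority class during training
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))   # check minority-class recall, not just accuracy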
Python – Draw Confusion Matrix using Matplotlib

Classification models are a fundamental part of machine learning and are used extensively in various industries. Evaluating the performance of these models is critical in determining their effectiveness and identifying areas for improvement. One of the most common tools used for evaluating classification models is the confusion matrix. It provides a visual representation of the model’s performance by displaying the number of true positives, false positives, true negatives, and false negatives. In this post, we will explore how to create and visualize confusion matrices in Python using Matplotlib. We will walk through the process step-by-step and provide examples that demonstrate the use of Matplotlib in creating clear and concise confusion …
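A minimal sketch of drawing a confusion matrix with Matplotlib via Sklearn's ConfusionMatrixDisplay is shown below; the breast cancer dataset and logistic regression classifier are placeholders chosen for illustration.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
y_pred = LogisticRegression(max_iter=5000).fit(X_tr, y_tr).predict(X_te)

cm = confusion_matrix(y_te, y_pred)              # counts of TN, FP, FN, TP
ConfusionMatrixDisplay(cm).plot(cmap="Blues")    # render the matrix with Matplotlib
plt.show()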
K-Fold Cross Validation – Python Example

In this post, you will learn about K-fold cross-validation concepts with Python code examples. K-fold cross-validation is a data splitting technique that can be implemented with k > 1 folds. It is also referred to as k-cross validation, k-fold CV, or simply k-folds. The k-fold cross-validation technique can be implemented easily in Python with the scikit-learn (Sklearn) package, which provides an easy way to train and evaluate models with k-fold cross-validation. It is important to learn the concepts of cross-validation in order to perform model tuning with the end goal of choosing a model that has high generalization performance. As a data scientist / machine learning engineer, you must have a good …
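Below is a minimal sketch of k-fold cross-validation with Sklearn; k = 5 and logistic regression on the iris dataset are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)      # split the data into 5 folds
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores.mean(), scores.std())   # average generalization estimate across the folds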
Accuracy, Precision, Recall & F1-Score – Python Examples

Classification models are used in classification problems to predict the target class of a data sample. A classification model predicts the probability that each instance belongs to one class or another. It is important to evaluate the performance of classification models in order to reliably use these models in production for solving real-world problems. Performance measures in machine learning classification models are used to assess how well machine learning classification models perform in a given context. These performance metrics include accuracy, precision, recall, and F1-score. Evaluating model performance is essential in machine learning because it helps us understand the strengths and limitations of these models when making predictions in new situations. …
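A minimal sketch of computing these four metrics with Sklearn is shown below; the dataset and classifier are placeholders chosen for illustration.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
y_pred = LogisticRegression(max_iter=5000).fit(X_tr, y_tr).predict(X_te)

print("accuracy :", accuracy_score(y_te, y_pred))    # fraction of correct predictions
print("precision:", precision_score(y_te, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_te, y_pred))      # TP / (TP + FN)
print("f1-score :", f1_score(y_te, y_pred))          # harmonic mean of precision and recall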
K-Nearest Neighbors (KNN) Python Examples

If you’re working on data analytics projects that involve building machine learning (ML) models, you’ve probably heard of the K-nearest neighbors (KNN) algorithm. But what is it, exactly? And more importantly, how can you use it in your own AI / ML projects? In this post, we’ll take a closer look at the KNN algorithm and walk through a simple Python example. You will learn about the K-nearest neighbors algorithm with Python Sklearn examples. The K-nearest neighbors algorithm is used for solving both classification and regression machine learning problems. Stay tuned! Introduction to K-Nearest Neighbors (K-NN) Algorithm K-nearest neighbors is a supervised machine learning algorithm for classification and regression. In both cases, the input consists …
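Here is a minimal sketch of a KNN classifier with Sklearn; k = 5 neighbors, the iris dataset, and the scaling step are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

scaler = StandardScaler().fit(X_tr)     # KNN is distance-based, so features are scaled first
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_tr), y_tr)
print(knn.score(scaler.transform(X_te), y_te))   # accuracy on held-out data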
Learning Curves Python Sklearn Example

In this post, you will learn how to use learning curves, with a Python (Sklearn) code example, to determine machine learning model bias-variance. Knowing how to use learning curves will help you assess / diagnose whether the model is suffering from high bias (underfitting) or high variance (overfitting) and whether increasing the number of training data samples could help solve the bias or variance problem. You may want to check some of the following posts in order to get a better understanding of bias-variance and underfitting-overfitting. Bias-variance concepts and interview questions Overfitting/Underfitting concepts and interview questions What are learning curves & why are they important? A learning curve in machine learning is used to assess how models will …
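A minimal sketch of generating learning curves with Sklearn's learning_curve is shown below; the dataset, estimator, and training-size grid are illustrative assumptions.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_breast_cancer(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=5000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

plt.plot(sizes, train_scores.mean(axis=1), label="training score")
plt.plot(sizes, val_scores.mean(axis=1), label="validation score")
plt.xlabel("training samples"); plt.ylabel("accuracy"); plt.legend()
plt.show()   # a large, persistent gap suggests high variance; low scores on both curves suggest high bias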
Machine Learning Sklearn Pipeline – Python Example

In this post, you will learn about the concepts of a machine learning (ML) pipeline and how to build one using the Python Sklearn Pipeline (sklearn.pipeline) package. Getting to know how to use Sklearn.pipeline effectively for training/testing machine learning models will help automate activities such as feature scaling, feature selection / extraction, and training/testing the models. It is recommended that data scientists working in Python get a good understanding of Sklearn.pipeline. Introduction to Machine Learning Pipeline & Sklearn.pipeline A machine learning (ML) pipeline, conceptually, represents the different steps, including data transformation and prediction, through which data passes. The outcome of the pipeline is the trained model, which can be used for making predictions. …
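Below is a minimal sketch of a Sklearn Pipeline that chains feature scaling with a classifier; the specific steps and dataset are illustrative assumptions, not the post's exact pipeline.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # feature scaling step
    ("clf", LogisticRegression(max_iter=1000)),  # final estimator
])
pipe.fit(X_tr, y_tr)          # transformation and training run in a single call
print(pipe.score(X_te, y_te))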
Sklearn SimpleImputer Example – Impute Missing Data

In this post, you will learn how to use Python’s Sklearn SimpleImputer for imputing / replacing numerical & categorical missing data using different strategies. In a related article posted some time back, the usage of the fillna method of the Pandas DataFrame is discussed. Handling missing values is a key part of data preprocessing and hence it is of utmost importance for data scientists / machine learning engineers to learn different techniques for imputing / replacing numerical or categorical missing values with appropriate values based on appropriate strategies. SimpleImputer Python Code Example SimpleImputer is a class in the sklearn.impute module that can be used to replace missing values in a dataset, using a …
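A minimal sketch of SimpleImputer applied to one numerical and one categorical column is shown below; the tiny DataFrame and the chosen strategies are assumptions made for the example.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "city": ["NY", np.nan, "LA", "NY"]})

num_imp = SimpleImputer(strategy="mean")            # replace missing ages with the column mean
cat_imp = SimpleImputer(strategy="most_frequent")   # replace missing cities with the mode

df[["age"]] = num_imp.fit_transform(df[["age"]])
df[["city"]] = cat_imp.fit_transform(df[["city"]])
print(df)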
Elbow Method vs Silhouette Score – Which is Better?

In K-means clustering, the elbow method and silhouette analysis (silhouette score) are techniques used to find the optimal number of clusters in a dataset. The elbow method is used to find the “elbow” point, where adding more clusters no longer produces a significant reduction in within-cluster variation. The silhouette score measures how close each sample is to the other samples in its own cluster compared to samples in other clusters. In this post, you will learn about these two different methods for finding the optimal number of clusters in K-means clustering. Selecting the optimal number of clusters is key to applying a clustering algorithm to a dataset. As a data scientist, knowing these two techniques to find …
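Here is a minimal sketch that computes both signals over a range of K values; the synthetic dataset and the K range are illustrative assumptions.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, km.inertia_,                      # elbow method: look for the bend in inertia
          silhouette_score(X, km.labels_))     # silhouette score: higher is better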
Bagging Classifier Python Code Example

Bagging is a type of ensemble machine learning approach that combines the outputs from many learners to improve performance. These algorithms work by drawing subsets of the training set, training a separate model on each subset, and then combining their predictions to generate an overall prediction for each instance in the original data. In this blog post, you will learn about the concept of bagging along with a Bagging Classifier Python code example. Bagging is commonly used in machine learning for classification problems, particularly when using decision trees or artificial neural networks as the base learners of the ensemble. It has been applied to various machine-learning algorithms including decision stumps, …
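A minimal sketch of a BaggingClassifier built on decision trees is shown below; the dataset and n_estimators = 100 are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Each tree is trained on a bootstrap sample of the training set
# (the parameter is named base_estimator in older Sklearn releases)
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=100, random_state=42)
bag.fit(X_tr, y_tr)
print(bag.score(X_te, y_te))   # predictions are aggregated across the trees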
Gradient Boosting Regression Python Examples

In this post, you will learn about the concepts of Gradient Boosting Regression with the help of a Python Sklearn code example. The Gradient Boosting algorithm is one of the key boosting machine learning algorithms, alongside AdaBoost and XGBoost. What is Gradient Boosting Regression? The Gradient Boosting algorithm is used to generate an ensemble model by combining weak learners or weak predictive models. The gradient boosting algorithm can be used to train models for both regression and classification problems. The Gradient Boosting Regression algorithm is used to fit a model which predicts continuous values. Gradient boosting builds an additive model by using multiple decision trees of fixed size as weak learners or …
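Below is a minimal sketch of GradientBoostingRegressor; the diabetes dataset and the hyperparameters shown are illustrative assumptions.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                max_depth=3, random_state=42)
gbr.fit(X_tr, y_tr)            # each tree fits the residuals of the current ensemble
print(gbr.score(X_te, y_te))   # R^2 of the additive model on held-out data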