# Category Archives: Data Science

## Scatter plot Matplotlib Python Example

If you’re a data scientist, data analyst or Python programmer, data visualization is a key part of your job. And what better way to visualize all that juicy data than with a scatter plot? Matplotlib is your trusty Python library for creating charts and graphs, and in this blog we’ll show you, with examples, how to use it to create beautiful scatter plots. So dig into your data set, get coding, and see what insights you can uncover! What is a Scatter Plot? A scatter plot is a type of data visualization that is used to show the relationship between two variables. Scatter …
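
As a minimal sketch of the idea, the snippet below draws a scatter plot of two related variables with Matplotlib's `Axes.scatter`. The synthetic data (y roughly equal to 2x plus noise) and the axis labels are hypothetical choices for illustration, and the non-interactive `Agg` backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or desktop session
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: y grows roughly linearly with x, plus random noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2 * x + rng.normal(scale=2, size=50)

fig, ax = plt.subplots()
# Each (x[i], y[i]) pair becomes one marker on the plot
points = ax.scatter(x, y, color="tab:blue", alpha=0.7, label="observations")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot of y against x")
ax.legend()
# In an interactive session call plt.show(); otherwise use fig.savefig(...) to write an image file
```

The upward-sloping cloud of points is what a positive relationship between the two variables looks like in a scatter plot.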

## Why AI & Machine Learning Projects Fail?

AI / machine learning and data science projects are becoming increasingly popular with businesses of all sizes. Many organizations are trying to leverage AI to further automate their business processes and gain a competitive edge by delivering innovative solutions to their customers. However, many of these AI & machine learning projects fail for a variety of reasons. In this blog post, we will discuss some of the reasons why AI / machine learning / data science projects fail, and how you can avoid them. The following are some of the reasons why AI / machine learning projects fail: lack of understanding of business problems / opportunities, ineffective solution design approaches, lack …

## Weight Decay in Machine Learning: Concepts

Weight decay is a popular technique in machine learning that helps a model generalize better, so that its predictions on new data are more accurate. In this post, we’ll take a closer look at what weight decay is and how it works. We’ll also discuss some of the benefits of using weight decay and explore some possible applications. As data scientists, we should learn the concepts of weight decay, as it helps in building machine learning models with higher generalization performance. Stay tuned! What is weight decay and how does it work? Weight decay is a regularization technique that penalizes large weights (parameters) in machine learning models, keeping them small. Weight …
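
A minimal NumPy sketch, assuming plain gradient descent on a linear model trained with mean squared error: weight decay adds a term proportional to the current weights (`weight_decay * w`) to every gradient step, which pulls the weights toward zero. The function name `train_linear`, the synthetic data and the coefficient values are hypothetical choices for illustration, not the post's own code.

```python
import numpy as np

def train_linear(X, y, lr=0.1, weight_decay=0.0, epochs=500):
    """Hypothetical helper: gradient descent on MSE with optional L2 weight decay."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Gradient of the mean squared error (1/n) * ||Xw - y||^2 with respect to w
        grad = 2.0 / n * X.T @ (X @ w - y)
        # Weight decay: add weight_decay * w so large weights are shrunk at every step
        w -= lr * (grad + weight_decay * w)
    return w

# Hypothetical synthetic regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)

w_plain = train_linear(X, y)                    # no weight decay
w_decay = train_linear(X, y, weight_decay=0.5)  # with weight decay
print("weight norm without decay:", np.linalg.norm(w_plain))
print("weight norm with decay:   ", np.linalg.norm(w_decay))
```

With plain gradient descent this update is equivalent to adding an L2 penalty of `weight_decay / 2 * ||w||^2` to the loss, which is why weight decay is usually described as a form of L2 regularization.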

## K-Fold Cross Validation – Python Example

In this post, you will learn about K-fold cross-validation concepts with Python code examples. K-fold cross-validation is a data splitting technique that splits the data into k > 1 folds; each fold is used once for validation while the remaining k - 1 folds are used for training. K-fold cross-validation is also known as k-cross, k-fold CV, and k-folds. The k-fold cross-validation technique can be implemented easily in Python with the scikit-learn (Sklearn) package, which provides an easy way to evaluate models with k-fold cross-validation. It is important to learn cross-validation concepts in order to perform model tuning, with the end goal of choosing a model that has high generalization performance. As a data scientist / machine learning engineer, you must have a good …
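
A minimal sketch with scikit-learn's `KFold` and `cross_val_score`, assuming the bundled iris dataset and a logistic regression model as the estimator to evaluate (both are illustrative choices). `shuffle=True` is used because the iris rows are ordered by class.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5 folds; shuffle because the iris rows are sorted by class label
kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=1000)

# Trains 5 models, each validated on a different held-out fold
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")

# The same splits made explicit: every sample is in the validation fold exactly once
for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    print(f"Fold {fold}: {len(train_idx)} training rows, {len(val_idx)} validation rows")
```

The mean of the fold scores is the cross-validated estimate of generalization performance; comparing that mean across candidate models or hyperparameters is how k-fold CV supports model tuning.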

## Model Complexity & Overfitting in Machine Learning

In machine learning, model complexity and overfitting are closely related: overfitting is a problem that can occur when a model is too complex, which can happen for different reasons. A model that is too complex can fit the noise in the data rather than the underlying pattern. As a result, the model will perform poorly when applied to new, unseen data. In this blog post, we will discuss what model complexity is and how you can avoid overfitting in your machine learning models by managing model complexity. As data scientists, we must understand the concepts related to model complexity and how it …
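
As a hedged illustration, polynomial degree can stand in for model complexity: the sketch below fits polynomials of degree 1, 3 and 12 to 15 noisy samples of a sine curve and compares training and test error. The sine-plus-noise data, the sample sizes and the chosen degrees are all hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def make_data(n):
    """Hypothetical data: a sine curve plus Gaussian noise."""
    x = np.sort(rng.uniform(0, 1, n))
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 12):
    # Higher degree = more parameters = a more complex model
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The training error keeps falling as the degree grows, because the more complex model can bend to fit the noise; the gap between training and test error for the highest degree is the overfitting described above, while degree 1 underfits.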

## Softmax Regression Explained with Python Example

In this post, you will learn what softmax regression and the softmax function are, why we need them, and how to use them, with Python code examples. As data scientists / machine learning enthusiasts, it is very important to understand the concepts of softmax regression, as it helps us better understand algorithms such as neural networks and multinomial logistic regression. Note that the softmax function is used in various multiclass classification machine learning algorithms such as multinomial logistic regression (thus also called softmax regression), neural networks, etc. Before getting into the concepts of softmax regression, let’s understand what the softmax function is. What’s the Softmax function? Simply speaking, the softmax function converts raw …
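
A minimal NumPy sketch of the softmax function, which maps a vector of raw scores $z$ to probabilities $e^{z_i} / \sum_j e^{z_j}$. Subtracting the largest score first is a common numerical-stability trick (it does not change the result); the function name and example scores are illustrative.

```python
import numpy as np

def softmax(z):
    """Convert raw scores (a vector, or the rows of a matrix) into probabilities."""
    z = np.asarray(z, dtype=float)
    # Shift by the max so np.exp never overflows; softmax is unchanged by this shift
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())
```

Every output lies between 0 and 1, the outputs sum to 1, and the largest raw score keeps the largest probability. In softmax regression, these raw scores are the linear outputs for each class.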

## Cross Entropy Loss Explained with Python Examples

In this post, you will learn the concepts related to the cross-entropy loss function, along with Python code examples, and which machine learning algorithms use cross-entropy loss as the objective function for training models. Cross-entropy loss is used as the loss function for models that predict a probability value (a probability distribution) as output. Logistic regression is one such algorithm, whose output is a probability distribution. You may want to check out the details of how cross-entropy loss is related to information theory and entropy concepts – Information theory & machine learning: Concepts. What’s Cross-Entropy Loss? The cross-entropy loss function is an optimization function that is …
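
A minimal NumPy sketch of binary and categorical cross-entropy using natural logarithms, averaged over samples. The small `eps` clip, which avoids `log(0)`, is an implementation assumption rather than part of the definition.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over samples (labels in {0, 1})."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # keep log() finite
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-12):
    """Mean of -sum_k y_k * log(p_k) over samples (one-hot labels, rows of probabilities)."""
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0)
    return float(-np.mean(np.sum(np.asarray(y_onehot) * np.log(p), axis=1)))

# Confident, correct predictions give a lower loss than hesitant ones
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(binary_cross_entropy([1, 0], [0.6, 0.4]))
print(categorical_cross_entropy([[0, 1, 0]], [[0.2, 0.7, 0.1]]))
```

Only the probability assigned to the true class contributes to the loss, so it shrinks toward 0 as that probability approaches 1 and grows without bound as it approaches 0.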

## Linear Regression Explained with Python Examples

In this post, you will learn about the concepts of linear regression, along with Python Sklearn examples for training linear regression models. Linear regression belongs to the class of parametric models and is used to train supervised models. The following topics are covered in this post: introduction to linear regression, linear regression concepts / terminologies, and a linear regression Python code example. Introduction to Linear Regression: Linear regression is a machine learning algorithm used to predict the value of a continuous response variable. The predictive analytics problems solved using linear regression models are supervised learning problems, because they require that the values of the response/target variable be present and used for training the models. Also, recall that …
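
A minimal scikit-learn sketch, assuming synthetic data generated from a known line (y = 3x + 2 plus noise) so the fitted slope and intercept can be checked against the true values. The data and numbers are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: one feature, continuous target y = 3x + 2 + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=100)

model = LinearRegression()
model.fit(X, y)  # supervised: needs both the features X and the target values y

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("prediction for x=4:", model.predict([[4.0]])[0])
```

Because linear regression is parametric, the whole trained model is just the learned coefficients (`coef_`) and the intercept (`intercept_`).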

## Normal Distribution Explained with Python Examples

What is a normal distribution? It’s a probability distribution that occurs in many real-world cases. In this blog post, you will learn about the concepts of the normal distribution with the help of Python examples. As a data scientist, you must get a good understanding of different probability distributions in statistics in order to understand data better. The normal distribution is also called the Gaussian distribution or the Laplace-Gauss distribution. Normal Distribution with Python Example. The normal distribution is a common default probability model for many real-world scenarios. It is a symmetric distribution in which most of the observations cluster around the central peak, which is the mean of the distribution. A normal distribution can be thought of as a …
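
A minimal NumPy sketch, assuming a hypothetical mean of 50 and standard deviation of 5: it evaluates the normal density $\frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$ and draws samples to show observations clustering around the mean (roughly 68% within one standard deviation and 95% within two).

```python
import math
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of a normal distribution with mean mu and std sigma."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

rng = np.random.default_rng(1)
mu, sigma = 50.0, 5.0  # hypothetical mean and standard deviation
samples = rng.normal(loc=mu, scale=sigma, size=100_000)

# Fraction of observations close to the central peak
within_1sd = np.mean(np.abs(samples - mu) < sigma)
within_2sd = np.mean(np.abs(samples - mu) < 2 * sigma)
print(f"sample mean={samples.mean():.2f}, sample std={samples.std():.2f}")
print(f"within 1 sd: {within_1sd:.3f}, within 2 sd: {within_2sd:.3f}")
print("density at the mean (the peak):", normal_pdf(mu, mu, sigma))
```

The density is symmetric around `mu` and highest there, which is the bell shape described above.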

## Mean Squared Error or R-Squared – Which one to use?

In this post, you will learn about the concepts of mean squared error (MSE) and R-squared, the difference between them, and which one to use when evaluating linear regression models. You will also work through Python examples to understand the concepts better. What is Mean Squared Error (MSE)? The mean squared error (MSE) is the average of the squared differences between the predicted and actual values; it represents the error of the estimator or predictive model created from the given set of observations in the sample. Intuitively, the MSE measures the quality of the model based on the predictions made on the entire training dataset vis-a-vis the true label/output values. In other words, it can be used to …
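
A minimal NumPy sketch computing both metrics by hand on a small hypothetical set of true and predicted values. Rescaling the target by 10 shows one practical difference: MSE depends on the units of the target, while R-squared does not.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return float(1 - ss_res / ss_tot)

y_true = [3.0, -0.5, 2.0, 7.0]  # hypothetical actual values
y_pred = [2.5, 0.0, 2.0, 8.0]   # hypothetical model predictions
print(mean_squared_error(y_true, y_pred))  # 0.375
print(round(r_squared(y_true, y_pred), 3))  # 0.949

# Same data expressed in units 10x larger
scaled_true = [10 * v for v in y_true]
scaled_pred = [10 * v for v in y_pred]
print(mean_squared_error(scaled_true, scaled_pred))  # 37.5
print(round(r_squared(scaled_true, scaled_pred), 3))  # 0.949
```

MSE is useful for comparing models on the same target, while R-squared expresses the fit as a fraction of the target's variance explained, which makes it comparable across differently scaled problems.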

## Linear Regression Explained with Real Life Example

In this post, the linear regression concept in machine learning is explained with multiple real-life examples. Both types of regression models (simple/univariate and multiple/multivariate linear regression) are covered with examples. If you are a machine learning or data science beginner, you may find this post helpful. You may also want to check a detailed post on what machine learning is – What is Machine Learning? Concepts & Examples. What is Linear Regression? Linear regression is a machine learning concept that is used to build or train models (mathematical models or equations) for solving supervised learning problems related to predicting continuous numerical values. Supervised learning problems …
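
As a worked illustration (standard notation, not taken from the original post), the two model types can be written as follows, where $\hat{y}$ is the predicted value, $x_1, \dots, x_p$ are the input features and $\beta_0, \dots, \beta_p$ are coefficients learned from the training data:

$$
\hat{y} = \beta_0 + \beta_1 x \qquad \text{(simple linear regression)}
$$

$$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \qquad \text{(multiple linear regression)}
$$

For example, with hypothetical coefficients $\beta_0 = 50{,}000$ and $\beta_1 = 200$ for predicting a house price from its area in square feet, a 1,000 sq ft house would be predicted at $50{,}000 + 200 \times 1{,}000 = 250{,}000$.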

## Tensor Broadcasting Explained with Examples

In this post, you will learn about the concepts of tensor broadcasting with the help of Python NumPy examples. Recall that a tensor is a container of data (primarily numerical) and the most fundamental data structure used in Keras and TensorFlow. You may want to check out a related article on tensors – Tensor explained with Python Numpy examples. Tensor broadcasting is borrowed from NumPy broadcasting. Broadcasting is a technique used for performing arithmetic operations between NumPy arrays / tensors that have different shapes. In this technique, the following is done: As a first step, expand one or both arrays by copying elements appropriately so that after this transformation, the two tensors have the …
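
A minimal NumPy sketch of the expansion step described above: adding a (3, 1) array to a (4,) array broadcasts both to shape (3, 4), and the same result can be reproduced by copying elements explicitly with `np.tile`. The array values are illustrative.

```python
import numpy as np

a = np.array([[1], [2], [3]])   # shape (3, 1)
b = np.array([10, 20, 30, 40])  # shape (4,), treated as (1, 4)

result = a + b  # broadcast: both operands are stretched to shape (3, 4)

# The same result with the expansion done by hand
a_expanded = np.tile(a, (1, 4))  # copy the single column 4 times -> (3, 4)
b_expanded = np.tile(b, (3, 1))  # copy the single row 3 times -> (3, 4)
manual = a_expanded + b_expanded

print(result.shape)                    # (3, 4)
print(np.array_equal(result, manual))  # True

# Shapes whose trailing dimensions differ and are not 1 cannot be broadcast
try:
    np.ones((2, 3)) + np.ones((2,))
except ValueError as err:
    print("Incompatible shapes:", err)
```

NumPy does not actually copy the data in memory when broadcasting; the explicit `np.tile` version only makes the conceptual expansion visible.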

## Regularization in Machine Learning: Concepts & Examples

In machine learning, regularization is a technique used to avoid overfitting. Overfitting occurs when a model learns the training data too well, including its noise, and therefore performs poorly on new data. Regularization helps reduce overfitting by adding constraints to the model-building process. As data scientists, it is of utmost importance that we learn regularization concepts thoroughly to build better machine learning models. In this blog post, we will discuss the concept of regularization and provide examples of how it can be used in practice. What is regularization and how does it work? Regularization in machine learning refers to strategies that are used to reduce the generalization or test error of …
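
A hedged sketch using scikit-learn's `Ridge` (L2 regularization) as one concrete example of adding a constraint to model building. The setup is a hypothetical overfitting-prone one: only 30 training rows, 20 features, and just 3 features that actually matter. Unregularized `LinearRegression` serves as the baseline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n_train, n_test, n_features = 30, 200, 20
true_coef = np.zeros(n_features)
true_coef[:3] = [4.0, -3.0, 2.0]  # hypothetical: only 3 informative features

X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train @ true_coef + rng.normal(scale=1.0, size=n_train)
y_test = X_test @ true_coef + rng.normal(scale=1.0, size=n_test)

ols = LinearRegression().fit(X_train, y_train)    # no regularization
ridge = Ridge(alpha=10.0).fit(X_train, y_train)   # L2 penalty alpha * ||w||^2

for name, model in [("no regularization", ols), ("ridge (alpha=10)", ridge)]:
    print(f"{name:18s} coef norm={np.linalg.norm(model.coef_):.2f}  "
          f"train R^2={model.score(X_train, y_train):.3f}  "
          f"test R^2={model.score(X_test, y_test):.3f}")
```

The penalty shrinks the coefficients and gives up a little training fit; whether the test score improves depends on the data and on `alpha`, which is normally tuned with cross-validation.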

## Difference: Binary, Multiclass & Multi-label Classification

There are three main types of classification problems in machine learning: binary, multiclass, and multilabel. In this blog post, we will discuss the differences between them and how they can be used to solve different problems. Binary classifiers classify data into exactly two categories, while multiclass classifiers assign each data point to one of more than two categories. Multilabel classifiers assign or tag each data point with zero or more categories. Let’s take a closer look at each type! Binary classification & examples. Binary classification is a type of supervised machine learning problem that requires classifying data into two mutually exclusive groups or categories. The two groups can be …
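
The difference between the three problem types shows up in the shape of the target labels. As a minimal sketch, scikit-learn's `type_of_target` utility reports which kind of target an array represents, and `MultiLabelBinarizer` turns per-item tag sets into a 0/1 indicator matrix for the multilabel case. The spam, animal and news-topic labels are hypothetical examples.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils.multiclass import type_of_target

# Binary: each item belongs to exactly one of two classes (e.g. spam = 1, not spam = 0)
y_binary = np.array([0, 1, 1, 0])

# Multiclass: each item belongs to exactly one of three or more classes
y_multiclass = np.array(["cat", "dog", "bird", "dog"])

# Multilabel: each item carries zero or more tags
tags = [("sports",), ("politics", "economy"), ()]
mlb = MultiLabelBinarizer()
y_multilabel = mlb.fit_transform(tags)  # one 0/1 column per tag

print(type_of_target(y_binary))
print(type_of_target(y_multiclass))
print(mlb.classes_)
print(y_multilabel)
print(type_of_target(y_multilabel))
```

In the indicator matrix, a row can contain several 1s (multiple tags) or none at all, which is what distinguishes multilabel from multiclass targets.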

## Most Common Machine Learning Tasks

This article presents some of the most common machine learning tasks that one may come across while trying to solve machine learning problems. Also listed is a set of machine learning methods that could be used to solve these tasks. Please feel free to comment/suggest if I missed mentioning one or more important points. Also, sorry for the typos. You might want to check out the post What is Machine Learning?, where different aspects of machine learning concepts have been explained with the help of examples. Here is an excerpt from the page: Machine learning is about approximating mathematical functions (equations) representing real-world scenarios. These mathematical functions are also referred …

## Frequentist vs Bayesian Probability: Difference, Examples

In this post, you will learn about the difference between frequentist and Bayesian probability. It is of utmost importance to understand these concepts if you are getting started with data science. What is Frequentist Probability? Probability is used to represent and reason about uncertainty. It was originally developed to analyze the frequency of events. In other words, probability was first developed as frequentist probability. The probability of an event, when calculated as the relative frequency with which events of that type occur over many repeated trials, is called frequentist probability. Frequentist probability is a way of assigning probabilities to events based on how often those events actually occur. Frequentist …
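
A minimal simulation of the frequentist view, assuming a hypothetical biased coin with a true probability of heads of 0.3: the relative frequency of heads, computed over more and more flips, settles close to that probability.

```python
import numpy as np

rng = np.random.default_rng(7)
p_heads = 0.3  # hypothetical true probability of heads for a biased coin
flips = rng.random(100_000) < p_heads  # True = heads

# Frequentist estimate: number of heads divided by number of flips
for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>6} flips: relative frequency of heads = {flips[:n].mean():.4f}")
```

With only a few flips the estimate can be far from 0.3; the frequentist interpretation defines the probability as the value this relative frequency approaches as the number of trials grows.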