Category Archives: Data Science

Python – How to Plot Learning Curves of Classifier

May 20, 2020 by Ajitesh Kumar · Leave a comment

Perceptron Classifier Learning Curve using Python Mlxtend Package

In this post, you will learn a technique for plotting the learning curve of a machine learning classification model. As a data scientist, you will find the Python code example very handy. The post uses the plot_learning_curves function from the mlxtend.plotting module of the mlxtend package, which was created by Dr. Sebastian Raschka. Let's train a Perceptron model using the iris data from sklearn.datasets. The accuracy of the model comes out to be 0.956, or 95.6%. Next, we will want to see how the learning went. To do that, we will use the plot_learning_curves function of the mlxtend.plotting module. Here is a post on how to install mlxtend with Anaconda. The following …

Continue reading →

Posted in Data Science, Machine Learning, Python. Tagged with Data Science, machine learning, python.

Infographics for Model & Algorithm Selection & Evaluation

May 19, 2020 by Ajitesh Kumar · Leave a comment

model evaluation model selection algorithm comparison

This is a short post created for quick reference on techniques that can be used for model evaluation and selection, and for model and algorithm comparison. It should be helpful both for aspiring data scientists who are beginning to learn machine learning and for those with advanced data science skills. The image has been taken from the blog post Comparing the performance of machine learning models and algorithms using statistical tests and nested cross-validation, authored by Dr. Sebastian Raschka. The above diagram prescribes what needs to be done, for small and large datasets, in each of the following areas: model evaluation, model selection, model and algorithm comparison …

Continue reading →

Posted in AI, Data Science, Machine Learning. Tagged with ai, Data Science, machine learning.

Feature Scaling & Stratification for Model Performance (Python)

May 18, 2020 by Ajitesh Kumar · Leave a comment

In this post, you will learn how to improve the performance of machine learning models using techniques such as feature scaling and stratification. The concepts are explained using Python code samples. The following topics are covered: what feature scaling is and why it is needed; what stratification is; training a Perceptron model without feature scaling and stratification; training a Perceptron model with feature scaling; and training a Perceptron model with feature scaling and stratification. What is Feature Scaling and Why is it Needed? Feature scaling is a technique for standardizing the features present in the data to a fixed range. This is done when data consists of features of varying …
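As a rough illustration of the two techniques named above, the sketch below standardizes the iris features with scikit-learn's StandardScaler and passes stratify=y to train_test_split so that class proportions are preserved in both splits. The dataset, the 70/30 split, and random_state=1 are assumptions made for this example and may differ from what the full post uses.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data, iris.target

# stratify=y keeps the class proportions the same in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

print("Class counts (train):", np.bincount(y_train))
print("Class counts (test): ", np.bincount(y_test))
```

Fitting the scaler on the training split alone is a deliberate choice here: it keeps information from the test data out of the preprocessing step.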

Continue reading →

Posted in AI, Data Science, Machine Learning. Tagged with Data Science, machine learning, python.

How to use Sklearn Datasets For Machine Learning

May 16, 2020 by Ajitesh Kumar · Leave a comment

In this post, you will learn how to use Sklearn datasets for training machine learning models. Here is a list of the different datasets available as part of sklearn.datasets: Iris (Iris plants dataset – Classification); Boston (Boston house prices – Regression); Wine (Wine recognition dataset – Classification); Breast Cancer (Breast cancer Wisconsin diagnostic – Classification); Digits (Optical recognition of handwritten digits dataset – Classification); Linnerud (Linnerud dataset – Multi-output regression); Diabetes (Diabetes – Regression). The following command could help you load any of the datasets: All of the datasets come with the following and are intended for use with supervised learning: Data (to be used for …
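A minimal sketch of loading one of these datasets is shown here, using load_iris as the example. The other bundled datasets follow the same load_* naming pattern (for instance load_wine or load_diabetes); the choice of iris is only an assumption for illustration.

```python
from sklearn import datasets

# Load the iris dataset; the returned object bundles data, targets, and metadata
iris = datasets.load_iris()

X = iris.data      # feature matrix
y = iris.target    # class labels encoded as integers

print("Data shape:   ", X.shape)
print("Target shape: ", y.shape)
print("Feature names:", iris.feature_names)
print("Target names: ", list(iris.target_names))
```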

Continue reading →

Posted in Data Science, Machine Learning. Tagged with Data Science, machine learning, python.

Python – How to install mlxtend in Anaconda

May 14, 2020 by Ajitesh Kumar · 4 Comments

Add Channel and Install Mlxtend using Conda Install

In this post, you will quickly learn how to install the mlxtend Python package while working with Anaconda Jupyter Notebook. Mlxtend (machine learning extensions) is a Python library of useful tools for day-to-day data science tasks. The library was created by Dr. Sebastian Raschka, an Assistant Professor of Statistics at the University of Wisconsin-Madison focusing on deep learning and machine learning research. Here are the instructions for installing it within Anaconda. Add a channel named conda-forge by clicking the Channels button and then the Add button. Open a command prompt and execute the following command: conda install mlxtend --channel conda-forge Once installed, launch a Jupyter Notebook and try importing the following. This should work …

Continue reading →

Posted in Data Science, Machine Learning, Python. Tagged with Data Science, machine learning, python.

Python DataFrame – Assign New Labels to Columns

May 13, 2020 by Ajitesh Kumar · Leave a comment

Python Dataframe Columns - Labels assigned new value

In this post, you will get a code sample showing how to assign new labels to columns in Python while training machine learning models. This is very helpful when working on a classification problem. Often the labels for the response (dependent) variable are in text format, and all one wants is to assign numbers such as 0, 1, 2, etc. in place of the text labels. Beginner-level data scientists will find this code very handy. We will look at the code for the dataset as represented in the diagram below: In the above code, you will see that class labels are named as very_low, Low, High, Middle …
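One way to replace text labels with numbers is a dictionary passed to the pandas Series.map method, sketched below. The column name label, the sample rows, and the particular number assigned to each class are hypothetical; only the four class names come from the excerpt above.

```python
import pandas as pd

# Hypothetical data frame with a text-valued response column
df = pd.DataFrame({"label": ["very_low", "Low", "High", "Middle", "Low"]})

# Assumed ordering of the classes; adjust the numbers to suit the problem
label_map = {"very_low": 0, "Low": 1, "Middle": 2, "High": 3}

# map() looks up each text label in the dictionary and returns its number
df["label_encoded"] = df["label"].map(label_map)

print(df)
```

Any label missing from the dictionary becomes NaN, which makes unexpected class names easy to spot.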

Continue reading →

Posted in AI, Data Science, Machine Learning, News. Tagged with Data Science, machine learning, python.

Java Implementation for Rosenblatt Perceptron

May 4, 2020 by Ajitesh Kumar · Leave a comment

In this post, you will learn about a Java implementation of the Rosenblatt Perceptron. The Rosenblatt Perceptron is the simplest implementation of a neural network. It is also called a single-layer neural network. The following diagram represents the Rosenblatt Perceptron: The following are the key aspects of the implementation described in this post: method for calculating the “net input”; activation function as a unit step function; prediction method; fitting the model; calculating the training and test error. Method for calculating “Net Input” The net input is the weighted sum of the input features. The following represents the mathematical formula: $$Z = {w_0}{x_0} + {w_1}{x_1} + {w_2}{x_2} + \dots + {w_n}{x_n}$$ In the above equation, w0, w1, w2, …
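The net input formula and the unit step activation can be sketched in a few lines. The sketch is written in Python rather than Java so that it matches the other examples on this page, and it is not the post's implementation. The function names, the example weights, the 0/1 output encoding, and the choice of treating a net input of exactly zero as class 1 are all assumptions.

```python
def net_input(weights, features):
    """Weighted sum Z = w_0*x_0 + w_1*x_1 + ... + w_n*x_n (x_0 = 1 acts as the bias input)."""
    return sum(w * x for w, x in zip(weights, features))


def predict(weights, features):
    """Unit step activation: class 1 if the net input is non-negative, else class 0."""
    return 1 if net_input(weights, features) >= 0 else 0


weights = [-0.5, 1.0, 1.0]    # illustrative values for w_0, w_1, w_2
sample_a = [1.0, 0.0, 1.0]    # x_0 = 1, x_1 = 0, x_2 = 1
sample_b = [1.0, 0.0, 0.0]    # x_0 = 1, x_1 = 0, x_2 = 0

print(net_input(weights, sample_a), predict(weights, sample_a))
print(net_input(weights, sample_b), predict(weights, sample_b))
```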

Continue reading →

Posted in Data Science, Machine Learning. Tagged with Data Science, machine learning.

Difference between Adaline and Logistic Regression

May 1, 2020 by Ajitesh Kumar · Leave a comment

Logistic Sigmoid Activation Function Representation

In this post, you will understand the key differences between Adaline (Adaptive Linear Neuron) and Logistic Regression: the activation function and the cost function. Difference in Activation Function The primary difference is the activation function. In Adaline, the activation function is a linear activation function, while in logistic regression it is a sigmoid activation function. The diagram below represents the activation function for Adaline. The activation function for Adaline, also called the linear activation function, is the identity function, which can be represented as follows: $$\phi(W^TX) = W^TX$$ The diagram below represents the activation function for Logistic Regression. The activation function for Logistic Regression, also called the sigmoid activation function, is …

Continue reading →

Posted in AI, Data Science, Machine Learning. Tagged with Data Science, machine learning.

Logistic Regression: Sigmoid Function Python Code

May 1, 2020 by Ajitesh Kumar · 1 Comment

Logistic Regression - Sigmoid Function Plot

In this post, you will learn about the following: how to represent the probability that an event will take place given the associated features (attributes / independent features), and Python code for the sigmoid function. Probability as Sigmoid Function Below is the logit function, which represents the association between the probability that an event will occur and the independent features. $$\text{Logit Function} = \log\left(\frac{P}{1-P}\right) = {w_0} + {w_1}{x_1} + {w_2}{x_2} + \dots + {w_n}{x_n}$$ $$\text{Logit Function} = \log\left(\frac{P}{1-P}\right) = W^TX$$ $$P = \frac{1}{1 + e^{-W^TX}}$$ The above equation is called the sigmoid function. Python Code for Sigmoid Function Executing the above code would result in the following plot: Pay attention to some of the …
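The sigmoid equation above translates directly into code. This is a minimal sketch using only the standard library, with z standing in for the net input W^TX; it prints values rather than plotting them, and the sample z values are chosen only for illustration.

```python
import math


def sigmoid(z):
    """Return P = 1 / (1 + e^(-z)), where z plays the role of W^T X."""
    return 1.0 / (1.0 + math.exp(-z))


# P rises from near 0 to near 1 as z increases, passing through 0.5 at z = 0
for z in (-4, -2, 0, 2, 4):
    print(f"z = {z:+d}  P = {sigmoid(z):.4f}")
```

For very large negative z, math.exp(-z) can overflow, so production code would typically use a numerically safer formulation.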

Continue reading →

Posted in AI, Data Science, Machine Learning. Tagged with ai, Data Science, machine learning.

Three Key Challenges of Machine Learning Models

February 3, 2020 by Ajitesh Kumar · Leave a comment

In this post, you will learn about the three most important challenges, or guiding principles, to keep in mind while building machine learning models. The three key challenges to consider while training machine learning models are the following: the conflict between simplicity and accuracy; dimensionality – curse or blessing?; the multiplicity of good models. The Conflict between Simplicity and Accuracy Before starting to train one or more machine learning models, one needs to decide whether to go for a simple model or to focus on model accuracy. The simplicity of models could be achieved by using algorithms which help …

Continue reading →

Posted in AI, Data Science, Machine Learning. Tagged with Data Science, machine learning.

Hypergeometric Distribution Explained with 10+ Examples

December 14, 2019 by Ajitesh Kumar · Leave a comment

In this post, we will learn about the hypergeometric distribution with 10+ examples. The following topics will be covered in this post: What is the hypergeometric distribution? 10+ examples of the hypergeometric distribution. If you are an aspiring data scientist looking to understand the binomial distribution better, this post might be very helpful. The binomial distribution can be considered a very good approximation of the hypergeometric distribution as long as the sample consists of 5% or less of the population. One needs a good understanding of the binomial distribution in order to understand the hypergeometric distribution well. I would recommend you take a look at some of my related posts on …

Continue reading →

Posted in AI, Data Science, Machine Learning, statistics. Tagged with Data Science, statistics.

Binomial Distribution with Python Code Examples

December 14, 2019 by Ajitesh Kumar · Leave a comment

In this post, you will learn code examples, written with the Python Numpy package, related to the binomial distribution. You may want to check out the post Binomial Distribution explained with 10+ examples to get an understanding of the binomial distribution with the help of several examples. All of the examples can be tried with the code samples given in this post. Here are the instructions: Load the Numpy package: First and foremost, load the Numpy and Seaborn libraries. Code Syntax – np.random.binomial(n, p, size=1): The code np.random.binomial(n, p, size=1) will be used to print the number of successes that occur in one (size=1) experiment comprising n trials, with the probability/proportion of success being p. Tossing a …
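The np.random.binomial call described above can be tried as follows. The coin-toss framing with n = 10 and p = 0.5, the fixed seed, and the 10,000-experiment run are assumptions chosen for illustration; Seaborn is left out because nothing is plotted here.

```python
import numpy as np

np.random.seed(42)  # fixed seed so repeated runs give the same draws

# One experiment (size=1) of n = 10 tosses with success probability p = 0.5
one_experiment = np.random.binomial(10, 0.5, size=1)
print("Successes in one experiment:", one_experiment)

# Many experiments: the average number of successes should be close to n * p = 5
many_experiments = np.random.binomial(10, 0.5, size=10000)
print("Average successes over 10,000 experiments:", many_experiments.mean())
```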

Continue reading →

Posted in AI, Data Science, Machine Learning, statistics. Tagged with Data Science, statistics.

Beta Distribution Example for Cricket Score Analysis

December 10, 2019 by Ajitesh Kumar · Leave a comment

virat kohli score probability using beta distribution

This post presents a real-world example of the Binomial and Beta probability distributions from the field of sports. In this post, you will learn how the runs scored by a cricket player could be modeled using the Binomial and Beta distributions. Ever wanted to predict the probability of Virat Kohli scoring a half-century in a particular match? This post will present a perspective on the same by using the beta distribution to model the probability of runs that can be scored in a match. If you are a data scientist trying to understand the beta and binomial distributions with a real-world example, this post will turn out to be helpful. First and foremost, let’s identify the random variable that we would like …

Continue reading →

Posted in AI, Data Science, Machine Learning, statistics. Tagged with ai, Data Science, machine learning, statistics.

How to Print Unique Values in Pandas Dataframe Columns

December 7, 2019 by Ajitesh Kumar · Leave a comment

print unique column values in Pandas dataframe

A quick post with a code sample showing how to print unique values in DataFrame columns in Pandas. Here is a data frame comprising oil prices on different dates, in which a column such as year contains repeated/duplicate values. In the above data frame, the requirement is to print the unique values of the year column. Here is the code for the same. Note the method unique().
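A minimal sketch of the unique() call is given here. The data frame is a small made-up stand-in for the oil price data, and the column names year and price are assumptions.

```python
import pandas as pd

# Made-up stand-in for the oil price data, with repeated years
df = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019, 2020],
    "price": [71.3, 64.9, 62.7, 59.1, 41.8],
})

# unique() returns the distinct values in order of first appearance
unique_years = df["year"].unique()
print(unique_years)
```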

Posted in AI, Data Science, Machine Learning, News, Python. Tagged with Data Science, machine learning, python.

Pandas – How to Extract Month & Year from Datetime

December 7, 2019 by Ajitesh Kumar · Leave a comment

how to extract month and year from datetime

This is a quick post with a code sample showing how to extract the month and year from a datetime column of a DataFrame in Pandas. The code sample uses the sample data BrentOilPrices, downloaded from this Kaggle data page. Here is the code to load the data frame. Check the data types using the following code: The output looks like the following: Date object Price float64 dtype: object Use the following command to change the Date data type from object to datetime and extract the month and year. Printing the data using the head command would print the following:
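The conversion and extraction steps described above can be sketched as follows. The rows are made up to stand in for the Kaggle file, which is not loaded here, and the new column names Month and Year are assumptions.

```python
import pandas as pd

# Made-up rows standing in for the BrentOilPrices data; Date starts as text
df = pd.DataFrame({
    "Date": ["2019-11-29", "2019-12-02", "2020-01-03"],
    "Price": [62.4, 60.9, 68.6],
})
print(df.dtypes)

# Convert the text column to datetime, then pull out month and year via .dt
df["Date"] = pd.to_datetime(df["Date"])
df["Month"] = df["Date"].dt.month
df["Year"] = df["Date"].dt.year

print(df.head())
```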

Posted in Data Science, Machine Learning, News. Tagged with ai, Data Science, machine learning.

Pandas – How to Concatenate Dataframe Columns

December 1, 2019 by Ajitesh Kumar · Leave a comment

Quick code sample on how to concatenate data frame columns. We will work with the example of the Boston dataset found in sklearn.datasets. Note that data frames can be concatenated by rows or by columns. In this post, you will learn how to concatenate data frames by columns. Here is the code for working with the Boston dataset. First and foremost, the Boston dataset is loaded. Once loaded, let’s create separate data frames comprising the data and the target variable. The above creates two data frames comprising the data (features) and the values of the target variable. Here are the snapshots. Use the following command to concatenate the data frames. …
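Column-wise concatenation can be sketched with pd.concat and axis=1. Because load_boston was removed from scikit-learn in version 1.2, this sketch builds two small data frames by hand instead of loading the dataset; the column names and values are illustrative only.

```python
import pandas as pd

# Hand-built stand-ins for the feature and target data frames
features = pd.DataFrame({"RM": [6.5, 6.4, 7.1], "AGE": [65.2, 78.9, 61.1]})
target = pd.DataFrame({"MEDV": [24.0, 21.6, 34.7]})

# axis=1 places the frames side by side, aligning rows on the index
combined = pd.concat([features, target], axis=1)
print(combined)
```

With axis=0 (the default), the same call would stack the frames by rows instead.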

Continue reading →

Posted in AI, Data Science, Machine Learning. Tagged with ai, artificial intelligence, datascience, machine learning.
