# Author Archives: Ajitesh Kumar

## MIT Free Course on Machine Learning (New)

This post shares information about a new free course on machine learning launched by MIT OpenCourseWare. If you are a beginner data scientist or ML engineer, you will find this course very useful. Here is the URL to the free course on machine learning: https://bit.ly/37iNNAA. This course, titled Introduction to Machine Learning, introduces principles, algorithms, and applications of machine learning from the point of view of modeling and prediction. It includes formulation of learning problems and concepts of representation, over-fitting, and generalization. These concepts are exercised in supervised learning and reinforcement learning, with applications to images and to temporal sequences. Here are some of the key topics for which lectures can be found: …

## Gradient Boosting Regression Python Examples

In this post, you will learn about the concepts of Gradient Boosting Regression with the help of a Python Sklearn code example. The Gradient Boosting algorithm is one of the key boosting machine learning algorithms, alongside AdaBoost and XGBoost. What is Gradient Boosting Regression? The Gradient Boosting algorithm generates an ensemble model by combining weak learners (weak predictive models). It can be used to train models for both regression and classification problems. The Gradient Boosting Regression algorithm fits a model that predicts a continuous value. Gradient boosting builds an additive model by using multiple decision trees of fixed size as weak learners or …
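To make the idea of an additive model built from small trees concrete, here is a minimal sketch using scikit-learn's `GradientBoostingRegressor`. The synthetic dataset and the hyperparameter values are assumptions chosen for illustration; they are not taken from the post.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption; the post's own dataset is not shown here).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each boosting stage adds one shallow regression tree (a weak learner) fitted to
# the negative gradient of the loss, which equals the residuals for squared error.
gbr = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0
)
gbr.fit(X_train, y_train)

r2 = r2_score(y_test, gbr.predict(X_test))
print(f"Trees in ensemble: {len(gbr.estimators_)}")
print(f"Test R^2: {r2:.3f}")
```

Lowering `learning_rate` usually calls for more trees (`n_estimators`), so the two are typically tuned together.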

## Differences between Random Forest vs AdaBoost

In this post, you will learn about the key differences between the AdaBoost classifier and the Random Forest algorithm. As a data scientist, you must get a good understanding of the differences between the Random Forest and AdaBoost machine learning algorithms. Both algorithms can be used for regression as well as classification problems. Both Random Forest and AdaBoost are based on the creation of a forest of trees. They are called ensemble learning algorithms. A random forest is created using a bunch of decision trees which make use of different variables or features and which use bagging techniques to sample the data. In AdaBoost, the forest is created using a bunch of what is called as decision …
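One structural difference can be inspected directly in scikit-learn. The sketch below fits both ensembles on a synthetic dataset (an assumption for illustration) and compares the depth of their member trees. It relies on scikit-learn's defaults: random forest trees are grown without a depth limit, while `AdaBoostClassifier` uses depth-1 trees as its default weak learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Synthetic classification data (an assumption; not from the post).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Random forest: independent trees, each trained on a bootstrap sample (bagging).
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# AdaBoost: trees trained one after another, each focusing on samples
# that the previous trees misclassified.
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

rf_depths = [tree.get_depth() for tree in rf.estimators_]
ada_depths = [tree.get_depth() for tree in ada.estimators_]

print(f"Random forest tree depths: min={min(rf_depths)}, max={max(rf_depths)}")
print(f"AdaBoost tree depths:      min={min(ada_depths)}, max={max(ada_depths)}")
```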

## Classification Problems Real-life Examples

In this post, you will learn about some of the most common real-life examples of machine learning classification problems. For beginner data scientists, these examples will help in gaining perspective on real-world problems that can be framed as machine learning classification problems. This post will be updated from time to time to include interesting real-life examples which can be solved by training machine learning classification models. Before going ahead and looking into examples, let’s understand a little about what a machine learning (ML) classification problem is. You may skip this section if you are familiar with the definition of machine learning classification problems & solutions. What are ML …

## Data Quality Challenges for Analytics Projects

In this post, you will learn about some of the key data quality challenges which you may need to tackle if you are working on data analytics projects or planning to get started on data analytics initiatives. If you represent key stakeholders in an analytics team, you may find this post useful in understanding the data quality challenges. Here are the key data quality challenges which, when taken care of, would result in great outcomes from analytics projects related to descriptive, predictive and prescriptive analytics: data accuracy / validation, data consistency, data availability, data discovery, data usability, data SLA, and cost-effective data. Data Accuracy: One of the most important …

## Data Science vs Data Engineering Team – Have Both?

In this post, you will learn about different aspects of data science and data engineering teams and also understand the key differences between them. As data science / engineering stakeholders, it is very important to understand whether we need one or both teams to achieve high-quality datasets & data pipelines as well as high-performing machine learning models. Background: When an organization starts on the journey of building data analytics products, primarily based on predictive analytics, it goes on to set up a (mostly) centralized data science team consisting of data scientists. The data science team works with the product team or multiple product teams to gather the …

## 500+ Machine Learning Interview Questions

This post lists all the posts on this website related to interview questions / quizzes on data science / machine learning topics. These questions can prove helpful for product managers and data scientists. Product Managers Interview Questions: Find the questions for product managers on this page – Machine learning interview questions for product managers. Data Scientists Interview Questions: Here are posts representing 500+ interview questions which will be helpful for data scientists / machine learning engineers. You will find them useful as practice questions and answers while preparing for a machine learning interview. Decision tree questions, machine learning validation techniques questions, neural networks questions – …

## spaCy Tokenization Python Example

In this post, you will quickly learn how to use spaCy for reading and tokenising a document read from a text file or otherwise. As a data scientist starting on NLP, this is one of the first pieces of code you will write to read text using spaCy. First and foremost, make sure you have set up spaCy and loaded the English tokenizer. The following commands help you set up in a Jupyter notebook. Reading text using spaCy: Once you are set up with spaCy and have loaded the English tokenizer, the following code can be used to read the text from the text file and tokenize the text into words. Pay attention …
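The post's own commands are not reproduced in this excerpt, so the following is only a minimal sketch of the read-and-tokenize step. It assumes spaCy is installed and uses `spacy.blank("en")`, which provides the English tokenizer without downloading a trained pipeline. The file name `sample.txt` and its contents are hypothetical.

```python
import tempfile
from pathlib import Path

import spacy

# A blank English pipeline: tokenizer only, no trained components needed.
nlp = spacy.blank("en")

# Write a small hypothetical text file, then read it back as the "document".
with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = Path(tmp_dir) / "sample.txt"
    text_path.write_text("Hello world. This is spaCy!", encoding="utf-8")
    text = text_path.read_text(encoding="utf-8")

# Calling the pipeline on the text returns a Doc made up of Token objects.
doc = nlp(text)
tokens = [token.text for token in doc]
print(tokens)
```

Punctuation marks come out as separate tokens, which is usually what downstream NLP steps expect.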

## Top 10 Types of Analytics Projects – Examples

In this post, you will learn about some of the most common types of data analytics projects which an organization can execute to realise business value and gain competitive advantage in the related business functions. Note that analytics projects are different from AI / ML projects. AI / ML or predictive analytics is one part of analytics. Other types of analytics projects include those related to descriptive and prescriptive analytics. You may want to check out one of my related posts on the difference between predictive and prescriptive analytics. Here are the key areas of focus for data analytics projects: Cost reduction: …

## Different Success / Evaluation Metrics for AI / ML Products

In this post, you will learn about some of the common success metrics which can be used for measuring the success of AI / ML (machine learning) / DS (data science) initiatives / products. If you are one of the AI / ML stakeholders, you would want to get hold of these metrics in order to apply the right metrics to the right business use cases. Business leaders want to know and maximise the return on investment (ROI) from AI / ML investments. Here is the list of success metrics for AI / DS / ML initiatives: Business value metrics / Key performance indicators (KPIs): Business value metrics such as operating …

## Predictive vs Prescriptive Analytics Difference

In this post, you will quickly learn about the difference between predictive analytics and prescriptive analytics. As a data analytics stakeholder, you must get a good understanding of these concepts in order to decide when to apply predictive and when to apply prescriptive analytics in analytics solutions / applications. Without further ado, let’s get straight to the diagram. In the above diagram, you can observe the following: Predictive analytics: In predictive analytics, the model is trained using historical / past data based on supervised, unsupervised, or reinforcement learning algorithms. Once trained, the new data / observation is input to the trained model. The output of the model is a prediction in the form …

## Analytics Maturity Model for Assessing Analytics Practice

In this post, you will learn about a data analytics maturity model which you could use to assess where your business / organization stands on the path of using analytics to drive business value. If you represent a decision-making stakeholder group and want to assess your organization's readiness / capabilities to deploy analytics in order to create business value, you may find this post useful. Here is a list of other articles I posted in the recent past in relation to strategic data analytics: Top 10 analytics strategies for great data products; Top 5 data analytics methodologies. Here are the three broad categories / levels of the data analytics maturity model: Analytically …

## Fixed vs Random vs Mixed Effects Models – Examples

In this post, you will learn about the concepts of fixed and random effects models along with when to use fixed effects models and when to go for fixed + random effects (mixed) models. The concepts will be explained with examples. As data scientists, you must get a good understanding of these concepts as it would help you build better linear models such as general linear mixed models or generalized linear mixed models (GLMM). The following are some of the topics covered in this post: What are fixed, random & mixed effects models? When to use fixed effects vs mixed effects models? What are fixed, random & mixed effects models? First, we will take a real world …

## Hierarchical Clustering Explained with Python Example

In this post, you will learn about the concepts of hierarchical clustering with the help of a Python code example. As a data scientist or machine learning enthusiast, you would want to understand the concepts of hierarchical clustering well. The following topics will be covered in this post: What is hierarchical clustering? Hierarchical clustering Python example. What is Hierarchical Clustering? Hierarchical clustering is an unsupervised learning algorithm which clusters data based on hierarchical ordering. Recall that clustering groups data points into multiple clusters such that data points within each cluster are similar to each other while the clusters differ from each other. The hierarchical clustering can be classified …
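As a small illustration of the grouping behaviour described above, here is a sketch using scikit-learn's `AgglomerativeClustering`, which implements the bottom-up (agglomerative) form of hierarchical clustering. The six sample points, the choice of two clusters, and Ward linkage are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two visibly separated groups of 2-D points (hypothetical data).
points = np.array(
    [[0, 0], [0, 1], [1, 0],        # group near the origin
     [10, 10], [10, 11], [11, 10]]  # group near (10, 10)
)

# Bottom-up clustering: every point starts as its own cluster, and the two
# closest clusters are merged repeatedly until n_clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(points)

for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```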

## Negative Binomial Distribution Python Examples

In this post, you will learn about the concepts of the negative binomial distribution, explained using real-world examples and Python code. We will go over some of the following topics to understand the negative binomial distribution: What is negative binomial distribution? What is the difference between binomial and negative binomial distribution? Negative binomial distribution real-world examples. Negative binomial distribution Python example. What is Negative Binomial Distribution? The negative binomial distribution is a discrete probability distribution of a random variable, X, which is the number of Bernoulli trials required to obtain r successes. This random variable is called a negative binomial random variable. And, the experiment representing X number of Bernoulli trials required to produce r successes is called …
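Under the definition above (X counts the trials needed to reach r successes, each with success probability p), the probability mass function is P(X = n) = C(n − 1, r − 1) · p^r · (1 − p)^(n − r) for n ≥ r. The sketch below computes it with the standard library only; the function name `neg_binom_pmf` and the values r = 3, p = 0.5 are illustrative assumptions.

```python
from math import comb


def neg_binom_pmf(n: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs exactly on trial n."""
    if n < r:
        return 0.0  # r successes cannot happen in fewer than r trials
    # The last trial must be a success; the other r - 1 successes can fall
    # anywhere among the first n - 1 trials.
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)


r, p = 3, 0.5
for n in range(r, r + 4):
    print(f"P(X = {n}) = {neg_binom_pmf(n, r, p):.4f}")

# The probabilities over all possible n should add up to (almost exactly) 1.
total = sum(neg_binom_pmf(n, r, p) for n in range(r, 200))
print(f"Sum of probabilities: {total:.6f}")
```

Note that `scipy.stats.nbinom` parameterizes the same distribution by the number of failures before the r-th success rather than the total number of trials, so its argument is n − r in this notation.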

## Generalized Linear Models Explained with Examples

In this post, you will learn about the concepts of generalized linear models (GLM) with the help of Python examples. It is very important for data scientists to understand the concepts of generalized linear models and how they differ from general linear models such as regression or ANOVA models. Some of the following topics have been covered in this post: What are generalized linear models (GLM)? Generalized linear models real-world examples. When to use generalized linear models? What are Generalized Linear Models? Generalized linear models represent the class of regression models which model the response variable, Y, and the random error term based on the exponential family of distributions such as normal, Poisson, …
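As one concrete instance of a GLM, the sketch below fits a Poisson regression with a log link using scikit-learn's `PoissonRegressor`. The simulated data, the "true" coefficients, and the choice of `alpha=0.0` (no regularization) are assumptions made for illustration, not the post's own example.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)

# Simulate count data whose mean depends on x through a log link:
# E[y | x] = exp(intercept + slope * x)   (hypothetical coefficients)
true_intercept, true_slope = 0.5, 0.8
x = rng.uniform(0.0, 2.0, size=2000)
y = rng.poisson(np.exp(true_intercept + true_slope * x))

# Poisson GLM with log link; alpha=0.0 turns off the L2 penalty so the fit
# is a plain maximum-likelihood estimate.
glm = PoissonRegressor(alpha=0.0, max_iter=300)
glm.fit(x.reshape(-1, 1), y)

print(f"Estimated intercept: {glm.intercept_:.3f} (true {true_intercept})")
print(f"Estimated slope:     {glm.coef_[0]:.3f} (true {true_slope})")
```

An ordinary linear regression on the same data would assume normally distributed errors with constant variance, which count data like this generally violates.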