Different Types of Distance Measures in Machine Learning

Euclidean Distance formula

In this post, you will learn about the different types of distance measures used in machine learning algorithms such as K-nearest neighbours and K-means. Distance measures quantify the similarity between two or more vectors in multi-dimensional space. They take the following forms: geometric distances, computational distances, and statistical distances. Geometric Distance Measures: Geometric distance metrics measure the similarity between two or more vectors solely based on the distance between the points in multi-dimensional space. Examples of geometric distance measures are Minkowski distance, Euclidean distance and Manhattan distance. Another form of geometric distance is cosine similarity, which we will discuss …
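
The geometric measures named in this excerpt can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not code from the post; the function names are chosen here for readability. It relies on the standard fact that Manhattan and Euclidean distance are the Minkowski distance with order p = 1 and p = 2 respectively.

```python
import math


def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan_distance(a, b):
    """Minkowski distance with p = 1 (sum of absolute differences)."""
    return minkowski_distance(a, b, 1)


def euclidean_distance(a, b):
    """Minkowski distance with p = 2 (straight-line distance)."""
    return minkowski_distance(a, b, 2)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Note that cosine similarity compares direction only: two vectors pointing the same way score close to 1 regardless of their lengths, whereas the Minkowski family grows with the gap between the points.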

Posted in Data Science, Machine Learning.

Introduction to Algorithms & Related Computational Tasks

Sample-Directed-Acyclic-Graph

In this post, you will be introduced to some important classes of algorithms and the computational tasks they can handle. The following classes of algorithms are briefly discussed in this post: divide-and-conquer algorithms, graph-based algorithms, greedy algorithms, dynamic programming, linear programming, NP-complete algorithms, and quantum algorithms. Divide-and-Conquer Algorithms: Divide-and-conquer algorithms solve problems using the divide-and-conquer strategy. The strategy consists of the following steps: breaking the problem into subproblems that are themselves smaller instances of the same type of problem; recursively solving these subproblems; appropriately combining their …
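
As an illustration of the three divide-and-conquer steps, here is a short merge sort sketch. Merge sort is chosen here as a familiar example; the excerpt itself does not name a specific algorithm, and the function names are illustrative.

```python
def merge_sort(items):
    """Sort a list with the divide-and-conquer strategy."""
    # Base case: a list of zero or one element is already sorted.
    if len(items) <= 1:
        return list(items)
    # Step 1: break the problem into two smaller instances.
    mid = len(items) // 2
    # Step 2: recursively solve each subproblem.
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Step 3: combine the two sorted halves.
    return merge(left, right)


def merge(left, right):
    """Combine two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements left over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```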

Posted in Algorithms.

Hold-out Method for Training Machine Learning Models

Hold-out-method-Training-Validation-Test-Dataset

In this post, you will learn about the hold-out method used when training machine learning models. When evaluating machine learning (ML) models, several questions arise. Is the model the best one available from the algorithm's hypothesis space in terms of generalization error on unseen / future data? Is the model trained and tested using the most appropriate method? Out of the available models, which one should be selected? These questions are addressed using what is called the hold-out method. Instead of using the entire dataset for training, different sets called the validation set and the test set are separated or set aside …
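
A minimal sketch of setting aside validation and test sets might look like the following. The function name, the 60/20/20 default proportions, and the fixed seed are assumptions made for illustration; the excerpt does not prescribe specific split sizes.

```python
import random


def hold_out_split(records, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle records and split them into train, validation and test sets.

    The fractions and the seed are illustrative defaults, not values
    taken from the post.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test
```

In a typical workflow the training set fits the model, the validation set guides model selection, and the test set is touched only once for the final generalization estimate.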

Posted in Data Science, Machine Learning.

Machine Learning Terminologies for Beginners

ML Terminologies Hypothesis Space

When starting on the journey of learning machine learning and data science, we come across many different terminologies in articles/posts, books & video lectures. A good understanding of these terminologies will help us understand the related concepts well. At a senior level, it gets tricky at times when the team of data scientists / ML engineers explains their projects and related outcomes. With this in context, this post lists a set of commonly used machine learning terminologies that will help us get a good understanding of ML concepts and also engage with the DS / AI / ML team in …

Posted in Data Science, Machine Learning.

Bias & Variance Concepts & Interview Questions

Bias variance concepts and interview questions

In this post, you will learn about the concepts of bias & variance in relation to machine learning (ML) models. In addition to learning the concepts, you will also get a chance to take a quiz which will help you prepare for data scientist / ML engineer interviews. As a data scientist / ML engineer, you must get a good understanding of bias and variance concepts in order to build models that generalize better, that is, have lower generalization error. Bias & Variance of Machine Learning Models: Bias of the model, intuitively speaking, can be defined as affinity of the model to make predictions or estimate based on only …

Posted in Data Science, Interview questions, Machine Learning.

Machine Learning Free Course at Univ Wisconsin Madison

Dr Sebastian Raschka Machine Learning Course

In this post, you will learn about the free course on machine learning (STAT 451) recently taught at the University of Wisconsin-Madison by Dr. Sebastian Raschka. Dr. Raschka is currently working as an Assistant Professor of Statistics at the University of Wisconsin-Madison, focusing on deep learning and machine learning research. The course is titled “Introduction to Machine Learning”. The recordings of the course lectures can be found on the page: Introduction to machine learning. The course covers some of the following topics: What is machine learning?; Nearest neighbour methods; Computational foundation; Python programming (concepts); Machine learning in Scikit-learn; Tree-based methods; Decision trees; Ensemble methods; Model evaluation techniques; Concepts of …

Posted in Data Science, Machine Learning, Online Courses.

Overfitting & Underfitting Concepts & Interview Questions

Overfitting and underfitting represented using Model error vs complexity plot

In this post, you will learn about some of the key concepts of overfitting and underfitting in relation to machine learning models. In addition, you will get a chance to test your understanding by attempting the quiz. The quiz will help you prepare well for interview questions on underfitting & overfitting. As a data scientist, you must get a good understanding of the overfitting and underfitting concepts. Introduction to Overfitting & Underfitting: Assuming an independent and identically distributed (i.i.d.) dataset, when the prediction error on both the training and test datasets is high, the model is said to have underfit. This is called underfitting the model or model …
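
The underfitting rule stated in this excerpt (high error on both training and test data) can be turned into a rough diagnostic. The sketch below adds the commonly used complementary rule for overfitting (low training error but a much higher test error). The function name and both thresholds are hypothetical values picked for illustration; sensible cut-offs depend on the problem and the error metric.

```python
def diagnose_fit(train_error, test_error, high_error=0.3, max_gap=0.1):
    """Roughly label a model from its training and test error.

    high_error and max_gap are hypothetical thresholds, not values
    from the post; tune them for the task and metric at hand.
    """
    # High error on both datasets: the model has not captured the pattern.
    if train_error >= high_error and test_error >= high_error:
        return "underfitting"
    # Test error far above training error: the model fits noise
    # in the training data.
    if test_error - train_error > max_gap:
        return "overfitting"
    return "reasonable fit"
```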

Posted in Data Science, Interview questions, Machine Learning.

Reinforcement Learning Real-world examples

Reinforcement-learning-real-world-example

In this post, you will learn about some real-world / real-life examples of reinforcement learning, one of the approaches to machine learning, the others being supervised and unsupervised learning. Before looking into the real-world examples, let’s quickly understand what reinforcement learning is. Introduction to Reinforcement Learning (RL): Reinforcement learning is an approach to machine learning in which agents are trained to make a sequence of decisions. The agent, also called an AI agent, gets trained in the following manner: The agent interacts with the environment and makes decisions or choices. For training purposes, the agent is provided with contextual information about the environment and …
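
The interaction loop described in this excerpt can be sketched with a deliberately small example: an epsilon-greedy agent choosing among a fixed set of actions. This is a simplification assumed for illustration (a stateless "bandit" setting rather than a full sequential environment), and `train_agent` and `reward_fn` are hypothetical names, not part of any library or of the post.

```python
import random


def train_agent(reward_fn, n_actions, steps=200, epsilon=0.1, seed=0):
    """Learn a value estimate per action by interacting with an environment.

    reward_fn(action) stands in for the environment: it receives the
    agent's choice and returns a numeric reward.
    """
    rng = random.Random(seed)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions
    for step in range(steps):
        if step < n_actions:
            action = step  # try every action once before exploiting
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            # exploit the action with the best estimate so far
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = reward_fn(action)  # feedback from the environment
        counts[action] += 1
        # incremental update of the running average
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates
```

The agent is never told which action is best; it infers that from the rewards it receives, which is the core difference from supervised learning.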

Posted in Data Science, Machine Learning.

Starting on Analytics Journey – Things to Keep in Mind

Analytics Journey - Things to Keep in Mind

This post highlights some key points to keep in mind when you are starting on a data analytics journey. You may want to check a related post to assess where your organization stands in terms of maturity of analytics practice: Analytics maturity model for assessing analytics practice. In the post cited above, the analytics maturity model defines three levels of maturity: Challenged, Practitioners, and Innovators. Whatever the maturity level of your analytics practice, it may be a good idea to understand the following points when coming up with data analytics projects. Believe that a lot of prior work is required …

Posted in Analytics.

MIT Free Course on Machine Learning (New)

MIT Free Course on Machine Learning

This post provides information about the new free course on machine learning launched by MIT OpenCourseWare. If you are a beginner data scientist or ML engineer, you will find this course very useful. Here is the URL to the free course on machine learning: https://bit.ly/37iNNAA. This course, titled Introduction to Machine Learning, introduces principles, algorithms, and applications of machine learning from the point of view of modeling and prediction. It includes formulation of learning problems and concepts of representation, over-fitting, and generalization. These concepts are exercised in supervised learning and reinforcement learning, with applications to images and to temporal sequences. Here are some of the key topics for which lectures can be found: …

Posted in Career Planning, Data Science, Machine Learning, Tutorials.

Gradient Boosting Regression Python Examples

Gradient Boosting Regressor Feature Importances

In this post, you will learn about the concepts of Gradient Boosting Regression with the help of a Python Sklearn code example. Gradient Boosting is one of the key boosting machine learning algorithms, alongside AdaBoost and XGBoost. What is Gradient Boosting Regression? The Gradient Boosting algorithm generates an ensemble model by combining weak learners, or weak predictive models. It can be used to train models for both regression and classification problems. Gradient Boosting Regression fits a model which predicts a continuous value. Gradient boosting builds an additive model by using multiple decision trees of fixed size as weak learners or …
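
To make the "additive model of weak learners" idea concrete, here is a from-scratch sketch for a single numeric feature with squared-error loss. It is not the Sklearn example from the post: the weak learner is assumed to be a one-split tree (a stump), and all function names and default values are illustrative. In practice one would normally use a library implementation such as scikit-learn's `GradientBoostingRegressor`.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) minimizing squared error."""
    thresholds = sorted(set(xs))
    best = None
    for t in thresholds[:-1]:  # split: x <= t goes left, x > t goes right
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # all x values identical: predict the mean everywhere
        mean = sum(residuals) / len(residuals)
        return (thresholds[0], mean, mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    """Build an additive model: mean prediction plus shrunken stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)
```

Each round fits the next stump to what the current ensemble still gets wrong, so the training error shrinks as estimators are added; the learning rate controls how much of each correction is applied.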

Posted in Data Science, Machine Learning, Python.

Differences between Random Forest vs AdaBoost

decision trees in random forest

In this post, you will learn about the key differences between the AdaBoost classifier and the Random Forest algorithm. As a data scientist, you must get a good understanding of the differences between the Random Forest and AdaBoost machine learning algorithms. Both algorithms can be used for regression as well as classification problems. Both Random Forest and AdaBoost are based on the creation of a forest of trees, and both are called ensemble learning algorithms. A random forest is created using a bunch of decision trees which make use of different variables or features, and it uses bagging techniques to sample the data. In AdaBoost, the forest is created using a bunch of what are called decision …
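
One way to see the difference in how the two ensembles treat training data is to compare their sampling and weighting steps. The sketch below shows a bootstrap sample, as used in bagging, next to the standard AdaBoost weight update for binary classification. The function names are illustrative, the weights are assumed to be normalized to sum to 1, and neither function is taken from the post.

```python
import math
import random


def bootstrap_sample(records, seed=0):
    """Bagging: draw len(records) items with replacement.

    Each tree in a random forest is typically trained on its own
    bootstrap sample like this one.
    """
    rng = random.Random(seed)
    return [rng.choice(records) for _ in records]


def adaboost_reweight(weights, is_correct):
    """AdaBoost: raise the weight of misclassified examples.

    weights are assumed to sum to 1; is_correct[i] says whether the
    current weak learner classified example i correctly.
    """
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    if not 0 < error < 1:
        raise ValueError("weighted error must be strictly between 0 and 1")
    alpha = 0.5 * math.log((1 - error) / error)  # the learner's vote weight
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, is_correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha
```

Bagging builds its learners independently on resampled data, while AdaBoost builds them in sequence, with each new learner concentrating on the examples the previous ones got wrong.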

Posted in Data Science, Machine Learning.

Classification Problems Real-life Examples

classification problems real life examples

In this post, you will learn about some of the most common real-life examples of machine learning classification problems. For beginner data scientists, these examples will help in gaining perspective on real-world problems which can be framed as machine learning classification problems. This post will be updated from time to time to include interesting real-life examples which can be solved by training machine learning classification models. Before looking into the examples, let’s understand a little about what a machine learning (ML) classification problem is. You may skip this section if you are familiar with the definition of machine learning classification problems & solutions. What are ML …

Posted in Data Science, Machine Learning.

Data Quality Challenges for Analytics Projects

data quality challenges for analytics projects

In this post, you will learn about some of the key data quality challenges which you may need to tackle if you are working on data analytics projects or planning to get started on data analytics initiatives. If you represent key stakeholders in an analytics team, you may find this post useful in understanding the data quality challenges. Here are the key data quality challenges which, when addressed, lead to better outcomes from analytics projects related to descriptive, predictive and prescriptive analytics: data accuracy / validation, data consistency, data availability, data discovery, data usability, data SLA, and cost-effective data. Data Accuracy: One of the most important …

Posted in Analytics, data engineering, Data Science.

Data Science vs Data Engineering Team – Have Both?

Data engineering vs Data Science

In this post, you will learn about different aspects of data science and data engineering teams and understand the key differences between them. As data science / engineering stakeholders, it is very important to understand whether we need one or both of the teams to achieve high-quality datasets & data pipelines as well as high-performing machine learning models. Background: When an organization starts on the journey of building data analytics products, primarily based on predictive analytics, it goes on to set up a (mostly) centralized data science team consisting of data scientists. The data science team works with the product team or multiple product teams to gather the …

Posted in data engineering, Data Science.

500+ Machine Learning Interview Questions

machine learning interview questions

This post collects all the posts on this website on interview questions / quizzes related to data science / machine learning topics. These questions can prove helpful for product managers and data scientists. Product Managers Interview Questions: Find the questions for product managers on this page: Machine learning interview questions for product managers. Data Scientists Interview Questions: Here are posts representing 500+ interview questions which will be helpful for data scientists / machine learning engineers. You will find them useful as practice questions and answers while preparing for machine learning interviews. Decision tree questions; Machine learning validation techniques questions; Neural networks questions – …

Posted in Data Science, Interview questions, Machine Learning.