Tag Archives: Data Science

Hold-out Method for Training Machine Learning Models

The hold-out method for training machine learning models is a technique that splits the data into separate sets: one for training, and others for validation and testing. It is used to estimate how well a model will perform on new, unseen data. In this post, you will learn about the hold-out method used while training machine learning models. Do check out my post on what is machine learning? concepts & examples for a detailed understanding of the basics of machine learning. Also, check out a related post on what is data science? When evaluating …
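
As a rough illustration of the splitting idea described above, here is a minimal sketch in plain Python. The function name `hold_out_split`, the 60/20/20 proportions, and the fixed seed are assumptions made for this example, not something taken from the post.

```python
import random

def hold_out_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left is training data
    return train, val, test

train, val, test = hold_out_split(range(100))
print(len(train), len(val), len(test))
```

The model would be fitted on `train`, tuned against `val`, and scored once on `test`, so the final score comes from data the model never saw.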

Posted in Data Science, Machine Learning.

Different types of Time-series Forecasting Models

Forecasting is the process of predicting future events based on past and present data. Time-series forecasting predicts future values from time-stamped data points. There are many different types of time-series forecasting models, each with its own strengths and weaknesses. In this blog post, we will discuss the most common time-series forecasting models, listed below, and provide examples of how they can be used to predict future events: Autoregressive (AR) model; Moving average (MA) model; Autoregressive moving average (ARMA) model; Autoregressive integrated moving average (ARIMA) model; Seasonal autoregressive integrated moving average (SARIMA) model; Vector autoregressive (VAR) model; Vector error correction …

Posted in Data Science, Machine Learning.

Autoregressive (AR) models with Python examples

Autoregressive (AR) models are a subset of time-series models that predict future values from previous observations. AR models regress the current value on its own past values and rely on autocorrelation in the series to make predictions. This blog post provides Python code examples that demonstrate how you can implement an AR model for your own predictive analytics project. If you are starting on time-series forecasting, this would be a useful read. Note that time-series forecasting is one of the important areas of data science/machine learning. For beginners, time-series forecasting is the process of using a model …
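
To make the idea of regressing a value on its own past concrete, here is a minimal sketch of an AR(1) model fitted by ordinary least squares in plain Python. The function names `fit_ar1` and `forecast_ar1` and the toy series are assumptions for illustration. A real project would more likely use a library routine and a higher lag order.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]   # lagged values
    x_next = series[1:]    # values to be predicted
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi

def forecast_ar1(last_value, c, phi, steps):
    """Roll the fitted recurrence forward for the given number of steps."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts

# Toy series generated exactly by x[t] = 1 + 0.5 * x[t-1], starting at 10.
series = [10.0]
for _ in range(20):
    series.append(1.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(c, phi)                                # close to 1.0 and 0.5
print(forecast_ar1(series[-1], c, phi, 3))
```

Because the toy series has no noise, the fit recovers the generating coefficients almost exactly. Real data would give estimates with some error.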

Posted in Data Science, Machine Learning, Python.

Ridge Regression Concepts & Python example

Ridge regression is a type of linear regression that penalizes large coefficients. This technique can be used to reduce the effects of multicollinearity in linear regression, which results from high correlations among the predictor (independent) variables. In this tutorial, we will explain ridge regression with a Python example. What is Ridge Regression? Ridge regression is a linear regression technique used in machine learning to reduce the overfitting of linear models. Recall that linear regression is a method of modeling the relationship between a response variable and one or more predictor variables. Ridge regression is used when there are multiple variables that …
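
The shrinking effect of the penalty can be seen in a deliberately small sketch. It assumes a single feature and no intercept, and minimizes the sum of squared errors plus `alpha * w**2`, for which the closed-form slope is `sum(x*y) / (sum(x*x) + alpha)`. The function name `ridge_slope` and the data are made up for this example.

```python
def ridge_slope(x, y, alpha):
    """Closed-form ridge slope for one feature and no intercept.

    Minimizes sum((y_i - w * x_i)**2) + alpha * w**2 (illustrative setup).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)

# Toy data lying roughly on y = 2x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]

for alpha in (0.0, 1.0, 10.0):
    print(alpha, ridge_slope(x, y, alpha))
```

With `alpha = 0` this is ordinary least squares. As `alpha` grows, the slope is pulled toward zero but never reaches it exactly.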

Posted in Data Science, Machine Learning, Python.

Lasso Regression Explained with Python Example

In this post, you will learn the concepts of Lasso regression along with Python Sklearn examples. The Lasso regression algorithm introduces a penalty against model complexity (a large number of parameters) using a regularization parameter. The other two similar forms of regularized linear regression are Ridge regression and Elasticnet regression, which will be discussed in future posts. In this post, the following topics are discussed: What’s Lasso Regression? Lasso regression is a machine learning algorithm that can be used to perform linear regression while also reducing the number of features used in the model. Lasso stands for least absolute shrinkage and selection operator. Pay attention to the words, “least absolute shrinkage” and “selection”. We will …
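
The "shrinkage" and "selection" behaviour can be sketched for the simplest possible case. The example assumes one feature, no intercept, and the objective `0.5 * sum((y - w*x)**2) + alpha * abs(w)`, which is scaled differently from the objective scikit-learn uses. The names `soft_threshold` and `lasso_slope` are hypothetical.

```python
def soft_threshold(z, alpha):
    """Shrink z toward zero by alpha, returning exactly 0 when |z| <= alpha."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0

def lasso_slope(x, y, alpha):
    """Lasso slope for one feature and no intercept.

    Minimizes 0.5 * sum((y_i - w * x_i)**2) + alpha * |w| (illustrative setup).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, alpha) / sxx

# Same kind of toy data as a plain regression on y = 2x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]

for alpha in (0.0, 5.0, 25.0):
    print(alpha, lasso_slope(x, y, alpha))
```

A moderate `alpha` shrinks the coefficient, and a large enough `alpha` sets it to exactly zero. This is why Lasso can drop features from a model.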

Posted in Data Science, Machine Learning, Python.

Difference between Data Science & Decision Science

Data science and decision science are two data-driven fields that have grown in prominence over the past few years. Data scientists use data to come up with conclusions or predictions about things like customer behavior, while decision scientists combine data with other information sources to make decisions. The difference between data science and decision science is important for business owners who want to make informed decisions. In this post, you will learn about the difference between data science and decision science. Those venturing out to learn data science must understand whether they want to learn data science or decision science or both. The following are some of the key questions …

Posted in AI, Analytics, Data Science, Decision Science.

Bias-Variance Trade-off Concepts & Interview Questions

Bias and variance are two important properties of machine learning models. In this post, you will learn about the concepts of bias and variance in relation to machine learning (ML) models. Bias refers to the error introduced when a model's simplifying assumptions keep it from capturing the true relationship in the data, whereas variance refers to how sensitive the model's predictions are to changes in the training data. The trade-off between bias and variance is a fundamental problem in machine learning, and it is often necessary to experiment with different model types in order to find the balance that works best for a given dataset. In addition to learning the concepts related to Bias vs variance trade-off, you would …

Posted in Data Science, Interview questions, Machine Learning.

What is Data Science? Concepts & Examples

What is data science? This is a question that many people are asking, and for good reason. Data science is a relatively new field, and it covers a lot of ground. In this blog post, we will discuss what data science is, and we will give some examples of how it can be used to solve problems. Stay tuned, because by the end of this post you will have a clear understanding of what data science is and why it matters! What is Data Science? Before understanding what data science is, let’s understand what science is. Science can be defined as a systematic and logical approach to discovering how things …

Posted in Data Science, Machine Learning.

Machine Learning – Sensitivity vs Specificity Difference

In this post, we will try to understand the concepts behind machine learning model evaluation metrics such as sensitivity and specificity, which are used to determine the performance of machine learning models. The post also describes the differences between sensitivity and specificity. The concepts are explained using a model that predicts whether a person is suffering from a disease or not. You may want to check out another related post titled ROC Curve & AUC Explained with Python examples. What is Sensitivity? Sensitivity is a measure of how well a machine learning model can detect positive instances. It is also known as the true positive rate (TPR) or recall. Sensitivity is …
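
A small sketch may help pin down the two metrics. It computes sensitivity as TP / (TP + FN) and specificity as TN / (TN + FP) from true and predicted labels. The function name and the toy labels (1 = has the disease, 0 = does not) are assumptions for illustration, and the sketch assumes both classes occur in `y_true`.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Assumes y_true contains at least one positive and one negative label.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1   # sick and flagged as sick
            else:
                fn += 1   # sick but missed
        else:
            if pred == positive:
                fp += 1   # healthy but flagged as sick
            else:
                tn += 1   # healthy and cleared
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

Here three of the four sick patients are detected and four of the six healthy patients are cleared.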

Posted in Data Science, Machine Learning.

Stochastic Gradient Descent Python Example

In this post, you will learn the concepts of Stochastic Gradient Descent (SGD) using a Python example. Stochastic gradient descent is an optimization algorithm used to minimize the cost function while training machine learning models. Standard (batch) gradient descent takes a long time to converge on large datasets because every update uses the full training set. This is where variants such as stochastic gradient descent come into the picture. In order to demonstrate stochastic gradient descent concepts, the Perceptron machine learning algorithm is used. Recall that Perceptron is also called a single-layer neural network. Before getting into details, let’s quickly understand the concepts of Perceptron and the underlying learning …
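
The per-sample update that gives stochastic gradient descent its name can be sketched with a tiny Perceptron in plain Python. The function names, the learning rate of 1.0, the epoch cap, and the AND-gate data are all assumptions chosen for this example. It is not the code from the post.

```python
import random

def predict(weights, bias, x):
    """Return 1 if the weighted sum is positive, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, lr=1.0, max_epochs=100, seed=0):
    """Train with one weight update per misclassified sample (stochastic updates)."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(max_epochs):
        rng.shuffle(order)              # visit samples in random order each epoch
        mistakes = 0
        for i in order:
            error = labels[i] - predict(weights, bias, samples[i])
            if error != 0:
                # Update immediately from this single sample, not the full batch.
                weights = [w + lr * error * xi for w, xi in zip(weights, samples[i])]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:               # a clean pass means the data is separated
            break
    return weights, bias

# Logical AND is linearly separable, so the Perceptron can learn it.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

Batch gradient descent would instead sum the errors over all samples before changing the weights once per pass.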

Posted in Data Science, Machine Learning, Python.

Dummy Variables in Regression Models: Python, R

In linear regression, dummy variables are used to represent the categorical variables in the model. There are a few different ways to create dummy variables, and we will explore some of them in this blog post. We will also look at examples that illustrate how dummy variables work and cover the dummy variable trap. By the end of this post, you should have a better understanding of how to use dummy variables in linear regression models. As a data scientist, it is important to understand how to use linear regression and dummy variables. What are dummy variables in …
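
As a sketch of one way to create dummy variables, the plain-Python helper below turns a categorical column into 0/1 columns and drops the first category. Dropping one category is the usual way to avoid the dummy variable trap, where the full set of dummies is perfectly collinear with the intercept. The name `dummy_encode` and the color data are hypothetical.

```python
def dummy_encode(values, drop_first=True):
    """Encode a categorical column as 0/1 dummy columns.

    Returns (kept_categories, rows). With drop_first=True the first category
    (in sorted order) becomes the baseline and gets no column of its own.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    rows = [[1 if value == category else 0 for category in kept] for value in values]
    return kept, rows

colors = ["red", "green", "blue", "green"]
kept, rows = dummy_encode(colors)
print(kept)
for color, row in zip(colors, rows):
    print(color, row)
```

The dropped category ("blue" here) is represented by a row of all zeros, so no information is lost.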

Posted in Data Science, Machine Learning, R.

Linear vs Non-linear Data: How to Know

In this post, you will learn techniques for determining whether a given data set is linear or non-linear. Based on the type of machine learning problem (such as classification or regression) you are trying to solve, you could apply different techniques to make this determination. For a data scientist, it is very important to know whether the data is linear or not, as it helps in choosing appropriate algorithms to train a high-performance model. You will learn techniques such as the following: Use scatter plot when dealing with classification problems; Use …

Posted in AI, Data Science, Machine Learning.

How to deal with Class Imbalance in Python

In this post, you will learn how to deal with class imbalance by adjusting class weights while solving a machine learning classification problem. This will be illustrated using a Sklearn Python code example. What is Class Imbalance? Class imbalance refers to a problem in machine learning where the classes in the data are not equally represented. For example, if there are 100 data points and 90 of them belong to Class A and 10 belong to Class B, then the classes are imbalanced. Class imbalance can lead to problems with training machine learning models because the models may be biased towards the more common class. If there are more examples …
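
For the 90/10 example above, one common weighting heuristic can be sketched in plain Python: each class gets the weight `n_samples / (n_classes * class_count)`. This mirrors the formula documented for scikit-learn's `class_weight="balanced"` option, but the helper name `balanced_class_weights` is an assumption and the post may use a different scheme.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    weight = n_samples / (n_classes * count_of_class)
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

labels = ["A"] * 90 + ["B"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

The rare class B receives a weight nine times larger than class A, so errors on B cost more during training.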

Posted in Data Science, Machine Learning, Python.

Linear regression hypothesis testing: Concepts, Examples

In relation to machine learning, linear regression is defined as a predictive modeling technique that models a continuous response variable as a linear combination of explanatory or predictor variables. While training linear regression models, we rely on hypothesis testing to determine whether a relationship exists between the response and predictor variables. In the case of the linear regression model, two types of hypothesis tests are done: t-tests and F-tests. In other words, two types of statistics are used to assess whether a linear relationship exists between the response and predictor variables. They are …

Posted in Data Science, Machine Learning, statistics.

Differences between Random Forest vs AdaBoost

In this post, you will learn about the key differences between the AdaBoost classifier and the Random Forest algorithm. As a data scientist, you must have a good understanding of the differences between these two popular machine learning algorithms. Both can be used for classification and regression tasks, and both are based on building an ensemble of decision trees. Random Forest is an ensemble learning algorithm that is created using a bunch of decision trees that make use of different variables or features …

Posted in Data Science, Machine Learning.

K-Nearest Neighbors Explained with Python Examples

In this post, you will learn about the K-nearest neighbors algorithm with Python Sklearn examples. The K-nearest neighbors algorithm is used for solving both classification and regression machine learning problems. Introduction to K-Nearest Neighbors (K-NN): K-nearest neighbors is a supervised machine learning algorithm for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-nearest neighbors is used for classification or regression. The main idea behind K-NN is to find the K nearest data points, or neighbors, to a given data point and then predict the label or value of the given data point based on the labels or values …
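
The main idea can be sketched in a few lines of plain Python for the classification case: measure the distance from the query to every training point, keep the k closest, and take a majority vote. The function name `knn_predict`, the Euclidean distance, and the toy points are assumptions for this example. The post itself uses Sklearn.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

For regression, the vote would be replaced by an average of the neighbors' values.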

Posted in Data Science, Machine Learning, Python.