Category Archives: Data Science

Large Language Models (LLMs): Concepts, Use Cases & Examples

Large language models - LLM - building blocks

Last updated: 28th Nov, 2023

Large language models (LLMs) have been gaining traction in natural language processing (NLP) due to their ability to process massive amounts of text and generate accurate results. These models are trained on large datasets that contain hundreds of millions to billions of words. LLMs rely on complex algorithms, including transformer architectures, that sift through large datasets and recognize patterns at the word level. This training helps the model better understand natural language and how it is used in context, and then make predictions for tasks such as text generation and text classification. This blog post aims to provide …


Posted in Data Science, Deep Learning, Generative AI, Machine Learning, NLP.

Z-test vs T-test vs Chi-square test: Differences, Examples

z-test vs t-test vs chi-square test

In the world of data science, understanding the differences between various statistical tests is crucial for accurate data analysis. Three of the most popular tests, the Z-test, T-test, and Chi-square test, each serve specific purposes. This blog post will delve into their definitions, types, formulas, appropriate usage scenarios, and the Python/R packages that can be used to implement them, along with real-world examples. Check out a detailed post on the differences between Z-test vs T-test.

Definition: What’s Z-test vs T-test vs Chi-square test?

The following represents the definition of each of the tests along with a real-world example:

Z-test: The Z-test is a statistical test used to determine if there …


Posted in Data Science, statistics.

Z-test vs T-test: Differences, Formula, Examples

z-test vs t-test

Last updated: 27th Nov, 2023

When it comes to statistical tests, the z-test and t-test are two of the most commonly used. But what is the difference between them, and when should you use one rather than the other? A related question is how z-statistics differ from t-statistics. In this blog post, we will answer all these questions and more! We will start by explaining the difference between z-test and t-test in terms of their formulas. Then we will go over some examples so that you can see how each test is used in practice. As data scientists, it …


Posted in Data Science, statistics.
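
The excerpt above says the two tests differ in their formulas. As a rough illustration only (not the code from the linked post), the one-sample statistics can be sketched as follows. It assumes the z-statistic uses a known population standard deviation while the t-statistic estimates it from the sample. The function names and the measurements are hypothetical.

```python
import math
import statistics


def z_statistic(sample, pop_mean, pop_std):
    """One-sample z-statistic: the population standard deviation is known."""
    n = len(sample)
    return (statistics.mean(sample) - pop_mean) / (pop_std / math.sqrt(n))


def t_statistic(sample, pop_mean):
    """One-sample t-statistic: the standard deviation is estimated from the sample."""
    n = len(sample)
    sample_std = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (statistics.mean(sample) - pop_mean) / (sample_std / math.sqrt(n))


# Hypothetical measurements, made up for illustration only
sample = [52, 48, 55, 51, 49, 53, 50, 54]

z_value = z_statistic(sample, pop_mean=50, pop_std=2.5)
t_value = t_statistic(sample, pop_mean=50)
print(f"z = {z_value:.3f}, t = {t_value:.3f}")
```

The only difference between the two functions is where the standard deviation comes from, which is also why the t-statistic is compared against a t-distribution with n - 1 degrees of freedom rather than the standard normal.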

Mean Squared Error or R-Squared – Which one to use?

Mean Squared Error Representation

Last updated: 27th Nov, 2023

When you evaluate the performance of regression models, it’s crucial to know when to use each metric and what it reveals about your model’s accuracy. In this post, you will learn about the concepts of mean squared error (MSE) and R-squared (R²), the difference between them, and which one to use when evaluating linear regression models. Note that MSE is closely related to root mean squared error (RMSE), which is also discussed in this blog. You will also work through Python examples to understand the concepts better. For learning the differences between other …


Posted in Data Science, Machine Learning, Python.
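
To make the MSE versus R-squared distinction in the excerpt above concrete, here is a minimal sketch using the standard textbook definitions. The helper names and the data are hypothetical and are not taken from the linked post.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals (in squared units of the target)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance explained by the model."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up targets and predictions, for illustration only
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

mse = mean_squared_error(y_true, y_pred)
rmse = math.sqrt(mse)  # same units as the target
r2 = r_squared(y_true, y_pred)
print(mse, rmse, r2)
```

MSE and RMSE depend on the scale of the target, whereas R-squared is unitless, which is one reason the two are usually reported together.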

Gradient Descent in Machine Learning: Python Examples

Last updated: 26th Nov, 2023

In this post, you will learn about the gradient descent algorithm and its importance in training machine learning models. For a data scientist, it is important to have a good grasp of gradient descent, as it is widely used to minimize the objective (loss) function of various machine learning models, such as regression models and neural networks, and thereby learn optimal weights (parameters). This algorithm is essential because it underpins many machine learning models, enabling them to learn from data by optimizing their performance. By understanding gradient descent, one gains insight into how algorithms …


Posted in Data Science, Machine Learning, Python.
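
As a small illustration of the idea in the excerpt above, the following sketch runs batch gradient descent to fit a line y = w*x + b by minimizing mean squared error. It is a toy under stated assumptions, not the linked post's code: the function name, learning rate, epoch count, and data are all hypothetical choices.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors under the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Move each parameter a small step in the direction that lowers the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

The learning rate matters: with this data a much larger step size would make the updates diverge, while a much smaller one would need more epochs to reach the same parameters.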

Learning Curves Python Sklearn Example

Learning curve explained with python example

Last updated: 26th Nov, 2023

In this post, you will learn how to use learning curves to assess the improvement in learning performance (accuracy, error rate, etc.) of a machine learning model, with an implementation in Python (Sklearn). Knowing how to use learning curves will help you diagnose whether the model is suffering from high bias (underfitting) or high variance (overfitting), and whether adding training samples could help solve the bias or variance problem. You may want to check the following posts to get a better understanding of bias-variance and underfitting-overfitting:

Bias-variance concepts and interview questions
Overfitting/Underfitting concepts and interview questions

What are learning curves? …


Posted in Data Science, Machine Learning, Python.

Procurement Analytics Use Cases Examples

procurement analytics use cases

Last updated: 26th Nov, 2023

Procurement analytics applications have seen tremendous growth in the last few years. With so much data available, advances in data analytics and related technologies, and the need for digital transformation across procurement organizations, it’s important to know how procurement analytics can help you make better business decisions. This blog will cover procurement analytics and key use case examples from advanced analytics fields such as machine learning, AI, and generative AI. These will be useful for business stakeholders such as category managers, sourcing managers, supplier relationship managers, business analysts/product managers, and data scientists who implement such use cases. The use cases around data-driven decision …


Posted in Data Science, Generative AI, Machine Learning, Procurement.

Data Science Explained: Framework, Methods, Examples

What is data science, concepts, examples

What is data science? Many people who are planning to start learning data science ask this question, and for good reason. Data science is increasingly being applied to solve real-world issues across a broad range of areas. In this blog post, we’re going to explore data science: what it is, the methods it employs, and how it’s applied to solve various problems, with relevant examples. Stick with us, and by the end of this post, you’ll gain a comprehensive understanding of data science and its significance!

What is Data Science?

Before understanding what data science is, let’s understand what science is. Science can be defined as …


Posted in Data Science, Machine Learning.

Bagging Classifier Python Code Example

Bagging Classifier explained with Python code examples

Last updated: 25th Nov, 2023

Bagging is an ensemble machine learning approach that combines the outputs from many learners to improve performance. The bagging algorithm works by drawing random subsets from the training set (typically bootstrap samples, drawn with replacement). A separate model is then trained on each subset. The predictions from the individual models are combined, for example by majority vote or averaging, to generate an overall prediction for each instance. In this blog post, you will learn about the concept of bagging along with a Bagging Classifier Python code example. Bagging can be used in machine learning for both classification and regression problems. The bagging classifier technique is utilized across a …


Posted in Data Science, Machine Learning, Python.
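
The bagging procedure summarized in the excerpt above can be sketched in a few lines of plain Python. This is a toy under stated assumptions rather than the linked post's Bagging Classifier example: the subsets are bootstrap samples drawn with replacement (the usual definition of bagging), the base learner is a 1-nearest-neighbour rule on a single numeric feature, and predictions are combined by majority vote. All names and data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) rows from data with replacement."""
    return [rng.choice(data) for _ in data]


def nearest_neighbor_predict(train, x):
    """Base learner: return the label of the training row whose feature is closest to x."""
    _, label = min(train, key=lambda row: abs(row[0] - x))
    return label


def bagging_predict(data, x, n_estimators=25, seed=0):
    """Fit one base learner per bootstrap sample and combine them by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        sample = bootstrap_sample(data, rng)
        votes.append(nearest_neighbor_predict(sample, x))
    return Counter(votes).most_common(1)[0][0]


# Made-up (feature, label) rows, for illustration only
data = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (8.0, "b"), (8.5, "b"), (9.0, "b")]

low_prediction = bagging_predict(data, 1.2)
high_prediction = bagging_predict(data, 8.7)
print(low_prediction, high_prediction)
```

For regression, the same structure applies with the majority vote replaced by an average of the individual predictions.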

PCA Explained Variance Concepts with Python Example

Last updated: 24th Nov, 2023

Dimensionality reduction is an important technique in data analysis and machine learning that allows us to reduce the number of variables in a dataset while retaining the most important information. By reducing the number of variables, we can simplify the problem, improve computational efficiency, and avoid overfitting. Principal Component Analysis (PCA) is a popular dimensionality reduction technique that aims to transform a high-dimensional dataset into a lower-dimensional space while retaining most of the information. PCA works by identifying the directions that capture the most variation in the data and projecting the data onto those directions, which are called principal components. However, when we apply PCA, …


Posted in Data Science, Machine Learning, Python.

Standard Deviation of Population vs Sample

Standard deviation for population and sample

Last updated: 24th Nov, 2023

Have you ever wondered what the difference is between the standard deviation of a population and that of a sample? Or why and when it’s important to measure the standard deviation of both? In this blog post, we will explore what standard deviation is, the differences between the standard deviation of a population and of a sample, and how to calculate each using its formula and a Python code example. By the end of this post, you should have a better understanding of standard deviation in general and why it’s important to calculate it for both populations and samples. Check out my related post – coefficient of variation vs standard deviation.

What is …


Posted in Data Science, Python, statistics.
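
As a quick illustration of the population versus sample distinction in the excerpt above, Python's standard `statistics` module exposes both calculations. This is a minimal sketch with made-up numbers, not the linked post's example.

```python
import statistics

# Made-up values, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: divide the sum of squared deviations by N
population_sd = statistics.pstdev(values)

# Sample standard deviation: divide by N - 1 (Bessel's correction)
sample_sd = statistics.stdev(values)

print(f"population: {population_sd:.3f}, sample: {sample_sd:.3f}")
```

The sample version is slightly larger because dividing by N - 1 compensates for the sample mean sitting closer to the sample's own points than the true population mean would.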

R-squared & Adjusted R-squared: Differences, Examples

r-squared vs adjusted r-squared

There are two measures of the strength of linear regression models: adjusted r-squared and r-squared. While they are both important, they measure different aspects of model fit. In this blog post, we will discuss the differences between adjusted r-squared and r-squared, as well as provide some examples to help illustrate their meanings. As a data scientist, it is of utmost importance to understand the differences between adjusted r-squared and r-squared in order to select the most appropriate linear regression model out of different regression models.

What is R-squared?

R-squared, also known as the coefficient of determination, is a measure of what proportion of the variance in the value of the …


Posted in Data Science, Machine Learning.
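
To illustrate how the two measures in the excerpt above relate, here is a small sketch of the standard adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function name and the input values are hypothetical.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors used by the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Made-up values: the same R-squared reported by a small and a large model
adj_small_model = adjusted_r_squared(0.90, n_samples=50, n_predictors=5)
adj_large_model = adjusted_r_squared(0.90, n_samples=50, n_predictors=20)
print(f"{adj_small_model:.4f} {adj_large_model:.4f}")
```

With R-squared held fixed, the model with more predictors gets the lower adjusted value, which is why adjusted R-squared is often preferred when comparing models of different sizes.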

Feature Scaling in Machine Learning: Python Examples

While training machine learning models, we often need to scale features so that different features contribute to the predictions in an appropriate manner. Without scaling, features with larger numerical ranges can dominate those with smaller ranges, leading to biased or inefficient learning. In this post, you will learn about feature scaling, a feature engineering technique that can significantly improve the performance of machine learning models, with Python code examples. To demonstrate the technique, the models will be trained using a Perceptron (single-layer neural network) classifier.

What is Feature Scaling? Why is it needed?

Feature scaling is a method used to standardize the range of independent variables …


Posted in AI, Data Science, Machine Learning.
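
The excerpt above does not say which scaling methods the post covers, so the following is only a sketch of two common ones, min-max normalization and standardization (z-scores), written in plain Python. The helper names and values are hypothetical.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up feature column, for illustration only
feature = [10, 20, 30, 40, 50]

scaled = min_max_scale(feature)
standardized = standardize(feature)
print(scaled)
print([round(z, 3) for z in standardized])
```

In practice the minimum, maximum, mean, and standard deviation are computed on the training data only and then reused to transform validation and test data.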

MSE vs RMSE vs MAE vs MAPE vs R-Squared: When to Use?

Regression models evaluation metrics MSE RMSE MAE MAPE R-Squared

As data scientists, we navigate a sea of metrics to evaluate the performance of our regression models. Understanding these metrics, namely Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and R-Squared, is crucial for robust model evaluation and selection. In this blog, we delve into these metrics through clear definitions, formulas, and guidance on when to use each of them.

Different Types of Regression Models Evaluation Metrics

The following are different types of regression model evaluation metrics, including MSE, RMSE, MAE, MAPE, R-squared and Adjusted R-squared, which get used in …


Posted in Data Science, Machine Learning, statistics.
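
For the error metrics named in the excerpt above, a minimal side-by-side sketch may help. It uses the standard definitions of MAE, MAPE, and RMSE; the function name and the data are hypothetical, and MAPE is assumed to be undefined when any true value is zero.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MAE, MAPE (in percent), and RMSE for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    # MAPE divides by the true value, so it cannot be computed if any y_true is 0
    mape = 100 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n
    rmse = math.sqrt(sum(r ** 2 for r in residuals) / n)
    return mae, mape, rmse


# Made-up targets and predictions, for illustration only
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 330, 380]

mae, mape, rmse = regression_errors(y_true, y_pred)
print(f"MAE = {mae:.2f}, MAPE = {mape:.2f}%, RMSE = {rmse:.2f}")
```

RMSE comes out larger than MAE here because squaring gives the single largest residual more weight, which is the usual argument for MAE when outliers should not dominate the score.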

How to Add Rows & Columns to Pandas Dataframe

Add a new row and column to Pandas dataframe

Last updated: 27th Nov, 2023

Pandas is a popular data manipulation library in Python, widely used for data analysis and data science tasks. A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types, similar to a spreadsheet or SQL table. One of the common tasks when working with Pandas is adding new columns and rows to a dataframe. It might seem like a trivial task, but choosing the right method to add a row or a column can significantly impact the performance and efficiency of your code. In this blog, we will explore …


Posted in Data Science, Python.

Different Types of Statistical Tests: Concepts

different types of statistical tests

Last updated: 18th Nov, 2023

Statistical tests are an important part of data analysis. They help us understand the data and make inferences about the population. They are used to examine relationships between variables based on hypothesis testing, for example to see whether there is a significant difference between two groups or between a group and a population. In statistics, there are two main types of tests: parametric and non-parametric. Both types of tests are used to make inferences about a population based on a sample. The difference between the two types of tests lies in the assumptions that they make about the data. Parametric tests …


Posted in Data Science, statistics.