# Category Archives: Data Science

## Python – Creating Scatter Plot with IRIS Dataset

In this blog post, we will learn how to create a scatter plot with the IRIS dataset using Python. The IRIS dataset is a classic dataset that is often used to demonstrate the properties of various statistical models. It contains 150 observations (50 for each of three iris species) on four variables: Petal Length, Petal Width, Sepal Length, and Sepal Width. As data scientists, it is important for us to be able to visualize the data that we are working with. Scatter plots are a great way to do this because they show the relationship between two variables. In this post, we have plotted and explored how Petal Length and Sepal Length …

## Supervised & Unsupervised Learning Difference

Supervised and unsupervised learning are two common types of machine learning tasks that are used to solve many kinds of business problems. Supervised learning uses labeled training data to create models that can predict outcomes for future datasets. Unsupervised learning is a type of machine learning task where the training data is not labeled or categorized in any way. For beginner data scientists, it is very important to understand the difference between supervised and unsupervised learning. In this post, we will discuss how supervised and unsupervised algorithms work and what the difference between them is. You may want to check …

## Logit vs Probit Models: Differences, Examples

Logit and probit models are statistical models that are used to model binary or dichotomous dependent variables. This means that the outcome of interest can only take on two possible values. In most cases, these models are used to predict whether or not something will happen. For example, a business might want to know if a particular advertising campaign will lead to an increase in sales. In this blog post, we will explain what logit and probit models are, and we will provide examples of how they can be used. As data scientists, it is important to understand the concepts of logit and probit models and when they should be …
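To make the distinction concrete, here is a minimal sketch of the two link functions. Both models map a linear index `z` to a probability between 0 and 1: the logit model uses the logistic CDF and the probit model uses the standard normal CDF. The function names and the sample values of `z` are illustrative assumptions and do not come from the original post.

```python
import math


def logit_prob(z):
    """Logit model: probability from the logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_prob(z):
    """Probit model: probability from the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values of the linear index z (for example z = b0 + b1 * x)
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  logit={logit_prob(z):.4f}  probit={probit_prob(z):.4f}")
```

Both functions return 0.5 at `z = 0`. For the same `z`, the normal CDF approaches 0 and 1 faster than the logistic CDF, which is why fitted logit and probit coefficients are on different scales.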

## Categorical Data Visualization: Concepts, Examples

Data visualization is one of the most important tools for any data scientist or statistician. It helps us to better understand the relationships between variables and identify patterns in our data. There are specific types of visualization used to represent categorical data. This type of data visualization can be incredibly helpful when it comes to analyzing our data and making predictions about future trends. In this blog, we will dive into what categorical data visualization is, why it’s useful, and some examples of how it can be used. Types of Data Visualizations for Categorical Dataset When it comes to visualizing categorical data sets, there are primarily four …

## Types of Probability Distributions: Codes, Examples

In this post, you will learn the definition of 25 different types of probability distributions. Probability distributions play an important role in statistics and in many other fields, such as economics, engineering, and finance. They are used to model all sorts of real-world phenomena, from the weather to stock market prices. Before we get into the different types of probability distributions, let’s understand some fundamentals. If you are a data scientist, you may want to go through these distributions. This page can also be used as a cheat sheet for probability distributions. What are Probability Distributions? Probability distributions are a way of describing how likely it is for a random …

## Cross Entropy Loss Explained with Python Examples

In this post, you will learn the concepts related to the cross-entropy loss function along with Python code examples and which machine learning algorithms use the cross-entropy loss function as an objective function for training the models. Cross-entropy loss is used as a loss function for models which predict the probability value as output (probability distribution as output). Logistic regression is one such algorithm whose output is a probability distribution. You may want to check out the details on how cross-entropy loss is related to information theory and entropy concepts – Information theory & machine learning: Concepts What’s Cross-Entropy Loss? Cross-entropy loss, also known as negative log likelihood loss, is …
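As a rough illustration of the idea, the sketch below computes the average binary cross-entropy for a handful of predictions. The function name, the clipping constant `eps`, and the sample labels and probabilities are assumptions made for this example, not code from the original post.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy (negative log likelihood) over samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log(0) is never evaluated
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up labels and predicted probabilities of the positive class
labels = [1, 0]
confident = [0.9, 0.1]
hesitant = [0.6, 0.4]

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, hesitant))
```

The confident, correct predictions give the lower loss. The loss grows quickly as the predicted probability of the true class moves toward zero.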

## Accuracy, Precision, Recall & F1-Score – Python Examples

Classification models are used in classification problems to predict the target class of a data sample. The classification model predicts the probability that each instance belongs to one class or another. It is important to evaluate the performance of a classification model in order to reliably use it in production for solving real-world problems. Performance measures are used to assess how well machine learning classification models perform in a given context. These performance metrics include accuracy, precision, recall, and F1-score. Evaluating model performance is essential because it helps us understand the strengths and limitations of these models when making predictions in new situations. …
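The four metrics named above can be sketched in plain Python from the counts of true/false positives and negatives. The helper name `classification_metrics` and the small label lists are hypothetical and only serve as an illustration for the binary case.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth and predictions for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy example there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four metrics come out to 0.75. On imbalanced data the metrics usually diverge, which is why accuracy alone can be misleading.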

## Data Variables Types & Uses in Data Science

In data science, variables are the building blocks of any analysis. They allow us to group, compare, and contrast data points to uncover trends and draw conclusions. But not all variables are created equal; there are different types of variables that have specific uses in data science. In this blog post, we’ll explore the different variable types and their uses in data science. The picture below represents different types of variables one can find when working on statistics / data science projects: Let’s understand each type of variable in the following sections. Categorical / Qualitative Variables Categorical variables are a type of data that can be grouped into categories, based …

## Population & Samples in Statistics: Examples

In statistics, population and sample are two fundamental concepts that help us to better understand data. A population is a complete set of objects from which we can obtain data. A population can include all people, animals, plants, or things in a given area. On the other hand, a sample is a subset of the population that is used for observation and analysis. In this blog, we will further explore the concepts of population and samples and provide examples to illustrate the differences between them in statistics. What is a population in statistics? In statistics, population refers to the entire set of objects or individuals about which we want to …

## Procurement Advanced Analytics Use Cases

Procurement analytics applications are poised to grow rapidly in the next few years. With so much data available and the need for digital transformation across procurement organizations, it’s important to know how procurement analytics can help you make better business decisions. This blog will cover procurement analytics and key use cases of advanced analytics that will be useful for business stakeholders such as category managers, sourcing managers, supplier relationship managers, and business analysts / product managers, as well as for data scientists implementing different use cases using machine learning. Procurement analytics will allow you to use data effectively to achieve data-driven decision making. One can get started with procurement analytics with dashboards …

## Most Common Data Pitfalls to Avoid

Working with data can be powerful, but there are some common pitfalls that data professionals, including data analysts & data scientists, should always be aware of when gathering, storing, and analyzing data. Good data is essential for any successful analytics project, and understanding the most common data pitfalls will help you avoid them. In this blog, we will take a look at what these mistakes are and how to avoid them. The picture below represents the most common data pitfalls to avoid. Considering Data as the Truth One major data pitfall is when people consider data as absolute truth (reflection of reality) without taking any other factors …

## Data Science Interview Questions – List

Are you preparing for a data science interview and looking for some common questions that may be asked? Look no further! In this blog post, we will provide a list of potential interview questions for a data science position. These questions cover a range of topics, from technical skills and experience to problem-solving and communication. Whether you are a seasoned data scientist or just starting out in the field, these questions will help you get ready for your upcoming interview and showcase your knowledge and expertise. So let’s dive in and see what’s in store! Here are some of the most popular / potential interview questions that may be asked …

## Instance-based vs Model-based Learning: Differences

Machine learning is a field of artificial intelligence that deals with giving machines the ability to learn without being explicitly programmed. In this context, instance-based learning and model-based learning are two different approaches used to create machine learning models. While both approaches can be effective, they also have distinct differences that must be taken into account when building a machine learning system. Let’s explore the differences between these two types of machine learning. What is instance-based learning & how does it work? Instance-based learning (also known as memory-based learning or lazy learning) involves memorizing training data in order to make predictions about future data points. This approach doesn’t require any …
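One common instance-based method is k-nearest neighbors (k-NN), and a minimal sketch of it is shown below. Nothing is fitted in advance: the training points are simply stored, and a prediction is made by comparing the query with them. The function name `knn_predict`, the value of `k`, and the toy points are assumptions made for this illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest stored points."""
    # Distance from the query to every memorized training point
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2D training data with two classes
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (5.0, 5.1)))
```

All of the work happens at prediction time, which is why this family of methods is also called lazy learning. A model-based approach would instead estimate parameters from the training data once and then discard the data.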

## Data Storytelling Explained with Examples

Have you ever told a story to someone, but they just didn’t seem to understand it? They might have been confused about the plot or why the characters acted in certain ways. If this has happened to you before, then you are not alone. Many people struggle with data storytelling because they do not know how to communicate their data effectively. Data storytelling is a powerful tool that can be used to educate, inform or persuade an audience. By using charts, graphs, images and other visuals, data can be made more interesting and engaging. Data storytelling involves taking data and presenting it in a way that is easy to understand and …

## Data Analyst, Data Scientist or Data Engineer: What to Become?

There is a lot of confusion surrounding job designations or titles such as “data analyst,” “data scientist,” and “data engineer”. What do these job titles mean, and what are the differences between them? Before selecting one of these career paths, it is good to get a solid understanding of these job titles or designations, the related roles & responsibilities, and career potential. In this blog post, we will describe each title / designation and discuss the key distinctions between them. By the end of this post, you will have a better understanding of which career path and related designations are right for you! Shall I become a data analyst? …

## Data Warehouse vs. Data Lake – Differences, Examples

When it comes to data storage, there are two distinct types of solutions that you can use—a data warehouse and a data lake. Both of these solutions have their own benefits, but it’s important to understand the key differences between them so that you can choose the best option for your needs. Let’s take a closer look at what makes each solution unique. What is a Data Warehouse? A data warehouse is defined as an electronic storage system used for reporting and analysis. Data warehouses store data in a structured (row-column) format. It typically contains aggregated collections of data from multiple sources, which come together in one database. A data warehouse …