Tag Archives: Data Science

Building Data Analytics Organization: Operating Models

Most businesses these days are collecting and analyzing data to help them make better decisions. However, in order to do this effectively, they need to build a data analytics organization. This involves hiring the right people with the right skills, setting up the right infrastructure and creating the right processes. In this article, we’ll take a closer look at what it takes to set up a successful data analytics organization. We’ll start by discussing the importance of having the right team in place. Then we’ll look at some of the key infrastructure components that need to be put in place. Finally, we’ll discuss some of the key process considerations that …

Posted in Big Data, Data, Data analytics, data engineering, Data lake, Data Science.

Who is a Data Scientist? Test your Knowledge

Do you know what a data scientist is? You may think you do, but take this quiz to find out for sure! Data scientists are essential to modern business, and it’s important to know who they are and what they do. This quiz is just for fun, but it’s also a great opportunity to learn more about one of the most in-demand professions today. So put your data scientist knowledge to the test and see how well you really know this profession! And feel free to share your thoughts if you disagree with the answer to any of the questions. Here are a few related posts on this topic: What …

Posted in Career Planning, Data, Data analytics, Data Science, Interview questions, Machine Learning.

PCA Explained Variance Concepts with Python Example

In this post, you will learn about explained variance, one of the key concepts related to principal component analysis (PCA). The concepts will be illustrated with Python code examples. For background on eigenvalues and eigenvectors, see this post – Why & when to use Eigenvalue and Eigenvectors. What is Explained Variance? Explained variance is a statistical measure of how much variation in a dataset can be attributed to each of the principal components (eigenvectors) generated by the principal component analysis (PCA) method. In very basic terms, it refers to the amount of variability in a data set that can be attributed to …
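
As a rough illustration of the idea in the excerpt above, the sketch below computes the explained variance ratio directly from the eigenvalues of a covariance matrix using NumPy. The synthetic data, the random seed and all variable names are assumptions made for this example and are not taken from the original post.

```python
import numpy as np

# Hypothetical data: 200 samples, 3 features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

# Center the data and compute the covariance matrix of the features.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Eigenvalues of the covariance matrix are the variances along each
# principal component; eigvalsh returns them in ascending order.
eigenvalues = np.linalg.eigvalsh(cov)[::-1]

# Explained variance ratio: each eigenvalue divided by their sum.
explained_variance_ratio = eigenvalues / eigenvalues.sum()
cumulative_explained_variance = np.cumsum(explained_variance_ratio)

print(explained_variance_ratio)
print(cumulative_explained_variance)
```

The cumulative values are often used to pick how many components to keep, for example the smallest number whose cumulative ratio passes a chosen threshold.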

Posted in Data Science, Machine Learning, Python.

One-hot Encoding Concepts & Python Examples

In this post, you will learn about one-hot encoding concepts and code examples using the Python programming language. One-hot encoding is often loosely called dummy encoding (strictly speaking, dummy encoding drops one of the categories). In this post, the OneHotEncoder class of sklearn.preprocessing will be used in the code examples. As a data scientist or machine learning engineer, you should learn one-hot encoding techniques, as they come in very handy while training machine learning models. What is One-Hot Encoding? One-hot encoding is a process whereby categorical variables are converted into a form that can be provided as an input to machine learning models. It is an essential preprocessing step for many machine learning tasks. The goal of one-hot encoding is to …
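
A minimal sketch of the OneHotEncoder class mentioned above, assuming a single hypothetical "color" column. The sample values and variable names are made up for illustration and are not from the original post.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical feature: one column, four samples.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# handle_unknown="ignore" encodes unseen categories as all zeros
# at transform time instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")

# fit_transform returns a sparse matrix by default; convert it to a
# dense array for display.
encoded = encoder.fit_transform(colors).toarray()
categories = list(encoder.categories_[0])

print(categories)
print(encoded)
```

Each row contains a single 1 in the column of its category, with columns ordered as in `encoder.categories_`.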

Posted in Data Science, Machine Learning, Python.

Interns – Machine Learning Interview Questions & Answers: Set 1

This page lists the first set of machine learning / data science interview questions and answers for interns / freshers / beginners. If you are an intern, a fresher or a beginner in the machine learning field and are looking for practice tests before your upcoming machine learning interview, these practice tests should prove useful. Machine Learning topics covered in Test In this set, some of the following topics have been covered: machine learning fundamentals (supervised and unsupervised learning algorithms); different types of machine learning problems and related algorithms with examples; concepts related to regression, classification and clustering. Practice Test (Questions …

Posted in Career Planning, Data Science, Freshers, Interview questions, Machine Learning.

Data-centric vs Model-centric AI: Concepts, Examples

There is a lot of discussion around AI and which approach is better: model-centric or data-centric. In this blog post, we will explore both approaches and give examples of each. We will also discuss the benefits and drawbacks of each approach. By the end of this post, you will have a better understanding of both AI approaches and be able to decide which one is right for your business! As product managers and data science architects, you should be knowledgeable about both of these AI approaches so that you can make informed decisions about the products and services you build. Model-centric approach to AI Model-centric approach to AI is about …

Posted in AI, Data, Data analytics, Data Science, Machine Learning.

Data Science Architect Interview Questions

In this post, you will learn about interview questions that can be asked if you are going for a data science architect job. A data science architect needs to have knowledge of both data science / machine learning and cloud architecture. In addition, it helps if the person is hands-on with programming languages such as Python & R. Without further ado, let’s get into some of the common questions right away. I will add further questions in the time to come. Q1. How do you go about architecting a data science or machine learning solution for any business problem? Solving a business problem using a data science or machine learning-based solution can …

Posted in Career Planning, Data Science, Enterprise Architecture, Interview questions, Machine Learning.

Decision Science & Data Science – Differences, Examples

Decision science and data science are two data-driven fields that have grown in prominence over the past few years. Data scientists use data to arrive at conclusions or predictions about things like customer behavior and assess the suitability of those conclusions / predictions, while decision scientists combine data with other information sources to make decisions and assess the suitability of those decisions for enterprise-wide adoption. Business owners need to understand the difference between data science and decision science clearly in order to leverage the best of both worlds and achieve desired business outcomes. In this post, you will learn about the concepts …

Posted in AI, Analytics, Data Science, Decision Science.

Feature Importance & Random Forest – Python

In this post, you will learn how to use the Random Forest Classifier (RandomForestClassifier) to determine feature importance, with a Sklearn Python code example. This is useful for feature selection, that is, finding the most important features when solving a classification machine learning problem. Data scientists need to understand feature importance and feature selection techniques in order to use the most appropriate features for training machine learning models. Recall that other feature selection techniques include L1-norm regularization and greedy search algorithms such as sequential backward / sequential forward selection. What & Why of Feature Importance? Feature importance is a key concept in machine learning that refers to the relative importance of each feature …
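
The following is a small sketch of reading feature importances from a fitted RandomForestClassifier. The choice of the bundled iris dataset, the hyperparameters and the variable names are assumptions for illustration; the original post may use different data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumption: the iris dataset stands in for any tabular classification data.
data = load_iris()
X, y = data.data, data.target
feature_names = list(data.feature_names)

# Fit a forest; random_state is fixed only to make the run repeatable.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# feature_importances_ holds one impurity-based score per feature,
# normalized so that the scores sum to 1.
importances = forest.feature_importances_

# Rank features from most to least important.
ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favor high-cardinality features, so it may be worth cross-checking the ranking with another method before dropping features.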

Posted in Data Science, Machine Learning, Python.

Sklearn SimpleImputer Example – Impute Missing Data

In this post, you will learn how to use Python’s Sklearn SimpleImputer for imputing / replacing numerical & categorical missing data using different strategies. A related article posted some time back discusses the usage of the fillna method of Pandas DataFrame. Handling missing values is a key part of data preprocessing, so it is important for data scientists / machine learning engineers to learn different techniques for imputing / replacing numerical or categorical missing values with appropriate values based on appropriate strategies. SimpleImputer Python Code Example SimpleImputer is a class in the sklearn.impute module that can be used to replace missing values in a dataset, using a …
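
A minimal sketch of SimpleImputer with the "mean" strategy on numerical data. The tiny array and the variable names are hypothetical and only meant to show the mechanics.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric data with one missing value in each column.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# Replace each NaN with the mean of its column. Other strategies
# include "median", "most_frequent" and "constant".
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print(X_imputed)
```

For categorical columns, the "most_frequent" or "constant" strategies are the usual choices, since a mean is not defined for them.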

Posted in Data Science, Machine Learning, Python.

Pandas dropna: Drop Rows & Columns with Missing Values

In this blog post, we will discuss the Pandas dropna method, which drops rows and columns that have missing values. Pandas is a powerful data analysis library for Python, and dropna is one of its most useful features. As data scientists, it is important to be able to handle missing data, and dropna makes this easy. Pandas dropna Method The dropna method allows us to drop rows or columns with missing values in our dataframe. Find the documentation of the method on this page: pandas.DataFrame.dropna. The dropna method looks like the following: DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False) Given the above method and parameters, the following …
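
To make the parameters in the signature above concrete, here is a short sketch on a made-up DataFrame. The column names and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: columns "a" and "b" each have one missing value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, 8.0, 9.0],
})

# Default behavior (axis=0, how='any'): drop every row with any NaN.
rows_dropped = df.dropna()

# axis=1: drop every column that contains any NaN.
cols_dropped = df.dropna(axis=1)

# subset: only consider column "a" when deciding which rows to drop.
subset_dropped = df.dropna(subset=["a"])

print(rows_dropped)
print(cols_dropped)
print(subset_dropped)
```

None of these calls modifies `df`, because `inplace` defaults to False and each call returns a new DataFrame.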

Posted in Data Science, Machine Learning, Python.

Accuracy, Precision, Recall & F1-Score – Python Examples

Classification models are used in classification problems to predict the target class of a data sample. Many classification models do this by predicting the probability that each instance belongs to one class or another. It is important to evaluate the performance of classification models in order to use them reliably in production for solving real-world problems. Performance measures are used to assess how well machine learning classification models perform in a given context. These performance metrics include accuracy, precision, recall, and F1-score. Because it helps us understand the strengths and limitations of these models when making predictions in new situations, model performance is essential for machine …
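
The four metrics named above can be sketched in plain Python from the counts of a binary confusion matrix. The labels below are invented for illustration, and the function name is an assumption for this example rather than a library API.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice, sklearn.metrics provides accuracy_score, precision_score, recall_score and f1_score, which compute the same quantities.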

Posted in Data Science, Machine Learning, Python.

Support Vector Machine (SVM) Python Example

Support vector machine maximizing the margin

In this post, you will learn about the concepts of Support Vector Machine (SVM) with the help of a Python code example for building a machine learning classification model. We will work with the Python Sklearn package for building the model. As data scientists, it is important to get a good grasp of the SVM algorithm and related aspects. What is Support Vector Machine (SVM)? Support vector machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression tasks. At times, SVM for classification is termed support vector classification (SVC) and SVM for regression is termed support vector regression (SVR). In this post, we will learn about SVM …
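
A minimal sketch of support vector classification with Sklearn's SVC class, assuming a tiny, linearly separable toy dataset. The points, the linear kernel and the variable names are choices made for this example, not taken from the original post.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters.
X = np.array([
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],   # class 0
    [3.0, 3.0], [4.0, 3.0], [3.0, 4.0],   # class 1
])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel looks for the separating hyperplane with the widest margin.
classifier = SVC(kernel="linear")
classifier.fit(X, y)

# Predict the class of two new points, one near each cluster.
predictions = classifier.predict(np.array([[0.5, 0.5], [3.5, 3.5]]))

# The support vectors are the training points that define the margin.
support_vectors = classifier.support_vectors_

print(predictions)
print(support_vectors)
```

For regression tasks, the analogous Sklearn class is SVR in the same sklearn.svm module.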

Posted in AI, Data Science, Machine Learning, Python.

Overfitting & Underfitting in Machine Learning

Overfitting and underfitting represented using Model error vs complexity plot

The performance of machine learning models depends upon two key concepts called underfitting and overfitting. In this post, you will learn about some of the key concepts of overfitting and underfitting in relation to machine learning models. In addition, you will get a chance to test your understanding by attempting the quiz. The quiz will help you prepare well for interview questions related to underfitting & overfitting. As data scientists, you must get a good understanding of the overfitting and underfitting concepts. Introduction to Overfitting & Underfitting Assuming an independent and identically distributed (i.i.d.) dataset, when the prediction error on both the training and validation dataset is …

Posted in Data Science, Interview questions, Machine Learning.

Spend Analytics Use Cases: AI & Data Science

In this post, you will learn about the high-level concepts of spend analytics in relation to procurement and how data science / machine learning & AI can be used to extract actionable insights as part of spend analytics. This will be useful for procurement professionals such as category managers, sourcing managers, and procurement analytics stakeholders looking to understand the concepts of spend analytics and how they can drive decisions based on it. What is Spend Analytics? Simply speaking, spend analytics is about performing systematic computational analysis to extract actionable insights from spend and savings data across different spend categories in order to achieve desired business outcomes such as cost savings, …

Posted in Data Science, Machine Learning, Procurement.

Logistic Regression Explained with Python Example

In this blog post, we will discuss the logistic regression machine learning algorithm with a Python example. Logistic regression is a type of regression algorithm that is used to predict the probability of occurrence of an event. It is often used in machine learning applications. In this tutorial, we will use Python to implement logistic regression for binary classification problems. What is Logistic Regression? Logistic regression is a machine learning algorithm used for classification problems. That is, it can be used to predict whether an instance belongs to one class or the other. For example, it could be used to predict whether a person is male or female, based on …
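
The sketch below shows how a logistic regression model turns a weighted sum of features into a probability and then into a class label. The weights, bias, threshold and function names are hypothetical values chosen for illustration; a real model would learn the weights from training data.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one instance."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return int(predict_proba(features, weights, bias) >= threshold)


# Hypothetical, hand-picked parameters (not learned from data).
weights = [0.8, -0.4]
bias = -0.2

probability = predict_proba([2.0, 1.0], weights, bias)
label = predict_class([2.0, 1.0], weights, bias)
print(probability, label)
```

Sklearn's LogisticRegression class in sklearn.linear_model handles the training step that estimates the weights and bias.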

Posted in Data Science, Machine Learning, Python.