# Category Archives: Data Science

## Machine Learning for predicting Ice Shelves Vulnerability

In this post, you will learn about the use of machine learning for predicting ice shelves vulnerability. Before getting into the details, let’s understand what ice shelves vulnerability is and how it is impacting global warming / climate change. What are ice shelves? Ice shelves are permanent floating sheets of ice that connect to a landmass. Most of the world’s ice shelves hug the coast of Antarctica. Ice from enormous ice sheets slowly oozes into the sea through glaciers and ice streams. If the ocean is cold enough, that newly arrived ice doesn’t melt right away. Instead it may float on the surface and grow larger as glacial ice behind it continues to flow into the …

## Python – Text Classification using Bag-of-words Model

In this post, you will learn about the concepts of the bag-of-words (BoW) model and how to train a text classification model using Python Sklearn. Some of the most common text classification problems include sentiment analysis, spam filtering, etc. In these problems, one can apply the bag-of-words technique to train machine learning models for text classification. It is good to understand the bag-of-words model before moving on to advanced NLP techniques for text classification in machine learning. The following topics will be covered in this post: What is a bag-of-words model? How to fit a bag-of-words model using Python Sklearn? How to fit a text classification model using the bag-of-words technique? …
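As a rough illustration of the two steps named above (fitting a bag-of-words model, then fitting a classifier on top of it), here is a minimal sketch. The toy corpus, the labels, and the choice of `MultinomialNB` as the classifier are assumptions made for this example and are not taken from the post itself.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus (illustrative only): 1 = spam, 0 = not spam
texts = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting agenda for monday",
    "please review the project report",
    "lunch with the team on monday",
]
labels = [1, 1, 1, 0, 0, 0]

# Step 1: fit the bag-of-words model.
# Each document becomes a vector of word counts; word order is discarded.
# Note: the default tokenizer ignores single-character tokens such as "a".
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
print(X.shape)                         # (number of documents, vocabulary size)
print(sorted(vectorizer.vocabulary_))  # the learned vocabulary

# Step 2: fit a text classifier on the count vectors.
# MultinomialNB is an assumed choice; any classifier that accepts
# sparse count features could be used instead.
clf = MultinomialNB()
clf.fit(X, labels)

# New text must be transformed with the SAME fitted vectorizer.
new_texts = ["free prize now", "project meeting on monday"]
predictions = clf.predict(vectorizer.transform(new_texts))
print(list(predictions))
```

The key design point is that the vectorizer is fitted once on the training text and then reused to transform unseen text, so that both share the same vocabulary and column order.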

## Python Scraper for GoogleNews, Twitter, Reddit & Arxiv

In this post, you will get the Python code for scraping the latest news about any topic from Google News, Twitter, Reddit and Arxiv. This could prove to be very useful for data scientists and machine learning enthusiasts who want to keep track of the latest happenings in the field of artificial intelligence. If you are doing some research work, these pieces of code would prove to be very handy for quickly accessing the information. The code in this post has been worked out in a Google Colab notebook. First and foremost, import the necessary Python libraries such as the following for GoogleNews, Twitter and Arxiv. Python Code for mining GoogleNews Here …

## Spend Analytics – 5 Ws of Spend Analysis

In this post, you will learn about the 5 Ws of spend analytics. If you are a procurement professional looking to understand use cases related to spend analytics, you may find this post very useful. In simple words, spend analytics is about extracting insights from spend in different procurement categories. What are we spending on? First and foremost, it is important to get visibility into what items we are spending on. This can be achieved using a dashboard. This form of analytics is also called descriptive analytics. Analyzing item spends can be termed item spend analytics. The items can be related to direct or indirect procurement. Indirect …

## Python – How to Create Dictionary using Pandas Series

In this post, you will learn about one of the fundamental Pandas data structures, namely Series, and how it can be used as a dictionary. It will be useful for beginner data scientists to understand the concept of the Pandas Series object. A dictionary is a structure that maps arbitrary keys to a set of arbitrary values. A Pandas Series is a one-dimensional array of indexed data. It can be created using a list or an array. A Pandas Series can be thought of as a special case of a Python dictionary. It is a structure which maps typed keys to a set of typed values. Here are the three different ways in …
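To make the "Series as a dictionary" idea concrete, here is a small sketch. The item names and prices are hypothetical values chosen for illustration; the sketch does not attempt to reproduce the specific approaches listed in the full post.

```python
import pandas as pd

# Hypothetical data: item name -> price
prices = pd.Series({"apple": 1.5, "banana": 0.5, "cherry": 3.0})

# Dictionary-style lookup by key (the Series index plays the role of keys)
banana_price = prices["banana"]

# Membership tests check the index, just like keys in a dict
has_cherry = "cherry" in prices
has_durian = "durian" in prices

# Assigning to a new key adds an entry, as with a dict
prices["durian"] = 9.0

# keys() returns the index; to_dict() converts back to a plain dict
keys = list(prices.keys())
as_dict = prices.to_dict()

print(banana_price, has_cherry, has_durian)
print(keys)
print(as_dict)
```

Unlike a plain dictionary, the values in a Series share a single dtype (here `float64`), which is what "typed keys to typed values" refers to.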

## Free Online Books – Machine Learning with Python

This post lists free online books for machine learning with Python. These books cover topics related to machine learning, deep learning, and NLP. This post will be updated from time to time as I discover more books. Here are the titles of these books: Python data science handbook Building machine learning systems with Python Deep learning with Python Natural language processing with Python Think Bayes Scikit-learn tutorial – statistical learning for scientific data processing Python Data Science Handbook Covers topics such as some of the following: Introduction to Numpy Data manipulation with Pandas Visualization with Matplotlib Machine learning topics (Linear regression, SVM, random forest, principal component analysis, K-means clustering, Gaussian …

## Great Site for Matrix Multiplication Demo

Here is a great website for a matrix multiplication demo. If you are a beginner data scientist, you will love this. http://matrixmultiplication.xyz/ Here is what the website looks like. It has just one page. It shows how multiplication happens given the different dimensions of the matrices. Here are a few other websites for understanding matrix multiplication concepts: https://www.mathsisfun.com/algebra/matrix-multiplying.html Khan Academy – Matrix multiplication
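For readers who prefer code to animation, the same dimension rule can be sketched with NumPy: an (m × n) matrix times an (n × p) matrix gives an (m × p) matrix. The matrices below are arbitrary example values.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])     # shape (3, 2)

# The inner dimensions must match: (2, 3) @ (3, 2) -> (2, 2)
C = A @ B

# Each entry is the dot product of a row of A with a column of B,
# e.g. the top-left entry uses row 0 of A and column 0 of B.
top_left = sum(A[0, k] * B[k, 0] for k in range(A.shape[1]))

print(C.shape)
print(C)
print(top_left)
```

Swapping the order to `B @ A` is also valid here but yields a (3, 3) result, which shows that matrix multiplication is not commutative.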

## Different types of Machine Learning Problems

This post describes the most popular types of machine learning problems using multiple images/pictures. The following represent the different types of machine learning problems: Supervised learning Unsupervised learning Reinforcement learning Transfer learning Imitation learning Meta-learning In this post, the image shows supervised, unsupervised, and reinforcement learning. You may want to check the explanation in this YouTube lecture video. Unsupervised Learning Problems In unsupervised learning problems, the learning algorithm learns about the structure of data from the given data set and generates fakes or insights. In the above diagram, you may see that what is given is the unlabeled dataset X. The unsupervised learning algorithm learns the structure of data …

## Top 10+ Youtube AI / Machine Learning Courses

In this post, you get access to the top free AI/machine learning courses on YouTube. The courses are suitable for data scientists at all levels and cover the following areas of machine learning:

- Machine learning
- Deep learning
- Natural language processing (NLP)
- Reinforcement learning

Here are the details of the free machine learning / deep learning YouTube courses.

| S.No | Title | Description | Type |
|---|---|---|---|
| 1 | CS229: Machine Learning (Stanford) | Machine learning lectures by Andrew Ng; in case you are a beginner, these lectures are highly recommended | Machine learning |
| 2 | Applied machine learning (Cornell Tech CS 5787) | Covers all of the most important ML algorithms and how to apply them in practice. Includes 3 full lectures … | |

## Scikit-learn vs Tensorflow – When to use What?

In this post, you will learn about when to use Scikit-learn vs TensorFlow. For data scientists and machine learning enthusiasts, it is very important to understand the difference so that they can use these libraries appropriately while working on different business use cases. When to use Scikit-learn? Scikit-learn is a great entry point for beginner data scientists. It provides an efficient implementation of many machine learning algorithms. In addition, it is simple and easy to use. You can get started with Scikit-learn easily by using a Jupyter notebook. Scikit-learn can be used to solve different kinds of machine learning problems including some of the following: Classification (SVM, nearest neighbors, random …

## Machine Learning – Training, Validation & Test Data Set

In this post, you will learn about the concepts of training, validation, and test data sets used for training machine learning models. The post is most suitable for data science beginners or those who would like a clear understanding of the training, validation, and test data set concepts. The following topics will be covered: Data split – training, validation, and test data set Different model performance based on different data splits Data Splits – Training, Validation & Test Data Sets You can split data into the following different sets, and each data split configuration will result in machine learning models with different performance: Training data set: When you …
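One common way to produce the three sets is to call scikit-learn's `train_test_split` twice. The sketch below assumes a 60/20/20 split and uses synthetic data; both the ratio and the data are illustrative assumptions, not values from the post.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 100 samples, 2 features, balanced binary labels
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# First split: hold out 20% of all data as the test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Second split: take 25% of the remaining 80% as the validation set
# (0.25 * 0.8 = 0.2 of the full data), leaving 60% for training
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
)

print(len(X_train), len(X_val), len(X_test))
```

The training set is used to fit the model, the validation set to compare hyperparameters or algorithms, and the test set is touched only once at the end to estimate performance on unseen data.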

## Why use Random Seed in Machine Learning?

In this post, you will learn about why and when we use random seed values while training machine learning models. This is a question often asked by beginner data scientists and machine learning enthusiasts. We use a random seed value while creating the training and test data sets. The goal is to make sure we get the same training and validation data sets while we try different hyperparameters or machine learning algorithms in order to assess the performance of different models. This is where the random seed value comes into the picture. Different Python libraries, such as scikit-learn, have different ways of assigning random seeds. While training machine learning models using Scikit-learn, …
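As a small sketch of the idea, the example below uses the `random_state` parameter of scikit-learn's `train_test_split`. The synthetic data and the seed value 42 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 10 samples, 2 features
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two splits with the SAME seed select exactly the same rows
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.3, random_state=42
)
same_split = np.array_equal(X_test_a, X_test_b)
print(same_split)

# Without a fixed seed, the split generally changes from run to run,
# so model comparisons would be made on different data each time.
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X, y, test_size=0.3
)
print(y_test_a, y_test_c)
```

Fixing the seed means that any difference in scores between two models comes from the models themselves rather than from a different shuffle of the data.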

## Precision & Recall Explained using Covid-19 Example

In this post, you will learn about the concepts of precision, recall, and accuracy when dealing with machine learning classification models. Given that this is the Covid-19 age, the idea is to explain these concepts in terms of a machine learning classification model predicting whether a patient is Corona positive or not based on the symptoms and other details. The following model performance concepts will be described with the help of examples. What is the model precision? What is the model recall? What is the model accuracy? What is the model confusion matrix? Which metrics to use – Precision or Recall? Before getting into learning the concepts, let’s look at the data (hypothetical) derived out …
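The three metrics can be computed directly from the four cells of a confusion matrix. The sketch below uses made-up counts for a Covid-19 test classifier; these numbers are assumptions for illustration and are not the hypothetical data used in the full post.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # of predicted positives, how many are truly positive
    recall = tp / (tp + fn)                     # of actual positives, how many were caught
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # overall fraction of correct predictions
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Made-up counts for 1000 patients (illustrative only):
#   tp = 80  patients correctly predicted positive
#   fp = 20  healthy patients wrongly predicted positive
#   fn = 10  infected patients wrongly predicted negative
#   tn = 890 healthy patients correctly predicted negative
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=890)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting like this one, a false negative (an infected patient sent home) is usually costlier than a false positive, which is why recall tends to matter more than precision here, while high accuracy alone can be misleading when most patients are negative.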

## Actionable Insights Examples – Turning Data into Action

In this post, you will learn how to turn data into information and then into actionable insights with the help of a few examples. It will help data analysts, data scientists, and business analysts get a good understanding of what an actionable insight is. You will understand aspects related to data-driven decision making. Before getting into the details, let’s understand the problem at hand. The school authority is trying to assess and improve the health of students. Here is the question it is dealing with: How could we improve the overall health of the students in the school? We will look into the approach of finding the …

## When to use Deep Learning vs Machine Learning Models?

In this post, you will learn about when to go for training deep learning models from the perspective of model performance and volume of data. Machine learning engineers and data scientists often wonder whether deep learning models can be used in place of traditional machine learning models trained using algorithms such as logistic regression, SVM, tree-based algorithms, etc. The objective of this post is to provide you with perspectives on when to go for traditional machine learning models vs deep learning models. The two key criteria based on which one can decide whether to go for deep learning vs traditional machine learning models are the following: …

## Most Common Types of Machine Learning Problems

In this post, you will learn about the most common types of machine learning (ML) problems along with a few examples. Without further ado, let’s look at these problem types and understand the details.

- Regression
- Classification
- Clustering
- Time-series forecasting
- Anomaly detection
- Ranking
- Recommendation
- Data generation
- Optimization

| Problem types | Details | Algorithms |
|---|---|---|
| Regression | When the need is to predict numerical values, such kinds of problems are called regression problems. For example, house price prediction | Linear regression, K-NN, random forest, neural networks |
| Classification | When there is a need to classify the data in different classes, it is called a classification problem. If there are two classes, it is called a binary classification problem. … | |
