# Category Archives: Data Science

## Top 10 Analytics Strategies for Great Data Products

In this post, you will learn about the top 10 data analytics strategies which will help you create successful data products. These strategies will be helpful in case you are setting up a data analytics practice or center of excellence (COE). As an AI / Machine Learning / Data Science stakeholders, it will be important to understand these strategies in order to deliver analytics solution which creates business value having positive business impact. Here are the top 10 data analytics strategies: Identify top 2-3 business problems Identify related business / engineering organizations Create measurement plan by identifying right KPIs Identify analytics deliverables such as analytics reports, predictions etc Gather data …

## Keras CNN Image Classification Example

In this post, you will learn about how to train a Keras Convolution Neural Network (CNN) for image classification. Before going ahead and looking at the Python / Keras code examples and related concepts, you may want to check my post on Convolution Neural Network – Simply Explained in order to get a good understanding of CNN concepts. Keras CNN Image Classification Code Example First and foremost, we will need to get the image data for training the model. In this post, Keras CNN used for image classification uses the Kaggle Fashion MNIST dataset. Fashion-MNIST is a dataset of Zalando’s article images—consisting of a training set of 60,000 examples and a …

## Data Quality Challenges for Machine Learning Models

In this post, you will learn about some of the key data quality challenges which need to be dealt with in a consistent and sustained manner to ensure high quality machine learning models. Note that high quality models can be termed as models which generalizes better (lower true error with predictions) with unseen data or data derived from larger population. As a data science architect or quality assurance (QA) professional dealing with quality of machine learning models, you must learn some of these challenges and plan appropriate development processes to deal with these challenges. Here are some of the key data quality challenges which need to be tackled appropriately in …

## Data Quality Assessment Frameworks – Machine Learning

In this post, you will learn about data quality assessment frameworks / techniques in relation to machine learning and why one needs to assess data quality for building high-performance machine learning models? As a data science architect or development manager, you must get a sense of the importance of data quality in relation to building high-performance machine learning models. The idea is to understand what is the value of data set. The goal is to determine whether the value of data can be quantised. This is because it is important to understand whether the data contains rich information which could be valuable for building models and inform stakeholders on data …

## Keras Neural Network for Regression Problem

In this post, you will learn about how to train neural network for regression machine learning problems using Python Keras. Regression problems are those which are related to predicting numerical continuous value based on input parameters / features. You may want to check out some of the following posts in relation to how to use Keras to train neural network for classification problems: Keras – How to train neural network to solve multi-class classification Keras – How to use learning curve to select most optimal neural network configuration for training classification model In this post, the following topics are covered: Design Keras neural network architecture for regression Keras neural network …

## Keras – Categorical Cross Entropy Loss Function

In this post, you will learn about when to use categorical cross entropy loss function when training neural network using Python Keras. Generally speaking, the loss function is used to compute the quantity that the the model should seek to minimize during training. For regression models, the commonly used loss function used is mean squared error function while for classification models predicting the probability, the loss function most commonly used is cross entropy. In this post, you will learn about different types of cross entropy loss function which is used to train the Keras neural network model. Cross entropy loss function is an optimization function which is used in case …

## Python Keras – Learning Curve for Classification Model

In this post, you will learn about how to train an optimal neural network using Learning Curves and Python Keras. As a data scientist, it is good to understand the concepts of learning curve vis-a-vis neural network classification model to select the most optimal configuration of neural network for training high-performance neural network. In this post, the following topics have been covered: Concepts related to training a classification model using a neural network Python Keras code for creating the most optimal neural network using a learning curve Training a Classification Neural Network Model using Keras Here are some of the key aspects of training a neural network classification model using Keras: …

## Keras Multi-class Classification using IRIS Dataset

In this post, you will learn about how to train a neural network for multi-class classification using Python Keras libraries and Sklearn IRIS dataset. As a deep learning enthusiasts, it will be good to learn about how to use Keras for training a multi-class classification neural network. The following topics are covered in this post: Keras neural network concepts for training multi-class classification model Python Keras code for fitting neural network using IRIS dataset Keras Neural Network Concepts for training Multi-class Classification Model Training a neural network for multi-class classification using Keras will require the following seven steps to be taken: Loading Sklearn IRIS dataset Prepare the dataset for training and testing …

## Python Sklearn – How to Generate Random Datasets

In this post, you will learn about some useful random datasets generators provided by Python Sklearn. There are many methods provided as part of Sklearn.datasets package. In this post, we will take the most common ones such as some of the following which could be used for creating data sets for doing proof-of-concepts solution for regression, classification and clustering machine learning algorithms. As data scientists, you must get familiar with these methods in order to quickly create the datasets for training models using different machine learning algorithms. Methods for generating datasets for Classification Methods for generating datasets for Regression Methods for Generating Datasets for Classification The following is the list of …

## Neural Networks and Mathematical Models Examples

In this post, you will learn about concepts of neural networks with the help of mathematical models examples. In simple words, you will learn about how to represent the neural networks using mathematical equations. As a data scientist / machine learning researcher, it would be good to get a sense of how the neural networks can be converted into a bunch of mathematical equations for calculating different values. Having a good understanding of representing the activation function output of different computation units / nodes / neuron in different layers would help in understanding back propagation algorithm in a better and easier manner. This will be dealt in one of the …

## Adaptive Linear Neuron (Adaline) Python Example

In this post, you will learn the concepts of Adaline (ADAptive LInear NEuron), a machine learning algorithm, along with Python example.As like Perceptron, it is important to understand the concepts of Adaline as it forms the foundation of learning neural networks. The concept of Perceptron and Adaline could found to be useful in understanding how gradient descent can be used to learn the weights which when combined with input signals is used to make predictions based on unit step function output. Here are the topics covered in this post in relation to Adaline algorithm and its Python implementation: What’s Adaline? Adaline Python implementation Model trained using Adaline implementation What’s Adaptive …

## Python Implementations of Machine Learning Models

This post highlights some great pages where python implementations for different machine learning models can be found. If you are a data scientist who wants to get a fair idea of whats working underneath different machine learning algorithms, you may want to check out the Ml-from-scratch page. The top highlights of this repository are python implementations for the following: Supervised learning algorithms (linear regression, logistic regression, decision tree, random forest, XGBoost, Naive bayes, neural network etc) Unsupervised learning algorithms (K-means, GAN, Gaussian mixture models etc) Reinforcement learning algorithms (Deep Q Network) Dimensionality reduction techniques such as PCA Deep learning Examples that make use of above mentioned algorithms Here is an insight into …

## Python – Extract Text from HTML using BeautifulSoup

In this post, you will learn about how to use Python BeautifulSoup and NLTK to extract words from HTML pages and perform text analysis such as frequency distribution. The example in this post is based on reading HTML pages directly from the website and performing text analysis. However, you could also download the web pages and then perform text analysis by loading pages from local storage. Python Code for Extracting Text from HTML Pages Here is the Python code for extracting text from HTML pages and perform text analysis. Pay attention to some of the following in the code given below: URLLib request is used to read the html page …

## Top 10 Data Science Skills for Product Managers

In this post, you will learn about some of the top data science skills / concepts which may be required for product managers / business analyst to have, in order to create useful machine learning based solutions. Here are some of the topics / concepts which need to be understood well by product managers / business analysts in order to tackle day-to-day challenges while working with data science / machine learning teams. Knowing these concepts will help product managers / business analyst acquire enough skills in order to solve machine learning based problems. Understanding the difference between AI, machine learning, data science, deep learning Which problems are machine learning problems? …

## RANSAC Regression Explained with Python Examples

In this post, you will learn about the concepts of RANSAC regression algorithm along with Python Sklearn example for RANSAC regression implementation using RANSACRegressor. RANSAC regression algorithm is useful for handling the outliers dataset. Instead of taking care of outliers using statistical and other techniques, one can use RANSAC regression algorithm which takes care of the outlier data. In this post, the following topics are covered: Introduction to RANSAC regression RANSAC Regression Python code example Introduction to RANSAC Regression RANSAC (RANdom SAmple Consensus) algorithm takes linear regression algorithm to the next level by excluding the outliers in the training dataset. The presence of outliers in the training dataset does impact …

## Beta Distribution Explained with Python Examples

In this post, you will learn about Beta probability distribution with the help of Python examples. As a data scientist, it is very important to understand beta distribution as it is used very commonly as prior in Bayesian modeling. In this post, the following topics get covered: Beta distribution intuition and examples Introduction to beta distribution Beta distribution python examples Beta Distribution Intuition & Examples Beta distribution is widely used to model the prior beliefs or probability distribution in real world applications. Here is a great article on understanding beta distribution with an example of baseball game. You may want to pay attention to the fact that even if the baseball …

I found it very helpful. However the differences are not too understandable for me