Category Archives: Data Science

AI / Machine Learning Bias Explained with Examples

[Figure: Machine learning model bias and variance vs. model complexity]

In a world increasingly powered by artificial intelligence (AI) and machine learning (ML), predictive models are used more and more often in decision-making. A primary concern of policy makers, auditors and end users is therefore to make sure that these models do not make biased or unfair decisions, whether through intentional or unintentional discrimination. Consider industries such as banking, insurance and employment, where models are used to solve decision-making problems such as shortlisting candidates for interviews, approving loans or credit, and setting insurance premiums. Because these decisions can affect people's livelihoods, biased predictions from a model can seriously harm end users by producing unfair decisions. …
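To make the loan-approval scenario concrete, here is a minimal Python sketch of one common way to check a model's decisions for such bias: comparing the rate of favorable decisions across groups, often summarized as the disparate impact ratio. The groups, the decisions and the function names are hypothetical, and this is only one of several fairness metrics, not necessarily the one used in the full post.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the fraction of favorable decisions (e.g. loan approved) per group."""
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        favorable[group] += int(approved)
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions, unprivileged, privileged):
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    rates = selection_rates(decisions)
    if rates[privileged] == 0:
        raise ValueError("privileged group has no favorable decisions")
    return rates[unprivileged] / rates[privileged]


if __name__ == "__main__":
    # Hypothetical loan decisions as (group, approved) pairs; groups "A" and "B" are made up.
    decisions = [
        ("A", True), ("A", True), ("A", True), ("A", False),
        ("B", True), ("B", False), ("B", False), ("B", False),
    ]
    print(selection_rates(decisions))  # {'A': 0.75, 'B': 0.25}
    print(disparate_impact_ratio(decisions, unprivileged="B", privileged="A"))
```

A ratio close to 1.0 means both groups receive favorable decisions at similar rates; values below about 0.8 are often flagged under the so-called four-fifths rule of thumb. The check does not explain why the rates differ, so it is a starting point for investigation rather than proof of discrimination.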

Posted in AI, Data Science, Machine Learning.

Security Attacks Analysis of Machine Learning Models

[Figure: Threat model for security attacks on machine learning models]

Have you wondered what it would be like to have your machine learning (ML) models come under security attack, in other words, to have them hacked? Have you thought through how to check for and monitor security attacks on your AI models? As a data scientist or machine learning researcher, it is good to know some of the scenarios related to security attacks (hacking) on ML models. In this post, you will learn about the following aspects of security attacks on machine learning models: examples of security attacks on ML models; what hacking machine learning (ML) models means; different types of security attacks; and monitoring security attacks. Examples of Security Attacks on ML Models: Most of …

Posted in AI, Data Science, Machine Learning.

JupyterLab & Jupyter Notebook Cheat Sheet Commands

[Figure: Jupyter Notebook cheat sheet commands]

Are you starting to build machine learning models in Python using JupyterLab or Jupyter Notebook? This post lists some commands that beginner data scientists find very useful when getting started with JupyterLab notebooks for building machine learning models.

Notebook operations: the following commands help you perform operations on the notebook.
Ctrl + S: Save the notebook.
Ctrl + Q: Close the notebook.
Enter: Enter edit mode on the selected cell.

Cell operations: the following commands help you perform operations on cells.
J: Select the cell below the current cell. This command can be used to go through the cells below the …

Posted in AI, Data Science, Machine Learning.

Missing Data Imputation Techniques in Machine Learning

[Figure: Missing data imputation techniques]

Have you come across the problem of handling missing values for the features of machine learning (ML) models at prediction time? This is different from handling missing data during the training/testing phase. Data scientists are expected to come up with an appropriate strategy for handling missing data both during model training/testing and at prediction time (runtime). In this post, you will learn about the following techniques, which could be used to handle missing data at prediction time: validating input data before feeding it into the ML model; discarding data instances with missing values; predicted value imputation; distribution-based imputation; unique value imputation; and reduced feature models. Below is the diagram …
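As a rough illustration of three of these strategies, here is a minimal Python sketch of prediction-time handling of missing values: validating an incoming record against the features the model expects, discarding incomplete records, and unique value imputation (replacing each missing value with a sentinel). The feature names, the sentinel value -999.0 and the helper functions are hypothetical. Unique value imputation assumes the model was trained with the same sentinel, so it has learned what that value means. Predicted value imputation, distribution-based imputation and reduced feature models are not shown.

```python
import math

# Hypothetical schema of an already-trained model: expected feature names, in order.
EXPECTED_FEATURES = ["age", "income", "tenure_months"]

# Hypothetical sentinel for unique value imputation; it must never occur in real data
# and must match the value used when the model was trained.
MISSING_SENTINEL = -999.0


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def validate(record):
    """Return the expected features that are absent or missing in a record."""
    return [f for f in EXPECTED_FEATURES if is_missing(record.get(f))]


def discard_incomplete(records):
    """Strategy: drop every instance that has a missing value (no prediction is made)."""
    return [r for r in records if not validate(r)]


def unique_value_impute(record):
    """Strategy: replace each missing value with the sentinel and return a feature vector."""
    return [
        MISSING_SENTINEL if is_missing(record.get(f)) else float(record[f])
        for f in EXPECTED_FEATURES
    ]


if __name__ == "__main__":
    incoming = [
        {"age": 42, "income": 55000.0, "tenure_months": 18},
        {"age": 35, "income": None, "tenure_months": 7},
        {"age": 29, "income": float("nan")},
    ]
    for record in incoming:
        print(validate(record), unique_value_impute(record))
    print(len(discard_incomplete(incoming)))  # 1
```

Running validation first makes the choice of strategy explicit: a record can be rejected, imputed, or routed to a different model, rather than silently passing NaN values to the model.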

Posted in AI, Data Science, Machine Learning.

Code of Ethics in Artificial Intelligence (AI) – Key Traits

[Figure: Code of ethics for artificial intelligence]

Did you know that organizations have started paying attention to whether AI/machine learning (ML) models make unbiased, safe and trustworthy predictions based on ethical principles? Have you thought through the consequences if the AI/ML models you created for your clients make predictions that are biased toward one class of customers, thus hurting other customers? Have you imagined scenarios in which customers accuse your organization of favoring a section of customers (possibly their competitors) and file a case against it, damaging your organization's reputation and business? Have you imagined scenarios in which ML models start making incorrect predictions that result in loss of business? If the above …

Posted in AI, Data Science, Machine Learning.

Ethical AI – Lessons from Google AI Principles

[Figure: AI guiding principles for ethical AI]

Is your organization using AI/machine learning in many of its products, or planning to use AI models extensively in upcoming products? Do you have AI guiding principles in place for stakeholders such as product management and data scientists/machine learning researchers, to make sure that safe and unbiased AI is used, as appropriate, to develop AI-based solutions? Are you planning to create AI guiding principles for AI stakeholders, including business stakeholders, customers and partners? If the answer to these questions is no, you should start laying down AI guiding principles sooner rather than later to help different stakeholders such as the executive team, …

Posted in Data Science, Machine Learning.

Why take Google Machine Learning Crash Course?

[Figure: ML model training, validation and testing]

This post presents my thoughts on why you should take the Google Machine Learning (ML) Crash Course. Most importantly, the course benefits both beginner and intermediate-level data scientists/machine learning researchers. Each topic is covered with videos, reading material and programming exercises. As part of the course, you learn the following: ML concepts related to building machine learning models, such as training, validating and testing models, feature engineering, overfitting, regularization techniques that penalize complex models, and neural networks. ML engineering concepts covering different aspects of machine learning systems, such as ML system components, offline/online training, offline/online prediction, …

Posted in Data Science, Machine Learning.

How to Choose Right Machine Learning Algorithms?

[Figure: How to select the right machine learning algorithms]

In this post, you will learn about tips and techniques for choosing the right machine learning algorithms for your machine learning problem. These could be very useful for data scientists or ML researchers who are starting to learn data science/machine learning. Different classes of machine learning algorithms can be selected for training models based on the following: the availability of data and the number of features. This post deals with the following scenarios, explaining the machine learning algorithms that could be used to solve the related problems: a large number of features with a smaller volume of data; a smaller number of features with a large …

Posted in Data Science, Machine Learning.

QA – How Reliable are your Machine Learning Systems?

[Figure: ML model reliability]

In this post, you will learn about different aspects of building machine learning systems with high reliability. Note that reliability is one of the key software quality attributes in the ISO/IEC 25000 (SQuaRE) series of standards. Have you put measures in place to ensure the high reliability of your machine learning systems? In this post, you will learn about the following: What is the reliability of machine learning systems? Why bother about the reliability of machine learning models? Who should take care of the reliability of ML systems? What is the reliability of machine learning systems? As with software applications, the reliability of machine learning systems is primarily related to …

Posted in Data Science, Machine Learning.

Machine Learning – Sensitivity vs Specificity Difference

[Figure: Sensitivity vs. specificity vs. ROC vs. AUC]

In this post, we will try to understand the concepts behind machine learning model evaluation metrics such as sensitivity and specificity, which are used to determine the performance of machine learning models. The post also describes the differences between sensitivity and specificity. The concepts are explained using a model that predicts whether or not a person is suffering from a disease. You may also want to check out the related post titled ROC Curve & AUC Explained with Python examples. What is Sensitivity? Sensitivity is the proportion of actual positive cases that are predicted as positive (true positives), i.e. TP / (TP + FN). Sensitivity is also termed recall. This implies that there …
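Using the disease-prediction example, the minimal Python sketch below counts the confusion-matrix cells and computes sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP). The labels are hypothetical (1 means the person has the disease) and the function names are illustrative; scikit-learn's `recall_score` returns the same value as sensitivity for the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, true negatives and false positives."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred, positive=1):
    """TP / (TP + FN): share of people with the disease who are predicted positive (recall)."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn)


def specificity(y_true, y_pred, positive=1):
    """TN / (TN + FP): share of healthy people who are predicted negative."""
    _, _, tn, fp = confusion_counts(y_true, y_pred, positive)
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical labels: 1 = has the disease, 0 = does not.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(sensitivity(y_true, y_pred))  # 0.75
    print(specificity(y_true, y_pred))  # 0.6666666666666666
```

Here 3 of the 4 sick people are flagged (sensitivity 0.75), while 4 of the 6 healthy people are correctly cleared (specificity of about 0.67), which shows that a model can do well on one metric and worse on the other.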

Posted in Data Science, Machine Learning.

Why is QA needed for Machine Learning Models?

[Figure: QA for machine learning models]

Given that machine learning models are also a kind of software application, the quality assurance principles applied in conventional software development should also apply when building machine learning models. In this post, you will learn about some of the important reasons why Quality Assurance (QA) is needed to make sure that only high-quality machine learning models are deployed to production. Because machine learning models are said to be non-testable, performing quality control checks or testing them from a quality assurance perspective presents a set of challenges. In this relation, I …

Posted in Data Science, Machine Learning, QA, Testing.

Testing Machine Learning Models on Dual Coding Principles

[Figure: Automation of dual coding testing of ML models]

This post proposes a technique termed Dual Coding for testing, or performing quality control checks on, machine learning models from a quality assurance (QA) perspective. This could be useful for black-box testing of ML models. The proposed technique is based on the principles of Dual Coding Theory (DCT), hypothesized by Allan Paivio of the University of Western Ontario in 1971. According to Dual Coding Theory, our brain uses two different systems, verbal and non-verbal (visual), to gather, process, store and retrieve (recall) information related to a particular subject. One of the key assumptions of dual coding theory is the connections (also termed referential …

Posted in Data Science, Machine Learning, QA, Testing.

QA – Blackbox Testing for Machine Learning Models

[Figure: Black-box testing]

Data science/machine learning careers have primarily been associated with building models that make numerical or class predictions. This is unlike conventional software development, which is associated with both developing and “testing” the software, with the related career profiles of software developers/engineers and test engineers/QA professionals. In machine learning, however, the career profile is that of a data scientist. In relation to machine learning models, the word “testing” primarily refers to testing model performance in terms of accuracy/precision. Note that the word “testing” means different things in conventional software development and in machine learning model development. Machine learning models would …

Posted in Data Science, Machine Learning, QA, Testing.

Assessing Quality of AI Models from QA Standpoint

[Figure: Quality of machine learning models]

In this post, you will learn what the quality of AI/machine learning (ML) models means. A good understanding of what makes AI models high or low quality would help you design quality control checks for testing machine learning models, along with related quality assurance (QA) practices. This post would be a good read for QA professionals in general; it would also help set perspectives for data scientists and machine learning experts. The following key quality traits for assessing the quality of AI models are described in detail: functional suitability, maintainability, usability, efficiency, security and portability. When designing a QA practice and related quality control checks, all of the above would need to be considered for testing …

Posted in Data Science, Machine Learning, QA, Testing.

QA – Metamorphic Testing for Machine Learning Models

[Figure: Metamorphic relations for machine learning model QA]

In this post, you will learn how metamorphic testing could be used to perform quality control checks/testing on machine learning models. The post is primarily meant for data science QA specialists planning test cases to test machine learning (ML) model implementations from a QA perspective. Testing machine learning models from a quality assurance perspective is different from testing them for accuracy/performance. “Testing” is one of those conflicting technical terms, given how differently it is used by machine learning experts and by the software engineering community in general. In this post, the following topics are discussed: an introduction to metamorphic testing; why metamorphic testing for machine learning models; and automated metamorphic testing of ML models. Introduction …
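Metamorphic testing works around the missing test oracle: instead of knowing the correct output for each input, you check that a known transformation of the input changes the output in a predictable way, which is called a metamorphic relation. The minimal Python sketch below applies two relations commonly used for k-nearest-neighbour (kNN) classifiers to a hypothetical toy kNN implementation: shuffling the order of the training data, and adding the same constant to every feature of the training and query points, should not change any prediction. The classifier, data and function names are illustrative assumptions, not the implementation from the full post.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Toy kNN: majority label among the k training points nearest to `query`.

    `train` is a list of (features, label) pairs. Neighbours are sorted by
    (squared distance, label), so the predicted label does not depend on the
    order of `train`.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    neighbours = sorted(train, key=lambda item: (sq_dist(item[0], query), item[1]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def mr_permutation(train, queries, k=3, seed=0):
    """MR1: shuffling the training set must not change any prediction."""
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    return all(knn_predict(train, q, k) == knn_predict(shuffled, q, k) for q in queries)


def mr_translation(train, queries, shift=10, k=3):
    """MR2: adding the same constant to every feature of training and query points
    must not change predictions, because Euclidean distances are unchanged."""
    moved_train = [([x + shift for x in feats], label) for feats, label in train]
    return all(
        knn_predict(train, q, k) == knn_predict(moved_train, [x + shift for x in q], k)
        for q in queries
    )


if __name__ == "__main__":
    # Hypothetical 2-D training data with two classes, "a" and "b".
    train = [([1, 1], "a"), ([1, 2], "a"), ([2, 1], "a"),
             ([8, 8], "b"), ([8, 9], "b"), ([9, 8], "b")]
    queries = [[0, 0], [2, 2], [7, 7], [5, 5]]
    print(mr_permutation(train, queries))  # True
    print(mr_translation(train, queries))  # True
```

Neither check needs the expected label for any query. If a relation returns False for a real model, it points to a possible implementation bug, or to behaviour such as order-dependent tie-breaking that is worth investigating. Integer features are used here so that the translated distances are exact and floating-point rounding cannot trigger false alarms.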

Posted in Data Science, Machine Learning, QA, Testing.

QA – Why Machine Learning Systems are Non-testable

[Figure: Non-testability of machine learning systems]

This post presents views on why machine learning systems or models are termed non-testable from a quality control/quality assurance perspective. Before I proceed, let me humbly state that the data science/machine learning community has been saying that ML models are testable, as they are first trained and then tested using techniques such as cross-validation, with different techniques applied to improve and optimize model performance. However, “testing” here refers to the development (model building) phase, when data scientists test model performance by comparing the model outputs (predicted values) with the actual values. This is not the same as testing the model for any given input for which the …

Posted in Data Science, Machine Learning, QA, Testing.