Author Archives: Ajitesh Kumar

Ajitesh Kumar

I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R and Julia, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms and big data. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.

Missing Data Imputation Techniques in Machine Learning

Missing Data Imputation Techniques

Have you come across the problem of handling missing data/values for features in machine learning (ML) models at prediction time? This is different from handling missing data for features during the training/testing phase of ML models. Data scientists are expected to come up with an appropriate strategy to handle missing data both during the model training/testing phase and at model prediction time (runtime). In this post, you will learn about the following imputation techniques, which could be used to replace missing data with appropriate values at model prediction time: validate input data before feeding it into the ML model; discard data instances with missing values; predicted value imputation; distribution-based imputation; unique value imputation; reduced feature models. Below is the diagram …
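To make a few of the listed techniques concrete, here is a minimal sketch in plain Python. It is not taken from the post itself: the feature names, the use of `None` as the missing-value marker, the `-1.0` sentinel, and the use of the training mean as a simple stand-in for a predicted value are all assumptions made for illustration.

```python
from statistics import mean

MISSING = None  # assumption: missing values arrive as None


def has_missing(row):
    """Validate an input row: True if any feature value is missing."""
    return any(value is MISSING for value in row.values())


def discard_incomplete(rows):
    """Discard data instances that contain missing values."""
    return [row for row in rows if not has_missing(row)]


def unique_value_impute(row, sentinel=-1.0):
    """Unique value imputation: replace missing values with a sentinel (hypothetical choice)."""
    return {name: (sentinel if value is MISSING else value) for name, value in row.items()}


def fit_feature_means(training_rows):
    """Compute per-feature means from the observed (non-missing) training values."""
    names = training_rows[0].keys()
    return {
        name: mean(row[name] for row in training_rows if row[name] is not MISSING)
        for name in names
    }


def mean_impute(row, feature_means):
    """Replace missing values with the training mean (a simple stand-in for a predicted value)."""
    return {name: (feature_means[name] if value is MISSING else value) for name, value in row.items()}


# Hypothetical training data and a prediction-time request with a missing feature.
training_rows = [
    {"age": 30.0, "income": 50.0},
    {"age": 40.0, "income": None},
    {"age": 50.0, "income": 70.0},
]
feature_means = fit_feature_means(training_rows)
request = {"age": None, "income": 65.0}

complete_rows = discard_incomplete(training_rows)
sentinel_row = unique_value_impute(request)
imputed_row = mean_impute(request, feature_means)
```

The key design point is that the statistics used for imputation are computed once from training data and then reused at prediction time, so that a single incoming request can be completed without access to the rest of the data.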

Posted in AI, Data Science, Machine Learning.

Code of Ethics in Artificial Intelligence (AI) – Key Traits

Code of Ethics for Artificial Intelligence

Do you know that organizations have started paying attention to whether AI/machine learning (ML) models make unbiased, safe and trustworthy predictions based on ethical principles? Have you thought through the consequences if the AI/ML models you created for your clients make predictions which are biased towards a class of customers, thus hurting other customers? Have you imagined scenarios in which customers accuse your organization of benefitting a section of customers (presumably their competitors), file a case against your organization, and bring a bad name and losses to your business? Have you imagined scenarios in which ML models start making incorrect predictions which could result in loss of business? If above …

Posted in AI, Data Science, Machine Learning.

Ethical AI – Lessons from Google AI Principles

AI Guiding Principles for Ethical AI

Is your organization using AI/machine learning for many of its products, or planning to use AI models extensively for upcoming products? Do you have AI guiding principles in place for stakeholders such as product management and data scientists/machine learning researchers, to make sure that safe and unbiased AI (as appropriate) is used for developing AI-based solutions? Are you planning to create AI guiding principles for the AI stakeholders, including business stakeholders, customers, partners etc.? If the answer to the above is not in the affirmative, you should start thinking about laying down AI guiding principles, sooner rather than later, to help different stakeholders such as executive team, …

Posted in Data Science, Machine Learning.

Why take Google Machine Learning Crash Course?

ML model training validation testing

This post presents my thoughts on why you should take the Google Machine Learning (ML) Crash Course. Most importantly, this course would benefit both beginners and intermediate-level data scientists/machine learning researchers. Each of the topics is covered with videos, reading text and programming exercises. You learn some of the following as part of the course: ML concepts, which help you learn about building machine learning models, such as training/validating/testing the models, feature engineering, model overfitting, regularization techniques to penalize complex models, neural networks etc.; ML engineering concepts, which help you learn about different aspects of a machine learning system, such as ML system components, offline/online training, offline/online prediction, …

Posted in Data Science, Machine Learning.

How to Choose Right Machine Learning Algorithms?

How to Select Right Machine Learning Algorithms

In this post, you will learn about tips and techniques which could be used for choosing the right machine learning algorithms for your machine learning problem. These could be very useful for data scientists or ML researchers who are starting to learn data science/machine learning topics. Based on the following, one could select different classes of machine learning algorithms for training the models: availability of data; number of features. This post deals with the following scenarios while explaining machine learning algorithms which could be used to solve related problems: a large number of features, lesser volume of data; a smaller number of features, large …

Posted in Data Science, Machine Learning.

QA – How Reliable are your Machine Learning Systems?

ML Model Reliability

In this post, you will learn about different aspects of creating a machine learning system with high reliability. Note that system reliability is one of the key software quality attributes as per the ISO 25000 SQuaRE specifications. Have you put measures in place to ensure high reliability of your machine learning systems? In this post, you will learn about some of the following: What is the reliability of machine learning systems? Why bother about the reliability of machine learning models? Who should take care of the reliability of ML systems? What is the Reliability of Machine Learning Systems? As with software applications, the reliability of machine learning systems is primarily related to …

Posted in Data Science, Machine Learning.

Configure Nexus Repository for Docker Registry (Windows)

Browse Docker Images in local Nexus Repository

In this post, you will learn how to configure Nexus Repository OSS on Windows as a private Docker registry. The goals of doing this can include the following: allow developers to push/pull images from a local Docker image repository installed within the company-wide private network; allow Jenkins jobs to pull images for running automated tasks. One of the key aspects of DevOps automation using Docker containers is setting up a private Docker registry which can be accessed by developers. This tutorial would help in setting up a Nexus repository as a private Docker registry. How to Configure Nexus Repository OSS on Windows for Private Docker Registry The following are the steps to configure Nexus Repository OSS …

Posted in DevOps, Dockers, Tutorials.

Why is QA needed for Machine Learning Models?

QA for Machine Learning Models

Given that machine learning models are also a kind of conventional software application, the quality assurance principles applied to conventional software development would, or should, also apply to building machine learning models. In this post, you will learn about some of the important reasons why Quality Assurance (QA) is important for making sure that only high-quality machine learning models are deployed in production. Given that machine learning models are said to be non-testable, performing quality control checks or testing on machine learning models from a quality assurance perspective presents a set of challenges. In this relation, I …

Posted in Data Science, Machine Learning, QA, Testing.

Testing Machine Learning Models on Dual Coding Principles

Automation of Dual Coding Testing of ML Models

This post proposes a technique termed Dual Coding for testing or performing quality control checks on machine learning models from a quality assurance (QA) perspective. This could be useful in performing black box testing of ML models. The proposed technique is based on the principles of Dual Coding Theory (DCT), hypothesized by Allan Paivio of the University of Western Ontario in 1971. According to Dual Coding Theory, our brain uses two different systems, verbal and non-verbal/visual, to gather, process, store and retrieve (recall) the information related to a particular subject. One of the key assumptions of dual coding theory is the connections (also termed as referential …

Posted in Data Science, Machine Learning, QA, Testing.

QA – Blackbox Testing for Machine Learning Models

Blackbox Testing

A data science/machine learning career has primarily been associated with building models which make numerical or class-related predictions. This is unlike conventional software development, which is associated with both developing and “testing” the software, and where the related career profiles are software developers/engineers and test engineers/QA professionals. In the case of machine learning, however, the career profile is the data scientist. The word “testing” in relation to machine learning models primarily refers to testing the model performance in terms of accuracy/precision. Note that the word “testing” means different things in conventional software development and in machine learning model development. Machine learning models would …

Posted in Data Science, Machine Learning, QA, Testing.

Assessing Quality of AI Models from QA Standpoint

Quality of Machine Learning Models

In this post, you will learn about the definition of quality of AI / machine learning (ML) models. A good understanding of what constitutes high and low quality in AI models would help you design quality control checks for testing machine learning models and related quality assurance (QA) practices. This post would be a good read for QA professionals in general. However, it would also help set perspectives for data scientists and machine learning experts. The following are some of the key quality traits which are described in detail for assessing the quality of AI models: functional suitability; maintainability; usability; efficiency; security; portability. When designing QA practice and related quality control checks, all of the above would need to be considered for testing …

Posted in Data Science, Machine Learning, QA, Testing.

QA – Metamorphic Testing for Machine Learning Models

Metamorphic Relations for Machine Learning Models QA

In this post, you will learn how metamorphic testing could be used for performing quality control checks/testing on machine learning models. The post is primarily meant for data science QA specialists planning test cases to test a machine learning (ML) model implementation from a QA perspective. Testing machine learning models from a quality assurance perspective is different from testing machine learning models for accuracy/performance. The word “testing” is one of the conflicting technical terms, given its different usage by machine learning experts and by the software engineering community in general. In this post, the following topics are discussed: introduction to metamorphic testing; why metamorphic testing for machine learning models?; automated metamorphic testing of ML models. Introduction …
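As a rough illustration of the idea, the sketch below checks two metamorphic relations on a toy 1-nearest-neighbour classifier written in plain Python. The classifier, the data, and the two relations chosen (invariance to the order of training rows, and invariance to scaling every feature by the same positive constant) are assumptions for this example, not the relations discussed in the post.

```python
import random


def predict_1nn(train, query):
    """Return the label of the training point closest to `query` (squared Euclidean distance)."""
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))

    _, label = min(train, key=lambda item: squared_distance(item[0]))
    return label


def holds_under_permutation(train, queries, seed=0):
    """Relation 1: shuffling the training rows must not change any prediction."""
    shuffled = list(train)
    random.Random(seed).shuffle(shuffled)
    return all(predict_1nn(train, q) == predict_1nn(shuffled, q) for q in queries)


def holds_under_scaling(train, queries, factor=2.0):
    """Relation 2: scaling all features (train and query) by a positive constant must not change any prediction."""
    scaled_train = [(tuple(factor * x for x in point), label) for point, label in train]
    return all(
        predict_1nn(train, q) == predict_1nn(scaled_train, tuple(factor * x for x in q))
        for q in queries
    )


# Hypothetical training data (no distance ties) and follow-up queries.
train = [
    ((1.0, 1.0), "a"),
    ((2.0, 1.5), "a"),
    ((6.0, 5.0), "b"),
    ((7.0, 6.5), "b"),
]
queries = [(1.5, 1.0), (6.5, 6.0), (3.0, 3.0)]

baseline = [predict_1nn(train, q) for q in queries]
permutation_ok = holds_under_permutation(train, queries)
scaling_ok = holds_under_scaling(train, queries)
```

Note that neither check needs to know the correct label for a query. Each one only compares the model's output before and after a transformation, which is what makes this style of test usable when no test oracle is available.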

Posted in Data Science, Machine Learning, QA, Testing.

QA – Why Machine Learning Systems are Non-testable

Non-testability of Machine Learning Systems

This post presents views on why machine learning systems or models are termed non-testable from quality control/quality assurance perspectives. Before I proceed, let me humbly state that the data science/machine learning community has been saying that ML models are testable, as they are first trained and then tested using techniques such as cross-validation, along with different techniques to increase the model performance and optimize the model. However, “testing” the model here refers to the scenario during the development (model building) phase when data scientists test the model performance by comparing the model outputs (predicted values) with the actual values. This is not the same as testing the model for any given input for which the …

Posted in Data Science, Machine Learning, QA, Testing.

QA – Testing Features of Machine Learning Models

Testing Features of Machine Learning Models

In this post, you will learn about different types of test cases which you could come up with for testing features of data science/machine learning models. Testing features is one of the key sets of QA tasks which need to be performed to ensure the high performance of machine learning models in a consistent and sustained manner. Features are the most important part of a machine learning model. Features are nothing but the predictor variables which are used to predict the outcome or response variable. Simply speaking, the following function represents y as the outcome variable and x1, x2 and x1x2 as predictor variables. y = a1x1 + a2x2 + a3x1x2 + e In the above function, …
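The function quoted in the excerpt can be sketched in a few lines of Python to show how the interaction term x1x2 acts as a third predictor. The coefficient values, inputs, and observed outcome below are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def interaction_model(x1, x2, a1, a2, a3):
    """Deterministic part of y = a1*x1 + a2*x2 + a3*x1*x2 + e, i.e. without the error term e."""
    interaction = x1 * x2  # derived feature built from the two raw features
    return a1 * x1 + a2 * x2 + a3 * interaction


# Hypothetical coefficients and feature values.
a1, a2, a3 = 2.0, 3.0, 0.5
x1, x2 = 4.0, 2.0

predicted = interaction_model(x1, x2, a1, a2, a3)

# Hypothetical observed outcome; the error term e is whatever the predictors do not explain.
observed_y = 19.0
error = observed_y - predicted
```

With these numbers the three terms contribute 8.0, 6.0 and 4.0 respectively, so the prediction is 18.0 and the error term e is 1.0.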

Posted in Data Science, Machine Learning, QA, Testing.

QA of Machine Learning Models with PDCA Cycle

QA and Machine learning Projects with PDCA Cycle

The primary goal of establishing and implementing Quality Assurance (QA) practices for machine learning/data science projects, or projects using machine learning models, is to achieve consistent and sustained improvements in the business processes that make use of the underlying ML predictions. This is where the idea of the PDCA cycle (Plan-Do-Check-Act) is applied to establish a repeatable process, ensuring that high-quality machine learning (ML) based solutions are served to clients in a consistent and sustained manner. The following diagram represents the details. The following represents the details listed in the above diagram. Plan. Explore/describe the business problems: In this stage, product managers/business analysts sit with data scientists and discuss the business problem at hand. The outcome of this …

Posted in Data Science, Machine Learning, QA, Testing.

QA & Data Science – How to Test Features Relevance

How to Test Feature Relevance in Data Science

In this post, I intend to present a perspective on the need for the QA/testing team to test feature relevance when testing machine learning models as part of data science QA initiatives, and on the different techniques which could be used to test or perform QA on feature relevance. Feature relevance can also be termed feature importance. Simply speaking, a feature is said to be relevant or important if it adds real predictive value to the underlying model. Relevant features must display a stable statistical relationship or association with the outcome variable. An association does not imply causation. However, a relevant feature or a feature …
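One simple way to probe for a "stable statistical relationship" is to measure the association on separate portions of the data and check that it holds up on each. The sketch below does this with the Pearson correlation on two halves of a small dataset. The choice of Pearson correlation, the 0.8 threshold, the split into halves, and the data are all assumptions for illustration and not the techniques the post goes on to describe.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


def is_stably_relevant(feature, outcome, threshold=0.8):
    """True if the correlation is strong and has the same sign on both halves of the data."""
    half = len(feature) // 2
    r_first = pearson(feature[:half], outcome[:half])
    r_second = pearson(feature[half:], outcome[half:])
    same_sign = (r_first > 0) == (r_second > 0)
    return same_sign and min(abs(r_first), abs(r_second)) >= threshold


# Hypothetical data: `feature` tracks the outcome closely, `unrelated` does not.
outcome = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
unrelated = [5.0, 1.0, 4.0, 2.0, 5.0, 1.0, 4.0, 2.0]

feature_is_relevant = is_stably_relevant(feature, outcome)
unrelated_is_relevant = is_stably_relevant(unrelated, outcome)
```

A check like this only measures linear association on the data at hand. It says nothing about causation, and a feature that passes on one dataset still needs to be rechecked as new data arrives.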

Posted in Data Science, Machine Learning, QA, Testing.