Author Archives: Ajitesh Kumar

Ajitesh Kumar

I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, and Julia, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.

Instance-based vs Model-based Learning: Differences

model based learning example

Machine learning is a field of artificial intelligence that deals with giving machines the ability to learn without being explicitly programmed. In this context, instance-based learning and model-based learning are two different approaches used to create machine learning models. While both approaches can be effective, they also have distinct differences that must be taken into account when building a machine learning system. Let’s explore the differences between these two types of machine learning. What is instance-based learning & how does it work? Instance-based learning (also known as memory-based learning or lazy learning) involves memorizing training data in order to make predictions about future data points. This approach doesn’t require any …

Continue reading

Posted in Data Science, Machine Learning.
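
The instance-based (lazy) learning described in the excerpt above can be sketched with a tiny k-nearest-neighbours classifier. This is a minimal illustration, not code from the post: the function name `knn_predict`, the toy `points` / `labels` data and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest stored examples."""
    # "Training" is just keeping the data; all the work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups labelled "a" and "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0)))
```

Nothing is fitted ahead of time here: the memorized training data is the model, which is the property that separates this style from model-based learning.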

Open Source Web Scraping Tools List

web scraping tool list

If you’re looking for a cost-effective way to access the data that matters most to your business, then web scraping is the answer. Web scraping is the process of extracting data from websites and can be used to gather valuable insights about market trends, customer behavior, competitor analysis, etc. To make this process easier, there are plenty of open source web scraping tools available. Let’s take a look at some of these tools and how they can help you collect and analyze data with greater efficiency. Beautiful Soup Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping. This library allows you to parse HTML and XML …

Continue reading

Posted in Data, data engineering.

Data-Driven Decision Making: What, Why & How?

analytics key factor in decision making

Data-driven decision making is an approach that uses data to make decisions and achieve a desired outcome. More precisely, it is an insights-driven approach to driving decisions and related actions. The data can come from both internal and external sources, which helps avoid data biases. Data-driven decision-makers use data in their decision process to validate existing actions or take new actions (predictive or prescriptive analytics). They make decisions based on the actionable insights generated from the data. The goal is to make informed decisions while ensuring trust & transparency across stakeholders and the organization as a whole. Data-driven decision making also gives strong impetus to digital transformation initiatives. In …

Continue reading

Posted in Data, Data analytics, Machine Learning.

Data Analyst, Data Scientist or Data Engineer: What to Become?

data analysts vs data scientists vs data engineers

There is a lot of confusion surrounding job titles such as “data analyst,” “data scientist,” and “data engineer.” What do these job titles mean, and what are the differences between them? Before selecting one of these career paths, it helps to understand each title, the related roles & responsibilities, and the career potential. In this blog post, we will describe each title and discuss the key distinctions between them. By the end of this post, you will have a better understanding of which career path and related title is right for you! Shall I become a data analyst? …

Continue reading

Posted in Career Planning, Data, Data analytics, data engineering, Data Science.

ETL & Data Quality Test Cases & Tools: Examples

data validation with great expectations

Testing the data processed by Extract, Transform and Load (ETL) processes is a critical step in ensuring the accuracy of data in destination systems and databases. This blog post will provide an overview of ETL & Data Quality testing including tools, test cases and examples. What is ETL? ETL stands for extract, transform, and load. ETL is a three-step process that is used to collect data from various sources, prepare the data for analysis, and then load it into a target database. The extract phase involves extracting data from its original source, such as a database or file system. The transform phase involves transforming this data …

Continue reading

Posted in data engineering, Data management.
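
The extract, transform and load steps named in the excerpt above, followed by two simple data quality checks on the target, might look roughly like this. It is a sketch under stated assumptions: the `RAW_CSV` sample, the `sales` table, the rule that rows with a missing amount are dropped, and every function name are hypothetical and do not come from the post. Only the Python standard library (`csv`, `io`, `sqlite3`) is used.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row has a missing amount.
RAW_CSV = """id,name,amount
1,alice,10.5
2,bob,
3,carol,7.25
"""


def extract(text):
    """Extract: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and convert fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed rule: incomplete rows are not loaded
        cleaned.append((int(row["id"]), row["name"].title(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the prepared rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Data quality checks against the target: row count and no NULL amounts.
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
null_amounts = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE amount IS NULL"
).fetchone()[0]
print(row_count, null_amounts)
```

Checks like these are the simplest kind of ETL test case: they compare what arrived in the target with what the transformation rules say should have arrived.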

Amazon Kinesis vs Kafka: Concepts, Differences

Amazon Kinesis Data Streaming

As technology advances, new data streaming solutions emerge to meet the ever-growing demand for real-time analytics. Two popular options are Amazon Kinesis and Apache Kafka. Here, we’ll take a look at these two platforms and compare them in terms of their core concepts and differences. What is Amazon Kinesis? Amazon Kinesis is an AWS serverless streaming service that allows you to collect, process, and analyze streaming data in real time. It is a fully managed service that can capture, store, and analyze hundreds of terabytes of data from millions of sources simultaneously. It is designed to be highly available and scalable so that your streaming data can be reliably processed …

Continue reading

Posted in data engineering.

Data Governance Framework Template / Example

data governance framework template

Data governance is a framework for governing how data is managed. It’s the process of structuring data so it can be governed, managed and used more effectively. A data governance framework forms a key aspect of a data analytics strategy. This blog post will discuss key functions of a standard data governance framework and can be taken as a template or example to help you get started with setting up your data governance program. What is a Data Governance Framework? Data governance can be defined as enterprise-wide management of data from the standpoint of availability, usability, security and integrity. The data governance framework is intended to put some structure around how data can be managed and …

Continue reading

Posted in Data, Data analytics.

ESG Concepts: Reports, Metrics & KPIs

ESG KPIs and metrics

This blog post is geared toward Environmental, Social & Governance (ESG) professionals who want to understand the different aspects of ESG and some of the metrics that can be included in their organization’s ESG reporting (annual reports) to represent the sustainability of the business. An understanding of these aspects can help you get started with ESG initiatives and ESG reporting. ESG initiatives can help companies improve their overall sustainability while creating a positive impact on environmental, social, and governance issues. Getting started with ESG-related practices in your organization or department (such as procurement) requires a set of ESG initiatives and related performance …

Continue reading

Posted in Data analytics, Procurement.

Data Warehouse vs. Data Lake – Differences, Examples

data warehouse vs data lake

When it comes to data storage, there are two distinct types of solutions that you can use: a data warehouse and a data lake. Both of these solutions have their own benefits, but it’s important to understand the key differences between them so that you can choose the best option for your needs. Let’s take a closer look at what makes each solution unique. What is a Data Warehouse? A data warehouse is defined as an electronic storage system used for reporting and analysis. Data warehouses store data in a structured (row-column) format. A data warehouse typically contains aggregated collections of data from multiple sources, which come together in one database. A data warehouse …

Continue reading

Posted in Data, Data lake, Data Science, Data Warehouse.

Different types of Clustering in Machine Learning

Different types of clustering

Clustering is a type of unsupervised machine learning technique that is used to group data points into distinct categories or clusters. It is one of the most widely used techniques in machine learning and can be used for various tasks such as grouping customers by their buying habits, creating groups of similar documents, or finding groups of related genes. In this blog post, we will explore different types / categories of clustering methods and discuss why they are so important in the field of machine learning. Prototype-based Clustering Prototype based clustering represents one of the categories of clustering algorithms that are used to identify groups within a larger dataset. This …

Continue reading

Posted in Machine Learning.

Python Pickle Example: What, Why, How

python pickle file example

Have you ever heard of the term “Python Pickle”? If not, don’t feel bad: it can be a confusing concept. However, it is a powerful tool that all data scientists, Python programmers, and web application developers should understand. In this article, we’ll break down what exactly pickling is, why it’s so important, and how to use it in your projects. What is Python Pickle? In its simplest form, pickling is the process of converting a Python object into a byte stream (a sequence of bytes). This byte stream can then be transmitted over a network or stored in a file for later use. It’s like putting the object into an envelope and …

Continue reading

Posted in Data Science, Machine Learning, Python.
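
The pickling described in the excerpt above, turning an object into a byte stream and back, can be tried with the standard-library `pickle` module. This is a minimal sketch: the `model_config` dictionary is a hypothetical stand-in for whatever object you want to store.

```python
import pickle

# Hypothetical object to serialize; any picklable Python object works the same way.
model_config = {"name": "demo", "weights": [0.1, 0.2, 0.3], "epochs": 5}

# Pickling: object -> bytes, ready to be written to a file or sent over a network.
payload = pickle.dumps(model_config)

# Unpickling: bytes -> a new, equal object.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == model_config)
```

To store the bytes on disk instead, `pickle.dump(obj, file)` and `pickle.load(file)` work on a file opened in binary mode. Only unpickle data you trust, because loading a pickle can execute arbitrary code.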

Designing & Building Data Products – Best Practices

designing and building data products - best practices

For those in the analytics industry, designing and building data products is a critical part of the job. It’s important to understand how to design and build data products that are useful, efficient, effective and loved by the end customers. In this blog post, we will discuss some best practices for designing and developing innovative data products. Keeping these best practices in mind when developing data products / solutions can help ensure your product is successful. Call out Decision – Action – Outcome Hypothesis It is important to call out decision-action-outcome hypotheses when building data products because they serve as a blueprint for designing, testing …

Continue reading

Posted in Data, Product Management.

Top 10 Basic Computer Science Topics to Learn

computer architecture - basic computer topics to learn

Computer science is an expansive field with a variety of areas that are worth exploring. Whether you’re just starting out or already have some experience in computer science, there are certain topics that every aspiring software engineer should understand. This blog post will cover the basic computer science topics that are essential for any software engineer or software programmer to know. Computer Architecture Computer architecture is a course of study that explores the fundamental elements of computer building and design. It’s an important field of study for software engineers to understand, since it provides basic principles and concepts related to hardware and software interactions. Computer architecture courses typically cover a …

Continue reading

Posted in Data Science, Software Engg.

Free Datasets for Machine Learning & Deep Learning

dataset publicly_available free machine learning

Are you looking for free / popular datasets to use for your machine learning or deep learning project? Look no further! In this blog post, we will provide an overview of some of the best free datasets available for machine learning and deep learning. These datasets can be used to train and evaluate your models, and many of them contain a wealth of valuable information that can be used to address a wide range of real-world problems. So, let’s dive in and take a look at some of the top free datasets for machine learning and deep learning! Here is the list of free data sets for machine learning & …

Continue reading

Posted in Data Science, Deep Learning, Machine Learning.

Challenges for Machine Learning / AI Projects

Challenges related to Machine Learning Projects Implementations

In this post, you will learn about some of the key challenges in implementing AI / machine learning (ML) or data science projects successfully in a consistent and sustained manner. As AI / ML project stakeholders, including senior management, data science architects, and product managers, you need a good understanding of what it would take to successfully execute AI / ML projects and create value for the customers and the business. Whether you are building AI / ML products or enabling unique models for your clients in a SaaS setup, you will come across most of these challenges. Understanding the Business Problem Many times, the nature …

Continue reading

Posted in AI, Machine Learning.

Difference between Online & Batch Learning

online learning - machine learning system

In this post, you will learn about the concepts of online learning and batch (offline) learning and the differences between them, that is, whether machine learning models in production learn incrementally from a stream of incoming data or not. It is one of the most important aspects of designing machine learning systems. Data science architects need a good understanding of when to go for online learning and when to go for batch or offline learning. Why online learning vs batch or offline learning? Before we get into the concepts of batch and online learning, let’s understand why we need different types of model training or learning …

Continue reading

Posted in Data Science, Machine Learning.
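
The contrast in the excerpt above, learning incrementally from a stream versus learning from the full dataset at once, can be shown with a deliberately tiny "model": an estimate of the mean. This is only an illustrative stand-in for a real learner; the `OnlineMean` class, the `batch_mean` function and the `stream` values are assumptions made for the example.

```python
class OnlineMean:
    """Online style: update the estimate one observation at a time, storing no history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        # Incremental update: move the estimate toward the new observation.
        self.mean += (x - self.mean) / self.count
        return self.mean


def batch_mean(values):
    """Batch (offline) style: recompute the estimate from the full dataset."""
    return sum(values) / len(values)


# Hypothetical stream of incoming observations.
stream = [4.0, 8.0, 6.0, 2.0]

online = OnlineMean()
for value in stream:
    online.update(value)

print(online.mean, batch_mean(stream))
```

Both routes reach the same estimate here, but the online version never needs the whole dataset in memory and has a usable estimate after every observation, while the batch version must be rerun whenever new data arrives.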