Author Archives: Ajitesh Kumar
Instance-based vs Model-based Learning: Differences
Machine learning is a field of artificial intelligence that deals with giving machines the ability to learn without being explicitly programmed. In this context, instance-based learning and model-based learning are two different approaches used to create machine learning models. While both approaches can be effective, they also have distinct differences that must be taken into account when building a machine learning system. Let’s explore the differences between these two types of machine learning. What is instance-based learning & how does it work? Instance-based learning (also known as memory-based learning or lazy learning) involves memorizing training data in order to make predictions about future data points. This approach doesn’t require any …
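As a rough illustration of the "memorize the training data, then predict" idea in the excerpt above, here is a minimal nearest-neighbour sketch. It is not taken from the post: the function name `knn_predict`, the toy data, and the choice of Euclidean distance with majority voting are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among the k closest stored examples.

    `train` is a list of (features, label) pairs; this helper is hypothetical.
    """
    # There is no training step: the stored examples themselves are the "model",
    # and all the work happens at prediction time (hence "lazy learning").
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data (assumed): two well-separated groups of 2-D points.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

prediction_near_a = knn_predict(train, (1.1, 1.0))
prediction_near_b = knn_predict(train, (5.1, 5.0))
```

A model-based learner would instead fit parameters once and could then discard the training data; here every prediction scans the stored examples.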
Open Source Web Scraping Tools List
If you’re looking for a cost-effective way to access the data that matters most to your business, then web scraping is the answer. Web scraping is the process of extracting data from websites and can be used to gather valuable insights about market trends, customer behavior, competitor analysis, etc. To make this process easier, there are plenty of open source web scraping tools available. Let’s take a look at some of these tools and how they can help you collect and analyze data with greater efficiency. Beautiful Soup Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping. This library allows you to parse HTML and XML …
Data-Driven Decision Making: What, Why & How?
Data-driven decision-making is an approach to making decisions that relies on data to achieve a desired outcome. More precisely, data-driven decision making is an insights-driven approach to drive decisions and related actions. The data can come from both internal and external sources to reduce data bias. Data-driven decision-makers use data in their decision process to validate existing actions or take new actions (predictive or prescriptive analytics). They make decisions based on the actionable insights generated from the data. The goal is to make informed decisions while ensuring trust & transparency across the stakeholders & the organization as a whole. Data-driven decision making also gives strong impetus to digital transformation initiatives. In …
Data Analyst, Data Scientist or Data Engineer: What to Become?
There is a lot of confusion surrounding job titles such as “data analyst,” “data scientist,” and “data engineer.” What do these job titles mean, and what are the differences between them? Before selecting one of these career paths, it helps to understand the titles, the related roles & responsibilities, and their career potential. In this blog post, we will describe each title and discuss the key distinctions between them. By the end of this post, you will have a better understanding of which career path and related titles are right for you! Shall I become a data analyst? …
ETL & Data Quality Test Cases & Tools: Examples
Testing the data processed by Extract, Transform and Load (ETL) processes is a critical step in ensuring the accuracy of data contained in destination systems and databases. This blog post will provide an overview of ETL & Data Quality testing including tools, test cases and examples. What is ETL? ETL is a three-step process that is used to collect data from various sources, prepare the data for analysis, and then load it into a target database. The extract phase involves extracting data from its original source, such as a database or file system. The transform phase involves transforming this data …
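The three ETL steps described in the excerpt above, followed by a couple of simple data quality checks, can be sketched with the Python standard library. This is only an illustration under assumed inputs: the CSV content, the `payments` table, and the `extract` / `transform` / `load` helper names are all hypothetical, not taken from the post.

```python
import csv
import io
import sqlite3

# Assumed source data: one row has a missing amount, names are inconsistently formatted.
RAW_CSV = "id,name,amount\n1, alice ,10.5\n2,BOB,\n3,carol,7\n"


def extract(text):
    """Extract: read rows from the source (an in-memory CSV in this sketch)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, normalise names, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows that would load a NULL amount
        cleaned.append((int(row["id"]), row["name"].strip().title(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
source_rows = extract(RAW_CSV)
load(transform(source_rows), conn)

# Example data quality checks against the target table.
loaded_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
null_amounts = conn.execute("SELECT COUNT(*) FROM payments WHERE amount IS NULL").fetchone()[0]
names = [row[0] for row in conn.execute("SELECT name FROM payments ORDER BY id")]
```

Comparing `len(source_rows)` with `loaded_count` is a simple completeness check: any difference should be explained by a documented transform rule, such as the missing-amount filter here.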
Amazon Kinesis vs Kafka: Concepts, Differences
As technology advances, new data streaming solutions emerge to meet the ever-growing demand for real-time analytics. Two popular options are Amazon Kinesis and Apache Kafka. Here, we’ll take a look at these two platforms and compare them in terms of their core concepts and differences. What is Amazon Kinesis? Amazon Kinesis is an AWS serverless streaming service that allows you to collect, process, and analyze streaming data in real time. It is a fully managed service that can capture, store, and analyze hundreds of terabytes of data from millions of sources simultaneously. It is designed to be highly available and scalable so that your streaming data can be reliably processed …
Data Governance Framework Template / Example
Data governance is a framework for governing how data is managed. It’s the process of structuring data so it can be governed, managed and used more effectively. A data governance framework forms a key aspect of a data analytics strategy. This blog post will discuss key functions of a standard data governance framework and can be taken as a template or example to help you get started with setting up your data governance program. What is Data Governance Framework? Data governance can be defined as enterprise-wide management of data from the standpoint of availability, usability, security and integrity. The data governance framework is intended to put some structure around how data can be managed and …
ESG Concepts: Reports, Metrics & KPIs
This blog post is geared toward Environmental, Social & Governance (ESG) professionals who want to understand the different aspects of ESG and some of the metrics that can be included in their organization’s ESG reporting (annual reports) to represent the sustainability of the business. An understanding of different aspects of ESG can help you get started with ESG initiatives and ESG reporting. ESG initiatives can help companies improve their overall sustainability while creating a positive impact on environmental, social, and governance issues. Getting started with ESG-related practices in your organization or department (such as procurement) requires a set of ESG initiatives and related performance …
Data Warehouse vs. Data Lake – Differences, Examples
When it comes to data storage, there are two distinct types of solutions that you can use—a data warehouse and a data lake. Both of these solutions have their own benefits, but it’s important to understand the key differences between them so that you can choose the best option for your needs. Let’s take a closer look at what makes each solution unique. What is a Data Warehouse? A data warehouse is defined as an electronic storage system used for reporting and analysis. Data warehouses store data in a structured (row-column) format. It typically contains aggregated collections of data from multiple sources, which come together in one database. A data warehouse …
Different types of Clustering in Machine Learning
Clustering is a type of unsupervised machine learning technique that is used to group data points into distinct categories or clusters. It is one of the most widely used techniques in machine learning and can be used for various tasks such as grouping customers by their buying habits, creating groups of similar documents, or finding groups of related genes. In this blog post, we will explore different types / categories of clustering methods and discuss why they are so important in the field of machine learning. Prototype-based Clustering Prototype-based clustering represents one of the categories of clustering algorithms that are used to identify groups within a larger dataset. This …
Python Pickle Example: What, Why, How
Have you ever heard of the term “Python Pickle”? If not, don’t feel bad; it can be a confusing concept. However, it is a powerful tool that all data scientists, Python programmers, and web application developers should understand. In this article, we’ll break down what exactly pickling is, why it’s so important, and how to use it in your projects. What is Python Pickle? In its simplest form, pickling is the process of converting a Python object into a byte stream (a sequence of bytes). This byte stream can then be transmitted over a network or stored in a file for later use. It’s like putting the object into an envelope and …
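To make the byte-stream idea in the excerpt above concrete, here is a minimal sketch using the standard-library `pickle` module. The `record` dictionary and the file name `record.pkl` are assumptions chosen for illustration, not examples from the post.

```python
import os
import pickle
import tempfile

# Hypothetical object to serialise.
record = {"model": "demo", "weights": [0.1, 0.2, 0.3], "epochs": 5}

# Pickling: object -> bytes (could be sent over a network or stored).
payload = pickle.dumps(record)

# Unpickling: bytes -> an equal, but separate, object.
restored = pickle.loads(payload)

# The same round trip through a file, using a temporary directory for cleanliness.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.pkl")
    with open(path, "wb") as f:
        pickle.dump(record, f)
    with open(path, "rb") as f:
        from_file = pickle.load(f)
```

Unpickling can execute arbitrary code, so `pickle.loads` and `pickle.load` should only be used on data from a trusted source.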
Designing & Building Data Products – Best Practices
For those in the analytics industry, designing and building data products is a critical part of the job. It’s important to understand how to design and build data products that are useful, efficient, effective and loved by the end customers. In this blog post, we will discuss some best practices for designing and developing innovative data products. It’s important to keep these best practices in mind when developing data products / solutions as they can help ensure your product is successful. Call out Decision – Action – Outcome Hypothesis It is important to call out decision-action-outcome hypotheses when building data products because it serves as a blueprint for designing, testing …
Top 10 Basic Computer Science Topics to Learn
Computer science is an expansive field with a variety of areas that are worth exploring. Whether you’re just starting out or already have some experience in computer science, there are certain topics that every aspiring software engineer should understand. This blog post will cover the basic computer science topics that are essential for any software engineer or software programmer to know. Computer Architecture Computer architecture is a course of study that explores the fundamental elements of computer building and design. It’s an important field of study for software engineers to understand, since it provides basic principles and concepts related to hardware and software interactions. Computer architecture courses typically cover a …
Free Datasets for Machine Learning & Deep Learning
Are you looking for free / popular datasets to use for your machine learning or deep learning project? Look no further! In this blog post, we will provide an overview of some of the best free datasets available for machine learning and deep learning. These datasets can be used to train and evaluate your models, and many of them contain a wealth of valuable information that can be used to address a wide range of real-world problems. So, let’s dive in and take a look at some of the top free datasets for machine learning and deep learning! Here is the list of free data sets for machine learning & …
Challenges for Machine Learning / AI Projects
In this post, you will learn about some of the key challenges in implementing AI / machine learning (ML) or data science projects successfully in a consistent and sustained manner. As an AI / ML project stakeholder (senior management, data science architect, product manager, etc.), you must get a good understanding of what it would take to successfully execute AI / ML projects and create value for the customers and the business. Whether you are building AI / ML products or enabling unique models for your clients in a SaaS setup, you will come across most of these challenges. Understanding the Business Problem Many times, the nature …
Difference between Online & Batch Learning
In this post, you will learn about the concepts of online and batch (offline) learning and the differences between them, that is, whether or not machine learning models in production learn incrementally from the stream of incoming data. It is one of the most important aspects of designing machine learning systems. Data science architects need a good understanding of when to go for online learning and when to go for batch or offline learning. Why online learning vs batch or offline learning? Before we get into learning the concepts of batch and online learning, let’s understand why we need different types of model training or learning …