Author Archives: Ajitesh Kumar


I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, and Julia, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data. For the latest updates and blogs, follow us on Twitter. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.

Binomial Distribution Explained with Examples

binomial experiment coin tossing 100 experiments 50 trials

Have you ever wondered how to predict the number of successes in a series of independent trials? Or perhaps you’ve been curious about the probability of achieving a specific outcome in a sequence of yes-or-no questions. If so, you are essentially asking about the binomial distribution. It’s important for data scientists to understand this concept, as binomial distributions are used often in business applications. The binomial distribution is a discrete probability distribution that applies to binomial experiments (experiments with binary outcomes). It describes the number of successes in a fixed number of independent trials. Citing a simple yet real-life example, the binomial distribution may be imagined as the probability distribution of a number …
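As a rough illustration of the idea in this teaser, the probability of exactly k successes in n independent trials can be sketched with the standard binomial formula. The function name `binomial_pmf` and the choice of 50 fair coin tosses are assumptions made for this example, not code from the post.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p (hypothetical helper for illustration)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumed scenario: 50 tosses of a fair coin.
n, p = 50, 0.5
prob_25_heads = binomial_pmf(25, n, p)

# Sanity checks: probabilities sum to 1 and the mean equals n * p.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
expected_heads = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(25 heads in 50 tosses) = {prob_25_heads:.4f}")
print(f"sum of pmf = {total:.6f}, mean = {expected_heads:.2f}")
```

Even the most likely outcome (25 heads) has a probability of only about 11%, which is why individual experiments scatter around the mean.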


Posted in AI, Data Science, Machine Learning, statistics.

Online Data Science Courses at JHU 2023


Are you interested in pursuing a Data Science course from the comfort of your own home? Look no further than Johns Hopkins University (JHU), offering a comprehensive range of Online Data Science Courses for the year 2023. Whether you are a working professional seeking to enhance your skills or a student looking to delve into the exciting world of data science, JHU’s online programs provide the flexibility and quality education you need. In this blog, we will explore the diverse array of online courses available at JHU, designed to cater to remote learners who want to excel in the field of Data Science. Discover the cutting-edge curriculum, esteemed faculty, and …


Posted in Admissions, Career Planning, Online Courses.

Model Cards Example Machine Learning


Have you ever wondered how to make your machine learning models more transparent, understandable, and accountable? Are you looking to implement responsible AI practices, including ways to review and improve your existing model documentation? If so, you will learn about the concept of model cards, a powerful tool for documenting important details about machine learning models. You will learn the concepts with concrete examples and best practices that can serve as a guide for implementing or improving model cards in your organization. The model card example can be seen as a standard template for model cards, which is used in various companies such as Google. What are …


Posted in Machine Learning, Responsible AI.

Difference between Data Science & Data Analytics

data science vs data analytics

What’s the difference between data science and data analytics? Many people use these terms interchangeably, but there is a clear distinction between the two fields. Data science focuses on understanding and deriving insights from data using statistical and machine learning methods, while data analytics is an overarching term for solving problems by applying analytical techniques to data. The two terms are related. In this blog post, we’ll explore the differences between data science and data analytics in greater detail, with examples of each. The following are key topics in relation to the difference between data science and data analytics: different forms/purposes, different techniques …


Posted in Data analytics, Data Science.

Top US Universities for AI / ML Research

Artificial Intelligence (AI) has become an essential driver of innovation and economic growth in the 21st century. As a result, some of the best universities in the United States have been investing heavily in AI research to push the boundaries of this rapidly evolving field. In this blog post, we will explore the top 10 US universities for AI research, highlighting their achievements and providing links to their AI research homepages. Several leading universities in the United States have emerged as pioneers in AI research, recognizing its crucial role in driving innovation and economic growth. These institutions have made significant investments to establish themselves as top destinations for …


Posted in Admissions, AI, Career Planning, Machine Learning.

Hold-out Method for Training Machine Learning Models


The hold-out method for training machine learning models is a technique that involves splitting the data into different sets: one set for training, and other sets for validation and testing. The hold-out method is used to check how well a machine learning model will perform on new, unseen data. In this post, you will learn about the hold-out method used while training a machine learning model. Do check out my post on what is machine learning? concepts & examples for a detailed understanding of the basics of machine learning. Also, check out a related post on what is data science? When evaluating …
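The split described above can be sketched in plain Python. This is a minimal illustration, not the post's own code: the function name `holdout_split`, the 60/20/20 proportions, and the fixed seed are all assumptions chosen for the example.

```python
import random


def holdout_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and split them into train / validation / test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    """
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


data = list(range(100))  # stand-in for 100 labelled examples
train, val, test = holdout_split(data)
print(len(train), len(val), len(test))
```

The model would be fitted on `train`, tuned against `val`, and scored once on `test`, so the final score reflects data the model never saw during training or tuning.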


Posted in Data Science, Machine Learning.

One-way ANOVA test: Concepts, Formula & Examples

one way anova test

The one-way analysis of variance (ANOVA) test is a statistical procedure commonly used to compare the mean values of a specific variable between three or more groups. The significance of the difference between the means of two samples can be judged through either a t-test or a z-test, depending on different criteria, but it becomes tricky when there is a need to simultaneously evaluate the significance of the difference among three or more sample means. This is where the one-way ANOVA test comes to the rescue. The ANOVA technique enables us to perform this simultaneous test and as such is considered to be an important tool of analysis. As data scientists, it is of …
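To make the comparison of three or more group means concrete, here is a minimal sketch of the one-way ANOVA F-statistic, computed as the between-group mean square divided by the within-group mean square. The function name `one_way_anova_f` and the three small sample groups are made up for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of groups.

    Hypothetical helper: F = (SS_between / (k - 1)) / (SS_within / (n - k)).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up samples with group means 2, 3 and 6.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

A large F-value means the group means differ by more than the spread within groups would suggest; turning F into a p-value requires the F-distribution with (k - 1, n - k) degrees of freedom, which this sketch leaves out.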


Posted in Data Science, statistics.

Neyman-Pearson Lemma: Hypothesis Test, Examples

neyman-pearson lemma critical region vs likelihood test ratio

Have you ever faced a crucial decision where you needed to rely on data to guide your choice? Whether it’s determining the effectiveness of a new medical treatment or assessing the quality of a manufacturing process, hypothesis testing becomes essential. That’s where the Neyman-Pearson Lemma steps in, offering a powerful framework for making informed decisions based on statistical evidence. The Neyman-Pearson Lemma is especially important for problems that demand highly accurate decisions or conclusions. By understanding this concept, we learn to navigate the complexities of hypothesis testing, ensuring we make the best choices with greater confidence. In this blog post, we will explore …


Posted in Data Science, statistics.

Pandas CSV to Dataframe Python Example

Read CSV Files to Pandas Dataframe using Python

Converting CSV files to DataFrames is a common task in data analysis. In this blog, we’ll explore a Python code example that uses the Pandas library to read CSV files into DataFrames efficiently. This approach offers flexibility, speed, and convenience, making it a valuable technique for handling large datasets. Read CSV into Pandas DataFrame: The following code can be used to read a CSV file from a local drive: If you want to read a CSV file from a URL, nothing changes except that you pass the URL to the read_csv function. The following are some …
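The post's own code is not included in this excerpt, so the following is only a small sketch of what reading a CSV with `pandas.read_csv` can look like. It reads from an in-memory string so that it runs without any file; the column names, values, file name, and URL are invented for illustration.

```python
import io

import pandas as pd

# Invented sample data standing in for the contents of a CSV file.
csv_text = "name,score\nalice,90\nbob,85\ncarol,78\n"

# read_csv accepts a file path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# With a real file or URL, only the first argument changes, for example
# (hypothetical path and URL):
#   df = pd.read_csv("data.csv")
#   df = pd.read_csv("https://example.com/data.csv")

print(df.head())
print(df.dtypes)
```

Pandas infers the column types here, so `score` is parsed as an integer column and `name` as a text column.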


Posted in Data Science, Python.

Google Unveils Next-Gen LLM, PaLM-2

PaLM 2 Apps

Google’s breakthrough research in machine learning and responsible AI has culminated in the development of its next-generation large language model (LLM), PaLM 2. This model represents a significant evolution in natural language processing (NLP) technology, with the capability to perform a broad array of advanced reasoning tasks, including code and math, text classification and question answering, language translation, and natural language generation. The unique combination of compute-optimal scaling, an improved dataset mixture, and model architecture enhancements is what powers PaLM 2’s exceptional capabilities. This combination allows the model to achieve better performance than its predecessors, including the original PaLM, across all tasks. PaLM 2 was built with Google’s commitment to …


Posted in Generative AI, Machine Learning.

Occam’s Razor in Machine Learning: Examples


“Everything should be made as simple as possible, but not simpler.” – Albert Einstein Consider this: According to a recent study by IDC, data scientists spend approximately 80% of their time cleaning and preparing data for analysis, leaving only 20% of their time for the actual tasks of analysis, modeling, and interpretation. Does this sound familiar to you? Are you frustrated by the amount of time you spend on complex data wrangling and model tuning, only to find that your machine learning model doesn’t generalize well to new data? As data scientists, we often find ourselves in a predicament. We strive for the highest accuracy and predictive power in our …


Posted in Data Science, Machine Learning.

Generative AI Risks & Concerns: Examples


In the ever-evolving realm of artificial intelligence, generative AI has emerged as a groundbreaking technology, capable of producing incredibly realistic and creative content. From generating art and music to crafting compelling stories and even mimicking human conversations, the possibilities seem endless. Here is a sample of an AI-generated conversation between Bill Gates and Socrates. As with any powerful tool, there are risks and concerns related to generative AI that need to be addressed. In this blog, we will delve into the fascinating world of generative AI and explore some of the key concerns it brings forth. We will learn with some …


Posted in Generative AI.

Outlier Detection Techniques in Python: Examples

Outlier detection Python Machine Learning

In the realm of data science, mastering outlier detection techniques is paramount for ensuring data integrity and robust machine learning model performance. Outliers are data points that deviate significantly from the norm. They can greatly impact the accuracy and reliability of statistical analyses and machine learning models. In this blog, we will explore a variety of outlier detection techniques using Python. The methods covered will include statistical approaches like the z-score method and the interquartile range (IQR) method, as well as visualization techniques like box plots and scatter plots. Whether you are a data science enthusiast or a seasoned professional, it is important to grasp these …
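The two statistical approaches named above can be sketched with the standard library alone. This is an illustrative sketch rather than the post's code: the function names, the z-score threshold of 3, the IQR multiplier of 1.5, and the sample values are conventional or invented choices.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds `threshold`
    population standard deviations (hypothetical helper)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [x for x in values if abs(x - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]


# Invented readings with one obviously extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]
print("z-score outliers:", zscore_outliers(data))
print("IQR outliers:", iqr_outliers(data))
```

The z-score method is sensitive to the outlier itself, because an extreme value inflates both the mean and the standard deviation; the IQR method relies on quartiles and is less affected, which is one reason the two are often used together.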


Posted in Data Science, Machine Learning, Python.

Lime Machine Learning Python Example

LIME Output of Linear Regression Model

Today, when core businesses have started relying on machine learning (ML) model predictions, interpreting complex models has become a necessary requirement of AI governance (responsible AI). Data scientists are often asked to explain the inner workings of machine learning models so that others can understand how decisions are made. The problem? Many of these models are “black boxes”, delivering predictions without any comprehensible reasoning. This lack of transparency (especially in healthcare and finance use cases) can lead to mistrust in model predictions and inhibit the practical application of machine learning in fields that require a high degree of interpretability. It could lead to erroneous decision-making, or worse, legal and …


Posted in Machine Learning, Responsible AI.

Boston Housing Dataset Linear Regression: Predicting House Prices

boston housing dataset linear regression models

Predicting house prices accurately is crucial in the real estate industry. However, it can be challenging to determine the factors that significantly impact house prices. Without a clear understanding of these factors, accurate predictions are difficult to achieve. The Boston Housing Dataset addresses this problem by providing a comprehensive set of variables that influence house prices in the Boston area. However, effectively utilizing this dataset and building robust predictive models requires appropriate techniques and evaluation methods. In this blog, we will provide an overview of the Boston Housing Dataset and explore linear regression, LASSO, and Ridge regression as potential models for predicting house prices. Each model has its unique properties …


Posted in Data Science, Machine Learning.

ChatGPT Cheat Sheet for Data Scientists


With the explosion of data being generated, data scientists are facing increased pressure to analyze and interpret large amounts of text data effectively. However, this can be a challenging task, especially when dealing with unstructured data. Additionally, data scientists often spend a significant amount of time manually generating text and answering complex questions. Enter ChatGPT! ChatGPT offers a powerful solution to these challenges. By learning different ChatGPT prompts, data scientists can become significantly more productive while generating relevant insights, answering complex questions, and performing machine learning tasks such as data preprocessing, hypothesis testing, and training models with ease. In this blog, I will provide …


Posted in ChatGPT, Data Science, Generative AI, Machine Learning.