Problems with Categorical Variables: Examples

Have you ever encountered unfamiliar words while learning a new language and didn’t know their meanings? Or tried to fit all your belongings into a suitcase, only to realize it’s too full? Or started reading a book series from the third book and felt lost? These everyday scenarios closely resemble some of the challenges we face with categorical variables in machine learning. Categorical variables, while essential in many datasets, bring a unique set of challenges. In this article, we’ll discuss three major problems associated with categorical features. Let’s explore each with real-life examples and supporting Python code snippets.

Incomplete Vocabulary

The “Incomplete Vocabulary” problem arises when …

Posted in Data Science, Machine Learning.

Central Tendency in Machine Learning: Python Examples

Have you ever wondered why your machine learning model is not performing as expected? Could the “average” behavior of your dataset be misleading your model? How does the “central” or “typical” value of a feature influence the performance of a machine learning model? In this blog, we will explore the concept of central tendency, its significance in machine learning, and the importance of addressing skewness in your dataset. All of this will be demonstrated with Python code examples using a diabetes dataset, which can be found on Kaggle – Diabetes Dataset. The dataset consists of multiple columns such as …
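As a minimal sketch of why the “average” can mislead, the snippet below compares the mean and median of a small right-skewed sample. The values are made up for illustration and are not taken from the diabetes dataset mentioned above.

```python
import statistics

# Hypothetical, right-skewed feature values (NOT from the diabetes dataset):
# most readings sit near 100, with two large outliers.
values = [80, 85, 90, 92, 95, 100, 105, 110, 180, 300]

mean_value = statistics.mean(values)      # pulled upward by the outliers
median_value = statistics.median(values)  # robust to the outliers

print(f"mean:   {mean_value}")
print(f"median: {median_value}")

# A mean well above the median is a simple hint of right skew.
if mean_value > median_value:
    print("The mean exceeds the median, suggesting right skew.")
```

When the two measures disagree this much, the median is usually the safer description of a “typical” value.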

Posted in Data Science.

Data Analytics for Car Dealers: Actionable Insights

Are you starting a car dealership and wondering how to leverage data to make informed business decisions? In today’s data-driven world, analytics can be the difference between a thriving business and a failing one. This blog aims to provide actionable insights for car dealers, especially those starting a new car dealership, to excel in various aspects of the business. I will cover inventory management, pricing strategy, marketing and sales, customer service, and risk mitigation, all backed by data analytics. I will continue to update this blog with more methods over time. The data used for analysis can be found on Kaggle.com – Ultimate Car Price Prediction Dataset. First and …

Posted in Data analytics, Data Science, Python.

Unemployment Data & Actionable Insights Examples

Unemployment figures often flood the news, painting a broad picture of economic stability or crisis. But have you ever wondered how these rates break down at the local level? Do certain counties (or cities) in different states fare better or worse than the national average, and if so, why? Unemployment is a critical indicator of economic health and social well-being. While national or state-level unemployment rates often make headlines, diving deeper into county-level or city-level data can offer valuable insights for local governments, policymakers, and social organizations. In this blog, we will explore a dataset that provides unemployment rates for various U.S. counties in June 2023. Along the way, …

Posted in Data, Data analytics.

Insurance & Linear Regression Model Example

Ever wondered how insurance companies determine the premiums you pay for your health insurance? Predicting insurance premiums is more than just a numbers game: it’s a task that can impact millions of lives. In this blog, we’ll demystify this complex process by walking you through an end-to-end Python example of predicting health insurance premium charges. Specifically, we’ll use a linear regression model to predict these charges based on various factors like age, BMI, and smoking status. Whether you’re a beginner in data science or a seasoned professional, this blog will offer valuable insights into building and evaluating regression models.

What is Linear Regression?

Linear Regression is …
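To make the idea concrete, here is a minimal sketch of a one-feature linear regression fitted with the ordinary least squares formulas. The ages and charges are invented for illustration, and the helper name `fit_simple_linear_regression` is hypothetical. The post itself describes a model with several factors (age, BMI, smoking status), which this simplified example does not reproduce.

```python
# Hypothetical data (NOT from any real insurance dataset):
# policyholder age and annual premium charges.
ages = [20, 30, 40, 50, 60]
charges = [2000, 3000, 4500, 5000, 6500]


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums used by the closed-form OLS solution.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression(ages, charges)
predicted_charge = intercept + slope * 45  # prediction for a 45-year-old

print(f"slope: {slope}, intercept: {intercept}")
print(f"predicted charge at age 45: {predicted_charge}")
```

With more than one feature, a library routine for multiple regression would normally replace these closed-form formulas.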

Posted in Data Science, Insurance, Machine Learning, statistics.

Chi-square test – Formula, Concepts, Examples

Pearson’s Chi-square (χ²) test is a statistical test used to determine whether the distribution of observed data is consistent with the distribution expected under a particular hypothesis. The Chi-square test can be used to evaluate the independence of two categorical variables, or to assess the goodness of fit of a given distribution to observed data. In this blog post, we will discuss different types of Chi-square tests, the concepts behind them, and how to perform them using Python / R. As data scientists, it is important to have a strong understanding of the Chi-square test so that we can use it to make informed decisions about …
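The test of independence can be sketched in a few lines of plain Python. The 2x2 contingency table below is hypothetical, and the p-value shortcut used here holds only for one degree of freedom. No continuity correction is applied, so the result can differ from library routines that apply one by default.

```python
import math

# Hypothetical 2x2 contingency table of observed counts:
# rows = group A / group B, columns = outcome yes / outcome no.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count for each cell if rows and columns were independent.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 1 degree of freedom, the upper-tail probability of the
# chi-square distribution equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```

A small p-value suggests the two variables are unlikely to be independent. For larger tables, a statistics library is the practical choice for the p-value.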

Posted in Data Science, Python, statistics.

Text Clustering Real-World Applications: Examples

How often have you wondered about the vast amounts of unstructured data around us and its untapped potential? How can businesses sift through thousands of customer reviews, documents, or feedback to derive actionable insights? What if there was a way to automatically group similar pieces of text, helping organizations quickly identify patterns and trends? Enter text clustering. A subset of text analytics, text clustering is an unsupervised machine learning task that divides a set of texts into clusters or groups. This ensures that texts in the same group are more similar to each other than to those in other groups. A powerful tool for deciphering insights from unstructured data, text …
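The notion of texts in the same group being “more similar to each other” can be illustrated with a bag-of-words cosine similarity. This is only a sketch of the similarity measure that a clustering algorithm might build on, not a clustering algorithm itself. The sample texts and the helper names `bag_of_words` and `cosine_similarity` are assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical customer reviews: the first two are about delivery,
# the third is about product features.
texts = [
    "delivery was late and the package arrived damaged",
    "package arrived late and the box was damaged",
    "great battery life and a bright screen",
]
vectors = [bag_of_words(t) for t in texts]

sim_same_topic = cosine_similarity(vectors[0], vectors[1])
sim_other_topic = cosine_similarity(vectors[0], vectors[2])

print(f"delivery vs delivery: {sim_same_topic:.3f}")
print(f"delivery vs features: {sim_other_topic:.3f}")
```

A clustering algorithm would place the first two reviews in one group because their similarity is much higher than either one's similarity to the third.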

Posted in Machine Learning, NLP.

Contract Analysis & Review Checklist: Questions, Examples

Have you ever found yourself knee-deep in contractual jargon, wondering if you’ve missed a critical clause that could cost your organization thousands or even millions? How confident are you that every contract your team signs is optimized for both performance and cost efficiency? If you’re a procurement stakeholder, a category manager, or a contract specialist, these questions are not just hypothetical—they’re the daily challenges you face. In this blog, you will learn about a structured approach to learning, understanding, and reviewing contracts, minimizing risks, and maximizing value based on asking the right kind of questions. We delve into key questions you should be asking, highlight essential clauses to scrutinize, and …

Posted in Data analytics, Procurement.

Find Topics of Text Clustering: Python Examples

Have you ever clustered a collection of texts and wondered what predominant topics underlie each group? How can you pinpoint the essence of each cluster when it comprises a large volume of words? Is there a way to succinctly represent the core topic of each cluster using Python? Text clustering is a powerful technique in natural language processing (NLP) that groups documents into clusters based on their content. Once you’ve clustered your data, a natural follow-up question arises: “What are these clusters about?” In this article, we’ll discuss two different methods to find the dominant topics of text clusters using Python. Meanwhile, check out my post on text clustering – Text Clustering …

Posted in Machine Learning, NLP, Python.

Productivity vs Efficiency: Differences, Examples

If you’ve ever found yourself caught in the whirlwind of tasks and deadlines, you’ve probably asked yourself: “How can I get more done?” or “How can I make better use of my time?” At the core of these questions lie two concepts that are often used interchangeably but are fundamentally different: Productivity and Efficiency. Understanding the nuances between productivity and efficiency can be a game-changer in both your personal and professional life. While both are geared towards improving performance and achieving goals, they focus on different aspects of the work process. Knowing when to prioritize one over the other can mean the difference between spinning your wheels and skyrocketing your …

Posted in Problem Solving.

OpenAI Python API Example for NLP Tasks

Ever wondered how you can leverage the power of OpenAI’s GPT-3 and GPT-3.5 (from Jan 2024 onwards) directly in your Python application? Are you curious about generating human-like text with just a few lines of code? This blog post will walk you through an example Python code snippet that uses OpenAI’s Python API for different NLP tasks such as text generation. Check out my other post on how to use the LangChain framework for text generation with OpenAI GPT models.

OpenAI Python APIs

The OpenAI Python API is an interface that allows you to interact with OpenAI’s language models, including their GPT-3 model. The following are different popular models that you …

Posted in Generative AI, Machine Learning, NLP, OpenAI, Python.

Architecting a Generative AI Platform for GPT-based LLM Apps

Have you ever wondered how to build a scalable Generative AI platform based on OpenAI GPT models that can serve different applications? Are you a data scientist, product manager, or software engineer looking to understand the intricacies of the architecture of such a platform? This blog aims to demystify the architectural building blocks needed to create a robust GPT-based platform. By the end, you will have a clear roadmap for architecting, designing, and implementing your own platform for GPT-based large language model (LLM) applications.

Generative AI Platform Architecture for GPT-based LLM Apps

The following is the technology architecture of a generative AI platform that can leverage OpenAI GPT based …

Posted in Generative AI, Machine Learning, OpenAI.

Microsoft’s Free Courses: Data Science, Machine Learning, AI

Are you keen on diving into the world of data science, machine learning, or artificial intelligence? Have you been searching for courses that not only teach the fundamentals but are also free and accessible? Look no further! Microsoft has put together three distinct courses that cater to these interests.

Data Science for Beginners

This course offers an ideal starting point for those new to data science, focusing on the basics and guiding you through practical exercises. The course helps you demystify the complex world of data, allowing you to make informed decisions in various fields such as business, healthcare, and more. Each lesson …

Posted in AI, Career Planning, Data Science, Machine Learning, Online Courses.

Text Clustering Python Examples: Steps, Algorithms

Text clustering has swiftly emerged as a cornerstone of data-driven decision-making across industries. But what exactly is text clustering, and how can it transform the way businesses operate? How does it convert unstructured text into actionable insights? What are the core steps involved in text clustering, and how are they interlinked? Which algorithms are pivotal in implementing text clustering effectively? In this blog, we will answer these questions, diving deep into the systematic steps of text clustering, its underlying algorithms, and real-world examples that bring this technique to life. Whether you’re a product manager seeking to leverage data analytics or a data scientist curious to learn the key steps of text …

Posted in Machine Learning, NLP.

Topic Modeling LDA Python Example

Are you overwhelmed by the endless streams of text data and looking for a way to unearth the hidden themes that lie within? Have you ever wondered how platforms like Google News manage to group similar articles together, or how businesses extract insights from vast volumes of customer reviews? The answer to these questions might be simpler than you think, and it’s rooted in the world of Topic Modeling. Introducing Latent Dirichlet Allocation (LDA) – a powerful algorithm that offers a solution to the puzzle of understanding large text corpora. LDA is not just a buzzword in the data science community; it’s a mathematical tool that has found applications in …

Posted in Machine Learning, NLP.

Encoder Only Transformer Models Quiz / Q&A

Are you intrigued by the revolutionary world of transformer architectures? Have you ever wondered how encoder-only transformer models like BERT, ELECTRA, or DeBERTa have reshaped the landscape of Natural Language Processing (NLP)? The rapid advancement of machine learning has led to the creation of numerous transformer architectures, each with unique features, applications, and underlying mechanics. Whether you’re a data scientist, machine learning engineer, generative AI enthusiast, or a student eager to deepen your understanding, this quiz offers an engaging and informative way to assess your knowledge and sharpen your skills. It will also help you prepare for interviews on this topic. Encoder-only transformer models have become a cornerstone in …

Posted in Deep Learning, Generative AI, Interview questions, Machine Learning, NLP, Quiz.