Category Archives: Python

Spearman Correlation Coefficient: Formula, Examples

Have you ever wondered how to measure the relationship between two sets of data when that relationship isn’t necessarily linear, or when the data doesn’t meet the assumptions of other correlation measures? Enter the Spearman rank correlation coefficient, a non-parametric statistic that measures the strength and direction of the monotonic relationship between two variables, which makes it well suited to ranked variables and to exploratory analysis of a new dataset. In this blog post, we will learn the concepts of the Spearman correlation coefficient with the help of Python code examples. Understanding the concept can prove very helpful for data scientists. Whether you’re exploring associations in marketing data, results from a customer satisfaction …

Continue reading

Posted in Data Science, Python, statistics.
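
To make the idea of a monotonic relationship concrete, here is a minimal sketch that computes the Spearman coefficient as the Pearson correlation of the ranks. It uses only the standard library; the helper names (`average_ranks`, `pearson`, `spearman`) and the sample data are made up for illustration and are not taken from the post.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: the relationship is monotonic but clearly not linear.
hours = [1, 2, 3, 4, 5]
scores = [1, 8, 27, 64, 125]
rho = spearman(hours, scores)
reversed_rho = spearman(hours, scores[::-1])
```

Because the ranks of `hours` and `scores` agree exactly, `rho` comes out at 1 (up to floating-point rounding) even though the raw values are not on a straight line; reversing one series gives -1.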

Heteroskedasticity in Regression Models: Examples

Have you ever encountered data that exhibits varying patterns of dispersion and wondered how it might impact your regression models? These varying patterns of dispersion are the essence of heteroskedasticity: the phenomenon where the spread or variability of the residuals (errors) in a regression model changes across different levels or values of the independent variables. As data scientists, understanding the concept of heteroskedasticity is crucial for robust and accurate analyses. In this blog, we delve into heteroskedasticity in regression models and explore its implications through real-world examples. What’s heteroskedasticity and why learn this concept? Heteroskedasticity refers to a statistical phenomenon observed in regression analysis, …

Continue reading

Posted in Data Science, Machine Learning, Python.

Matplotlib Bar Chart Python / Pandas Examples

Are you looking to learn how to create bar charts (also called bar plots or bar graphs) using the combination of Matplotlib and Pandas in Python? Bar charts are one of the most commonly used visualizations in data analysis, enabling us to present categorical data in a visually appealing and intuitive manner. Whether you’re a beginner data scientist or an intermediate-level practitioner seeking to enhance your visualization skills, this blog will provide you with practical examples and hands-on guidance to create compelling bar charts using the Matplotlib library in Python. You will also learn how to leverage the data manipulation capabilities of Pandas to prepare the data for visualization, …

Continue reading

Posted in Data Science, Python.

One-hot Encoding Concepts & Python Examples

Have you ever encountered categorical variables in your data analysis or machine learning projects? These variables represent discrete qualities or characteristics, such as colors, genders, or types of products. While numerical variables can be used directly as inputs for machine learning algorithms, categorical variables first need to be transformed into a numerical representation. One common technique for this conversion is one-hot encoding, sometimes also called dummy encoding. This is where one-hot encoding comes to the rescue. In this post, you will learn about One-hot Encoding concepts and …

Continue reading

Posted in Data Science, Machine Learning, Python.
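
As a rough illustration of what one-hot encoding does, the sketch below turns a list of category labels into rows of 0/1 indicators, one column per distinct category. It is plain Python for clarity; the function name `one_hot_encode` and the sample colors are hypothetical and not taken from the post.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row has a single 1 marking the value's category."""
    categories = sorted(set(values))  # fixed column order: alphabetical
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
# categories -> ['blue', 'green', 'red']
# encoded    -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each input value becomes a row with exactly one 1, so no artificial ordering between categories is implied.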

What & When: List, Tuple & Set in Python – Examples

When programming in Python, data structures play a crucial role in organizing and manipulating data efficiently. Among the several data structures available, lists, tuples, and sets are three fundamental ones that every Python programmer should understand. Each has its own properties and functionality, which makes each one appropriate for different scenarios. Not only are these data structures among the most frequently used in everyday programming tasks, but they are also frequently asked about in interviews for data analysts and data scientists. Therefore, grasping the concepts of lists, tuples, and sets becomes essential. In this blog, we will delve deeper into the specifics of lists, tuples, and sets, …

Continue reading

Posted in Python.
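
The differences between the three structures are easiest to see side by side. The following is a small sketch with made-up variable names: a list that is modified in place, a tuple that rejects modification, and a set that drops duplicates.

```python
# List: ordered, mutable, duplicates allowed.
cart = ["apple", "banana", "apple"]
cart.append("cherry")  # lists can grow in place

# Tuple: ordered and immutable, a good fit for fixed records.
point = (3, 4)
tuple_is_immutable = False
try:
    point[0] = 10  # item assignment is not supported on tuples
except TypeError:
    tuple_is_immutable = True

# Set: unordered, unique elements, fast membership tests.
unique_items = set(cart)
has_banana = "banana" in unique_items
```

After this runs, `cart` still holds both copies of "apple", `point` is unchanged, and `unique_items` contains each fruit exactly once.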

Ridge Regression Concepts & Python example

Ridge regression is a type of linear regression that penalizes large coefficients. This technique can be used to reduce the effects of multicollinearity in linear regression, which results from high correlations among the predictor (independent) variables. In this tutorial, we will explain ridge regression with a Python example. What is Ridge Regression? Ridge regression is a powerful technique in machine learning that addresses the issue of overfitting in linear models. In linear regression, we aim to model the relationship between a response variable and one or more predictor variables. However, when there are multiple variables that are highly correlated, the model can become too complex and …

Continue reading

Posted in Data Science, Machine Learning, Python.
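
To see how the penalty shrinks a coefficient, consider a deliberately simplified case: one feature, no intercept. Minimizing sum((y - w*x)^2) + lam * w^2 gives the closed form w = sum(x*y) / (sum(x^2) + lam). The sketch below assumes exactly this simplified setup; the function name `ridge_slope` and the data are hypothetical, and this is not the post's own example.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge coefficient for one feature and no intercept."""
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    return sum_xy / (sum_xx + lam)


# Hypothetical data lying exactly on y = 2x.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

ols_slope = ridge_slope(x, y, lam=0)      # no penalty: ordinary least squares
shrunk_slope = ridge_slope(x, y, lam=30)  # penalty pulls the slope toward zero
# ols_slope -> 2.0, shrunk_slope -> 1.0
```

With lam = 0 the result is the ordinary least-squares slope; larger values of lam move the coefficient toward zero, which is the stabilizing effect ridge regression relies on.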

Pandas CSV to Dataframe Python Example

Converting CSV files to DataFrames is a common task in data analysis. In this blog, we’ll explore a Python code example using the Pandas library to efficiently convert CSV files to DataFrames. This approach offers flexibility, speed, and convenience, making it a valuable technique for handling large datasets. Read CSV into Pandas DataFrame. The following code can be used to read a CSV file from a local drive. If you want to read a CSV file from a URL, nothing changes except that you pass the URL to the read_csv function. The following are some …

Continue reading

Posted in Data Science, Python.
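
A minimal sketch of the read_csv call described in the excerpt, assuming Pandas is installed. To keep it self-contained it reads from an in-memory string; the file path and URL in the comments are placeholders, not real locations.

```python
import io

import pandas as pd

# Stand-in for a CSV file so the example runs without touching the disk.
csv_text = "name,score\nAda,91\nLin,85\nSam,78\n"
df = pd.read_csv(io.StringIO(csv_text))

# The same function accepts a local path or a URL (both hypothetical here):
# df = pd.read_csv("data/scores.csv")
# df = pd.read_csv("https://example.com/scores.csv")

row_count, column_count = df.shape
total_score = int(df["score"].sum())
```

read_csv infers the column names from the header row and parses the `score` column as integers, so aggregate operations work right away.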

Outlier Detection Techniques in Python: Examples

In data science, mastering outlier detection techniques is essential for ensuring data integrity and robust machine learning model performance. Outliers are data points that deviate significantly from the norm. They can greatly impact the accuracy and reliability of statistical analyses and machine learning models. In this blog, we will explore a variety of outlier detection techniques using Python. The methods covered will include statistical approaches like the z-score method and the interquartile range (IQR) method, as well as visualization techniques like box plots and scatter plots. Whether you are a data science enthusiast or a seasoned professional, it is important to grasp these …

Continue reading

Posted in Data Science, Machine Learning, Python.
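
The two statistical methods named in the excerpt can be sketched with the standard library alone. The thresholds used here (3 standard deviations, 1.5 times the IQR) are common conventions rather than values taken from the post, and the function names and data are hypothetical.

```python
import statistics


def zscore_outliers(data, threshold=3.0):
    """Flag points more than `threshold` population standard deviations from the mean."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)  # assumes the data is not constant (std > 0)
    return [x for x in data if abs(x - mean) / std > threshold]


def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical measurements with one obviously extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 100]
by_zscore = zscore_outliers(readings)
by_iqr = iqr_outliers(readings)
# Both lists contain only the value 100.
```

Note that on very small samples a single extreme value inflates the standard deviation enough that the z-score may never exceed 3, which is one reason the IQR rule is often preferred there.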

Python Tesseract PDF & OCR Example

Have you ever needed to extract text from an image or a PDF file? If so, you’re in luck! Tesseract is an OCR engine that can be driven from Python, through wrapper libraries such as pytesseract, to perform Optical Character Recognition (OCR) and extract text from images and PDFs. In this blog, I will share sample Python code with which you can use Tesseract to extract text from images and PDFs. As a data scientist, it can be very useful to be able to extract text from images or PDFs, especially when working with large amounts of data found in receipts, invoices, etc. Tesseract is an OCR engine widely used in the industry, known for its accuracy …

Continue reading

Posted in Data Science, Python.

Python: Convert JSON to CSV Example

Have you ever wondered how to convert JSON data to CSV using Python? JSON (JavaScript Object Notation) is a popular data format used to exchange data between servers and web applications. However, sometimes it’s necessary to convert this data into another format, such as CSV (Comma Separated Values). CSV is a simple text format that is commonly used to store and exchange tabular data. In this blog post, sample Python code is provided for converting JSON to CSV. The code uses the json and csv modules to read and write data. But before going forward with the code, let’s take a look …

Continue reading

Posted in Python.
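
A minimal sketch of the json-plus-csv approach mentioned in the excerpt, assuming the JSON is a flat list of objects that all share the same keys. The sample records are made up, and in-memory strings stand in for the input and output files.

```python
import csv
import io
import json

# Hypothetical input: a JSON array of flat objects with identical keys.
json_text = '[{"name": "Ada", "score": 91}, {"name": "Lin", "score": 85}]'
records = json.loads(json_text)

# Write to an in-memory buffer; open("out.csv", "w", newline="") works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
writer.writeheader()       # header row comes from the keys of the first record
writer.writerows(records)  # one CSV row per JSON object

csv_lines = buffer.getvalue().splitlines()
# csv_lines -> ['name,score', 'Ada,91', 'Lin,85']
```

Nested JSON would need to be flattened first, since a CSV row can only hold scalar values per column.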

Seaborn: Multiple Line Plots with Markers, Legend

Do you want to learn how to create visually appealing and informative line plots that present the most relevant information to your audience? Do you need to create multiple line plots in the same figure, for example to show sales of different products across the months of a year? Are you looking for ready-to-use Python code with the Seaborn library for creating line plots? If yes, you are in the right place. In this blog post, we’ll explore how to create multiple line plots with Seaborn, a powerful data visualization library built on top of Matplotlib. I will also show how to add markers to the line plots to make …

Continue reading

Posted in Data Science, Data Visualization, Python.

Sklearn Algorithms Cheat Sheet with Examples

Sklearn, short for Scikit-learn, is one of the most popular and widely used libraries for machine learning in Python. It offers a comprehensive set of tools for data analysis, preprocessing, model selection, and evaluation. As a beginner data scientist, it can be overwhelming to navigate the various algorithms and functions within Sklearn. This is where the Sklearn Algorithms Cheat Sheet comes in handy. This cheat sheet provides a quick reference guide for beginners to easily understand and select the appropriate algorithm for their specific task. In this cheat sheet, I have compiled a list of common supervised and unsupervised learning algorithms, along with their Sklearn classes and example use …

Continue reading

Posted in Data Science, Machine Learning, Python.

Sklearn Neural Network Example – MLPRegressor

Are you interested in using neural networks to solve complex regression problems, but not sure where to start? Sklearn’s MLPRegressor can help you get started with building neural network models for regression tasks. While Keras, TensorFlow, and PyTorch are powerful and widely used in deep learning, Sklearn’s MLPRegressor is still an excellent choice for building neural network models for regression tasks when you are starting out. Recall that the Python Sklearn library is one of the most popular machine learning libraries, and it provides a wide range of algorithms for classification, regression, clustering, dimensionality reduction, and more. In this blog post, we will be focusing on training a …

Continue reading

Posted in Data Science, Deep Learning, Machine Learning, Python.

KMeans Silhouette Score Python Example

If you’re building machine learning models for solving different prediction problems, you’ve probably heard of clustering. Clustering is a popular unsupervised learning technique used to group data points with similar features into distinct clusters. One of the most widely used clustering algorithms is KMeans, which is popular due to its simplicity and efficiency. However, one major challenge in clustering is determining the optimal number of clusters that should be used to group the data points. This is where the Silhouette Score comes into play, as it helps us measure the quality of clustering and determine the optimal number of clusters. The Silhouette Score helps us get further clarity on the following …

Continue reading

Posted in Data Science, Machine Learning, Python.

Python – Draw Confusion Matrix using Matplotlib

Classification models are a fundamental part of machine learning and are used extensively in various industries. Evaluating the performance of these models is critical in determining their effectiveness and identifying areas for improvement. One of the most common tools used for evaluating classification models is the confusion matrix. It provides a visual representation of the model’s performance by displaying the number of true positives, false positives, true negatives, and false negatives. In this post, we will explore how to create and visualize confusion matrices in Python using Matplotlib. We will walk through the process step-by-step and provide examples that demonstrate the use of Matplotlib in creating clear and concise confusion …

Continue reading

Posted in Data Science, Machine Learning, Python.
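
Before plotting a confusion matrix, it helps to see how its four cells are counted. The sketch below tallies them for a binary classifier in plain Python; the function name `confusion_counts` and the labels are hypothetical, and the Matplotlib drawing step covered in the post is left out.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            counts["TP"] += 1  # predicted positive, actually positive
        elif predicted == positive:
            counts["FP"] += 1  # predicted positive, actually negative
        elif actual == positive:
            counts["FN"] += 1  # predicted negative, actually positive
        else:
            counts["TN"] += 1  # predicted negative, actually negative
    return counts


# Hypothetical labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
# counts -> {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}

# Conventional 2x2 layout: rows are actual classes, columns are predicted classes.
matrix = [[counts["TN"], counts["FP"]],
          [counts["FN"], counts["TP"]]]
```

The `matrix` list is in a shape that a heatmap-style plotting call can take directly.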

Support Vector Machine (SVM) Python Example

Support Vector Machines (SVMs) are powerful and versatile machine learning algorithms that have gained widespread popularity among data scientists in recent years. SVMs are widely used for classification, regression, and outlier detection (one-class SVM), and have proven to be highly effective in solving complex problems in various fields, including computer vision (image classification, object detection, etc.), natural language processing (sentiment analysis, text classification, etc.), and bioinformatics (gene expression analysis, protein classification, disease diagnosis, etc.). In this post, you will learn about the concepts of Support Vector Machine (SVM) with the help of a Python code example for building a machine learning classification model. We will work with the Python Sklearn package for building the …

Continue reading

Posted in AI, Data Science, Machine Learning, Python.