Tag Archives: python

Statistics – Random Variables, Types & Python Examples

[Image: probability distribution plot of a discrete random variable]

Random variables are one of the most important concepts in statistics. In this blog post, we will discuss what they are, their different types, and how they relate to probability distributions. We will also provide examples so that you can better understand the concept. As a data scientist, you need a strong understanding of random variables and how to work with them. What is a random variable and what are some examples? A random variable is a variable whose value is determined by the outcome of a random process. The key difference between a variable and a random variable is that the value of the random variable …
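As a rough illustration of a discrete random variable and its probability distribution, the sketch below simulates rolls of a fair six-sided die with NumPy. The die, the seed, and the sample size are assumptions chosen for illustration, not taken from the post.

```python
import numpy as np

# Assumption: a fair six-sided die, used purely as an illustration.
rng = np.random.default_rng(seed=0)
outcomes = np.arange(1, 7)           # values the random variable can take
probabilities = np.full(6, 1 / 6)    # probability mass function (PMF)

# Draw 10,000 realizations of the random variable.
samples = rng.choice(outcomes, size=10_000, p=probabilities)

# Relative frequencies of the samples approximate the PMF.
values, counts = np.unique(samples, return_counts=True)
empirical_pmf = counts / counts.sum()

# Theoretical mean (expected value) of the random variable.
expected_value = float(np.sum(outcomes * probabilities))

print(dict(zip(values.tolist(), empirical_pmf.round(3).tolist())))
print("expected value:", expected_value, "sample mean:", samples.mean())
```

With more samples, the empirical frequencies should move closer to the assumed probabilities.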

Posted in Data Science, Python, statistics.

How to Create Pandas Dataframe from Numpy Array

[Image: scatter plot of DataFrame columns]

Pandas is a library for data analysis in Python. It offers a wide range of features, including working with missing data, handling time series data, and reading and writing data in different formats. Pandas also provides an efficient way to manipulate data and perform calculations on it. One of its key features is the Pandas DataFrame, a two-dimensional, table-like structure with labeled rows and columns. Creating a Pandas DataFrame from a NumPy array is simple. In this post, you will get a code sample for creating a Pandas DataFrame from a NumPy array in Python. Step 1: Load …
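A minimal sketch of the idea, assuming a small hypothetical array and made-up column names (the post's own steps are truncated above and are not reproduced here):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 rows, 3 columns.
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9],
                [10, 11, 12]])

# Column names are illustrative; pass index=... to label the rows as well.
df = pd.DataFrame(arr, columns=["a", "b", "c"])

print(df)
print(df.shape)
```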

Posted in AI, Data Science, Machine Learning.

Learning Curves Python Sklearn Example

[Image: learning curve explained with Python example]

In this post, you will learn how to use learning curves, with a Python (Sklearn) code example, to diagnose bias and variance in a machine learning model. Knowing how to use learning curves will help you assess whether the model is suffering from high bias (underfitting) or high variance (overfitting) and whether adding more training samples could help solve the bias or variance problem. You may want to check the following posts to get a better understanding of bias-variance and underfitting-overfitting: Bias-variance concepts and interview questions; Overfitting/Underfitting concepts and interview questions. What are learning curves & why they are important? Learning curve in machine learning is used to assess how models will …
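One way to compute the numbers behind a learning curve is scikit-learn's `learning_curve` function. The sketch below assumes a synthetic dataset and a logistic regression model; both are illustrative choices, not necessarily those used in the post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Assumption: synthetic binary classification data for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

model = LogisticRegression(max_iter=1000)

# Train on 10% .. 100% of each training fold, with 5-fold cross-validation.
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="accuracy",
    shuffle=True,
    random_state=0,
)

train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for n, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"{n:4d} samples: train accuracy={tr:.3f}, validation accuracy={va:.3f}")
```

A persistent large gap between the two curves usually points to high variance, while two low, converged curves usually point to high bias.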

Posted in Data Science, Machine Learning, Python.

Machine Learning Sklearn Pipeline – Python Example

[Image: machine learning pipeline with Sklearn]

In this post, you will learn about the concepts of a machine learning (ML) pipeline and how to build one using the Python Sklearn Pipeline (sklearn.pipeline) package. Knowing how to use sklearn.pipeline effectively for training and testing machine learning models will help automate activities such as feature scaling, feature selection / extraction and model training / testing. It is recommended that data scientists working in Python get a good understanding of sklearn.pipeline. Introduction to Machine Learning Pipeline & Sklearn.pipeline A machine learning (ML) pipeline, conceptually, represents the different steps, including data transformation and prediction, through which data passes. The outcome of the pipeline is the trained model, which can be used for making predictions. …
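A short sketch of such a pipeline, assuming the Iris dataset and three illustrative steps (scaling, PCA-based feature extraction, and a logistic regression classifier); the step names and choices are assumptions, not the post's own code.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Each step is a (name, estimator) pair; data flows through them in order.
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),                # feature scaling
    ("pca", PCA(n_components=2)),                # feature extraction
    ("clf", LogisticRegression(max_iter=1000)),  # final estimator
])

# fit() fits and applies the transformers on the training data, then trains the classifier.
pipe.fit(X_train, y_train)

# score() applies the already-fitted transformers to the test data before predicting.
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```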

Posted in Data Science, Machine Learning, Python.

Sample Dataset for Regression & Classification: Python

[Image: sample dataset plot for regression]

Many beginners in data science / machine learning are intimidated by the prospect of doing data analysis and building regression (linear) & classification models in Python. But with the ability to create sample datasets using Python packages, you can practice your skills and build your confidence over time. The technique demonstrated in this blog post to create and visualize / plot sample datasets covers datasets that can be used for regression models such as linear regression and classification models such as logistic regression, random forest, SVM etc. You can use this technique to explore different methods for solving the same problem. …
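One possible way to generate such sample datasets is with scikit-learn's `make_regression` and `make_classification` helpers. Whether the post uses these exact functions is an assumption, and the sizes and noise level below are arbitrary.

```python
from sklearn.datasets import make_classification, make_regression

# Regression dataset: one feature, continuous target with Gaussian noise.
X_reg, y_reg = make_regression(
    n_samples=100, n_features=1, noise=10.0, random_state=0
)

# Binary classification dataset: two informative features, no redundant ones.
X_clf, y_clf = make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0,
    random_state=0,
)

print(X_reg.shape, y_reg.shape)   # features and targets for regression
print(X_clf.shape, set(y_clf))    # features and class labels for classification
```

The arrays can then be passed to a plotting library such as Matplotlib to visualize them.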

Posted in Data Science, Machine Learning, Python.

PCA Explained Variance Concepts with Python Example

In this post, you will learn about the concepts of explained variance which is one of the key concepts related to principal component analysis (PCA). The explained variance concepts will be illustrated with Python code examples. Check out the concepts of Eigenvalues and Eigenvectors in this post – Why & when to use Eigenvalue and Eigenvectors. What is Explained Variance? Explained variance is a statistical measure of how much variation in a dataset can be attributed to each of the principal components (eigenvectors) generated by the principal component analysis (PCA) method. In very basic terms, it refers to the amount of variability in a data set that can be attributed to …
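As a sketch of how explained variance can be inspected in practice, scikit-learn's `PCA` exposes an `explained_variance_ratio_` attribute after fitting. The Iris dataset and the standardization step are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize so that each feature contributes on the same scale.
X_std = StandardScaler().fit_transform(X)

# Keep all components to see how the variance is distributed among them.
pca = PCA().fit(X_std)

ratios = pca.explained_variance_ratio_   # fraction of variance per component
cumulative = np.cumsum(ratios)           # running total across components

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{i}: explained={r:.3f}, cumulative={c:.3f}")
```

The cumulative values are often used to decide how many components to keep.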

Posted in Data Science, Machine Learning, Python.

One-hot Encoding Concepts & Python Examples

[Image: one-hot encoding concepts and Python examples]

In this post, you will learn about one-hot encoding concepts, with code examples in Python. One-hot encoding is also called dummy encoding. The OneHotEncoder class of sklearn.preprocessing will be used in the code examples. As a data scientist or machine learning engineer, you should learn one-hot encoding techniques, as they come in very handy while training machine learning models. What is One-Hot Encoding? One-hot encoding is a process whereby categorical variables are converted into a form that can be provided as an input to machine learning models. It is an essential preprocessing step for many machine learning tasks. The goal of one-hot encoding is to …
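A minimal sketch with `OneHotEncoder`, assuming a hypothetical single categorical column of color names:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical feature with three distinct values.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# handle_unknown="ignore" encodes unseen categories as all zeros at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")

# fit_transform returns a sparse matrix by default; convert it for display.
encoded = encoder.fit_transform(colors).toarray()
categories = encoder.categories_[0].tolist()

print(categories)   # learned categories, in sorted order
print(encoded)      # one column per category, a single 1 per row
```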

Posted in Data Science, Machine Learning, Python.

Feature Importance & Random Forest – Python

[Image: random forest for feature importance]

In this post, you will learn how to use the Random Forest Classifier (RandomForestClassifier) to determine feature importance, with a Sklearn Python code example. This is useful for feature selection, that is, finding the most important features when solving a classification machine learning problem. It is very important for data scientists to understand feature importance and feature selection techniques so that they use the most appropriate features for training machine learning models. Recall that other feature selection techniques include L-norm regularization and greedy search algorithms such as sequential backward / sequential forward selection. What & Why of Feature Importance? Feature importance is a key concept in machine learning that refers to the relative importance of each feature …
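A brief sketch of reading feature importances from a fitted `RandomForestClassifier`. The wine dataset and the hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

# Assumption: the built-in wine dataset (13 numeric features, 3 classes).
data = load_wine()

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# Impurity-based importances; they are normalized to sum to 1.
importances = forest.feature_importances_

# Rank the features from most to least important.
ranked = sorted(
    zip(data.feature_names, importances), key=lambda pair: pair[1], reverse=True
)
for name, score in ranked:
    print(f"{name:30s} {score:.3f}")
```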

Posted in Data Science, Machine Learning, Python.

Sklearn SimpleImputer Example – Impute Missing Data

In this post, you will learn how to use Python’s Sklearn SimpleImputer for imputing / replacing numerical & categorical missing data using different strategies. A related article posted some time back discusses the fillna method of the Pandas DataFrame. Handling missing values is a key part of data preprocessing, so it is important for data scientists / machine learning engineers to learn different techniques for imputing / replacing numerical or categorical missing values with appropriate values based on appropriate strategies. SimpleImputer Python Code Example SimpleImputer is a class in the sklearn.impute module that can be used to replace missing values in a dataset, using a …
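A minimal sketch of mean imputation on a small hypothetical numeric array. `SimpleImputer` also accepts the strategies "median", "most_frequent" and "constant"; the latter two can be used with categorical data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric data with two missing entries.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [7.0, np.nan]])

# Replace each missing value with the mean of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imputer.fit_transform(X)

print(imputer.statistics_)  # per-column means learned during fit
print(X_imputed)
```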

Posted in Data Science, Machine Learning, Python.

Pandas dropna: Drop Rows & Columns with Missing Values

[Image: pandas dropna method code sample]

In this blog post, we will be discussing Pandas’ dropna method. This method is used for dropping rows and columns that have missing values. Pandas is a powerful data analysis library for Python, and the dropna method is one of its most useful features. As data scientists, it is important to be able to handle missing data, and Pandas’ dropna method makes this easy. Pandas dropna Method Pandas’ dropna method allows us to drop rows or columns with missing values in our DataFrame. Find the documentation of the Pandas dropna method on this page: pandas.DataFrame.dropna. The dropna method looks like the following: DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False) Given the above method and parameters, the following …
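The parameters in the signature above can be tried out on a small hypothetical DataFrame, as in this sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns "a" and "b" each contain one missing value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, 8.0, 9.0],
})

rows_dropped = df.dropna()                # drop rows with any missing value
cols_dropped = df.dropna(axis=1)          # drop columns with any missing value
all_missing = df.dropna(how="all")        # drop rows only if every value is missing
subset_dropped = df.dropna(subset=["a"])  # only consider column "a"

print(rows_dropped)
print(cols_dropped)
print(subset_dropped)
```

None of these calls modifies `df` itself, because `inplace` is left at its default of `False`.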

Posted in Data Science, Machine Learning, Python.

Accuracy, Precision, Recall & F1-Score – Python Examples

Classification models are used in classification problems to predict the target class of a data sample. The classification model predicts the probability that each instance belongs to one class or another. It is important to evaluate the performance of classification models in order to use them reliably in production for solving real-world problems. Performance measures are used to assess how well machine learning classification models perform in a given context. These performance metrics include accuracy, precision, recall, and F1-score. Because it helps us understand the strengths and limitations of these models when making predictions in new situations, model performance is essential for machine …
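The four metrics can be computed with `sklearn.metrics`. The labels below are made up for illustration: they give 3 true positives, 3 true negatives, 1 false positive and 1 false negative.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels for a binary classifier.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 6 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```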

Posted in Data Science, Machine Learning, Python.

Support Vector Machine (SVM) Python Example

[Image: support vector machine maximizing the margin]

In this post, you will learn about the concepts of Support Vector Machine (SVM) with the help of a Python code example for building a machine learning classification model. We will work with the Python Sklearn package for building the model. As data scientists, it is important to get a good grasp of the SVM algorithm and related aspects. What is Support Vector Machine (SVM)? Support vector machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression tasks. At times, SVM for classification is termed support vector classification (SVC) and SVM for regression is termed support vector regression (SVR). In this post, we will learn about SVM …
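A short sketch of training a support vector classifier with scikit-learn's `SVC`. The Iris dataset, the linear kernel and the value of `C` are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# SVMs are sensitive to feature scale, so standardize before fitting.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm.fit(X_train, y_train)

accuracy = svm.score(X_test, y_test)
n_support = svm.named_steps["svc"].n_support_  # support vectors per class

print(f"test accuracy: {accuracy:.3f}")
print("support vectors per class:", n_support)
```

For regression tasks, `sklearn.svm.SVR` plays the corresponding role.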

Posted in AI, Data Science, Machine Learning, Python.

Logistic Regression Explained with Python Example

[Image: logistic regression model]

In this blog post, we will discuss the logistic regression machine learning algorithm with a Python example. Logistic regression is a type of regression algorithm that is used to predict the probability of occurrence of an event. It is often used in machine learning applications. In this tutorial, we will use Python to implement logistic regression for binary classification problems. What is Logistic Regression? Logistic regression is a machine learning algorithm used for classification problems. That is, it can be used to predict whether an instance belongs to one class or the other. For example, it could be used to predict whether a person is male or female, based on …
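A minimal sketch of binary classification with scikit-learn's `LogisticRegression`, assuming the built-in breast cancer dataset as example data (the post's own dataset is not shown in this excerpt):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scaling helps the solver converge; max_iter is raised as a precaution.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# predict_proba returns one probability per class for every sample.
probabilities = model.predict_proba(X_test)
accuracy = model.score(X_test, y_test)

print(probabilities[:3])
print(f"test accuracy: {accuracy:.3f}")
```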

Posted in Data Science, Machine Learning, Python.

Perceptron Explained using Python Example

In this post, you will learn about the concepts of the Perceptron with the help of a Python example. It is very important for data scientists to understand the concepts related to the Perceptron, as a good understanding lays the foundation for learning advanced concepts of neural networks, including deep neural networks (deep learning). What is Perceptron? Perceptron is a machine learning algorithm which mimics how a neuron in the brain works. It is also called a single-layer neural network consisting of a single neuron. The output of this neural network is decided based on the outcome of just one activation function associated with the single neuron. In a perceptron, information flows forward from the inputs to the output. Deep …
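To make the single-neuron idea concrete, here is a from-scratch sketch in NumPy that learns the logical AND function with the classic perceptron update rule. The task, the learning rate and the number of epochs are assumptions chosen for illustration.

```python
import numpy as np

# Assumption: the AND gate as a tiny, linearly separable training set.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

weights = np.zeros(2)
bias = 0.0
learning_rate = 1.0


def predict(x):
    """Weighted sum of the inputs followed by a unit step activation."""
    return int(np.dot(weights, x) + bias > 0)


# Perceptron learning rule: adjust weights only when a prediction is wrong.
for epoch in range(10):
    for xi, target in zip(X, y):
        error = target - predict(xi)
        weights += learning_rate * error * xi
        bias += learning_rate * error

predictions = [predict(xi) for xi in X]
print("weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

The rule converges here because the two classes can be separated by a straight line; it would not converge on a problem such as XOR.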

Posted in Data Science, Deep Learning, Machine Learning, Python.

Python – Creating Scatter Plot with IRIS Dataset

[Image: scatter plot with IRIS dataset using Python]

In this blog post, we will learn how to create a scatter plot with the IRIS dataset using Python. The IRIS dataset is a collection of data that is used to demonstrate the properties of various statistical models. It contains 150 observations (50 for each of three iris species) on four different variables: Petal Length, Petal Width, Sepal Length, and Sepal Width. As data scientists, it is important for us to be able to visualize the data that we are working with. Scatter plots are a great way to do this because they show the relationship between two variables. In this post, we have plotted and explored how Petal Length and Sepal Length …
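A rough sketch of such a scatter plot, assuming Matplotlib for plotting and scikit-learn's bundled copy of the dataset; the post itself may load and plot the data differently.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
sepal_length = iris.data[:, 0]   # column 0: sepal length (cm)
petal_length = iris.data[:, 2]   # column 2: petal length (cm)

fig, ax = plt.subplots()

# One scatter series per species so that each gets its own color and legend entry.
for label, name in enumerate(iris.target_names):
    mask = iris.target == label
    ax.scatter(sepal_length[mask], petal_length[mask], label=name)

ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Petal length (cm)")
ax.set_title("Iris: petal length vs. sepal length")
ax.legend()

# Call plt.show() in an interactive session to display the figure.
```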

Posted in Data Science, Python.

Tensor Explained with Python Numpy Examples

Tensors are a hot topic in the world of data science and machine learning. But what are tensors, and why are they so important? In this post, we will explain the concept of a tensor in simple terms, using Python NumPy examples. We will also discuss some of the ways that tensors can be used in data science and machine learning. When starting to learn deep learning, you must get a good understanding of the tensor data structure, as it is widely used as the basic data structure in frameworks such as TensorFlow, PyTorch, Keras etc. Stay tuned for more information on tensors! What are tensors, and why are …
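In NumPy terms, tensors of different ranks can be represented as arrays with different numbers of axes. The values and shapes in this sketch are arbitrary.

```python
import numpy as np

scalar = np.array(5)                  # rank 0: a single number
vector = np.array([1, 2, 3])          # rank 1: one axis
matrix = np.array([[1, 2], [3, 4]])   # rank 2: rows and columns
tensor3 = np.zeros((2, 3, 4))         # rank 3: e.g. 2 matrices of shape 3 x 4

# ndim gives the number of axes (the rank); shape gives the size of each axis.
ranks = [t.ndim for t in (scalar, vector, matrix, tensor3)]
shapes = [t.shape for t in (scalar, vector, matrix, tensor3)]

print(ranks)
print(shapes)
```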

Posted in Data Science, Deep Learning, Machine Learning, Python.