In this blog post, we will look at Pandas’ dropna method, which drops rows or columns that have missing values. Pandas is a powerful data analysis library for Python, and handling missing data is a routine part of a data scientist's work. The dropna method makes this easy.
The documentation for the method is available on the pandas.DataFrame.dropna page. The dropna method signature looks like the following:
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
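Given the above method and parameters, the following are some common scenarios in which the dropna method can be used: dropping rows that contain missing values, dropping columns that contain missing values, dropping rows where all values are missing, dropping rows that have missing values in specific columns, and dropping rows that have fewer than n non-missing values.
Here is the Python code for the dropna scenarios discussed above. Before getting into the examples, let's create a dataframe to use with the dropna method. The code below shows two ways in which the same Pandas dataframe can be created.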
import pandas as pd
df = pd.DataFrame([['Ajitesh', 'M', 95, None, 84],
                   ['Sanjiv', 'M', 90, 87, 87],
                   ['Rita', 'F', 99, 78, None],
                   ['Sangeeta', 'F', None, 82, None],
                   ['Raju', 'M', None, 75, 81],
                   ['Srinivas', None, 90, 100, 76]])
df.columns = ['name', 'gender', 'mathematics', 'science', 'english']
# Another way to create the same dataframe is from a dictionary of columns:
data = {'name': ['Ajitesh', 'Sanjiv', 'Rita', 'Sangeeta', 'Raju', 'Srinivas'],
        'gender': ['M', 'M', 'F', 'F', 'M', None],
        'mathematics': [95, 90, 99, None, None, 90],
        'science': [None, 87, 78, 82, 75, 100],
        'english': [84, 87, None, None, 81, 76]}
df2 = pd.DataFrame(data)
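Before dropping anything, it can help to confirm where the missing values are. The following is a minimal sketch that rebuilds the same dataframe and counts missing values with isna(); the variable names missing_per_column and rows_with_missing are introduced here only for illustration.

```python
import pandas as pd

data = {'name': ['Ajitesh', 'Sanjiv', 'Rita', 'Sangeeta', 'Raju', 'Srinivas'],
        'gender': ['M', 'M', 'F', 'F', 'M', None],
        'mathematics': [95, 90, 99, None, None, 90],
        'science': [None, 87, 78, 82, 75, 100],
        'english': [84, 87, None, None, 81, 76]}
df = pd.DataFrame(data)

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Number of rows that contain at least one missing value
rows_with_missing = int(df.isna().any(axis=1).sum())
print(rows_with_missing)  # 5
```

Only the row for 'Sanjiv' is fully populated, which is worth keeping in mind when reading the results of the scenarios that follow.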
Now that the dataframe is created, let's look at the scenarios listed earlier. Each call below returns a new dataframe and leaves df unchanged unless inplace=True is passed.
Drop rows that contain missing values: Note that the value of axis is set to 0
df.dropna(axis=0)
Drop columns that contain missing values: Note that the value of axis is set to 1
df.dropna(axis=1)
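To make the difference between the two axis values concrete, here is a small sketch that applies both calls to the sample dataframe and prints what survives. The names rows_kept and cols_kept are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ajitesh', 'Sanjiv', 'Rita', 'Sangeeta', 'Raju', 'Srinivas'],
    'gender': ['M', 'M', 'F', 'F', 'M', None],
    'mathematics': [95, 90, 99, None, None, 90],
    'science': [None, 87, 78, 82, 75, 100],
    'english': [84, 87, None, None, 81, 76],
})

rows_kept = df.dropna(axis=0)  # keeps only rows with no missing values
cols_kept = df.dropna(axis=1)  # keeps only columns with no missing values

print(rows_kept['name'].tolist())  # ['Sanjiv']
print(cols_kept.columns.tolist())  # ['name']
print(df.shape)                    # (6, 5): the original df is unchanged
```

With this data, both calls are quite aggressive: one keeps a single row and the other keeps a single column, which is why the more selective parameters below are often preferable.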
Drop rows where all columns have missing values: Note that the how parameter is set to ‘all’; its default value is ‘any’. In the sample dataframe no row is entirely empty, so this call returns all six rows.
df.dropna(how='all')
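Since the sample dataframe has no fully empty row, the effect of how='all' is easier to see on a smaller frame. The sketch below uses a hypothetical three-row dataframe (not the one from this post) whose last row has no values at all.

```python
import pandas as pd

# Hypothetical frame for illustration: the last row is entirely missing
small = pd.DataFrame({
    'name': ['Ajitesh', 'Sangeeta', None],
    'mathematics': [95, None, None],
    'english': [84, None, None],
})

any_dropped = small.dropna(how='any')  # drop rows with at least one missing value
all_dropped = small.dropna(how='all')  # drop rows only if every value is missing

print(len(any_dropped))  # 1
print(len(all_dropped))  # 2
```

The row for 'Sangeeta' survives how='all' because its name is present, even though both marks are missing.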
Drop rows that have missing values in specific columns: Note that subset is set to ['english'], which removes all rows that have a NaN value in the ‘english’ column.
df.dropna(subset=['english'])
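The subset parameter also accepts several column labels and can be combined with how. The sketch below, using the sample dataframe, shows both variants; the names either_missing and both_missing are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ajitesh', 'Sanjiv', 'Rita', 'Sangeeta', 'Raju', 'Srinivas'],
    'gender': ['M', 'M', 'F', 'F', 'M', None],
    'mathematics': [95, 90, 99, None, None, 90],
    'science': [None, 87, 78, 82, 75, 100],
    'english': [84, 87, None, None, 81, 76],
})

# Drop rows missing a value in either 'mathematics' or 'english'
either_missing = df.dropna(subset=['mathematics', 'english'])
print(either_missing['name'].tolist())  # ['Ajitesh', 'Sanjiv', 'Srinivas']

# Drop rows only when both 'mathematics' and 'english' are missing
both_missing = df.dropna(subset=['mathematics', 'english'], how='all')
print(both_missing['name'].tolist())  # ['Ajitesh', 'Sanjiv', 'Rita', 'Raju', 'Srinivas']
```

Missing values in columns outside the subset, such as the missing science mark for 'Ajitesh', do not affect which rows are dropped.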
Drop rows that have fewer than n non-missing values: Note that setting the threshold (thresh) to 4 removes the row for ‘Sangeeta’, as this is the only row with fewer than 4 non-missing values.
df.dropna(thresh=4)
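As a quick check of the thresh behaviour, the sketch below applies it along rows and, as an additional illustration not covered above, along columns with axis=1. The names rows_thresh and cols_thresh are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ajitesh', 'Sanjiv', 'Rita', 'Sangeeta', 'Raju', 'Srinivas'],
    'gender': ['M', 'M', 'F', 'F', 'M', None],
    'mathematics': [95, 90, 99, None, None, 90],
    'science': [None, 87, 78, 82, 75, 100],
    'english': [84, 87, None, None, 81, 76],
})

# Keep rows with at least 4 non-missing values
rows_thresh = df.dropna(thresh=4)
print(rows_thresh['name'].tolist())  # ['Ajitesh', 'Sanjiv', 'Rita', 'Raju', 'Srinivas']

# Keep columns with at least 5 non-missing values
cols_thresh = df.dropna(axis=1, thresh=5)
print(cols_thresh.columns.tolist())  # ['name', 'gender', 'science']
```

Note that thresh counts the values that are present, not the ones that are missing, so a higher threshold drops more rows or columns.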