Pandas – Fillna method for replacing missing values

In this post, you will learn how to use the fillna method to replace or impute missing values in one or more feature columns of a Pandas DataFrame (Python) with measures of central tendency. The central tendency measures used to replace missing values are the mean, median and mode. A separate detailed post covers the how, what and when of replacing missing values with the mean, median or mode. This is helpful in the data preprocessing stage of building machine learning models. Other techniques for filling missing values are backward fill (backfill or bfill) and forward fill (ffill).

Before learning about the fillna method, here is the sample Pandas DataFrame we will work with. It represents the marks scored by different students of a class in three subjects. Note that every subject column has missing values, and most of the students are missing the mark for one subject.

import pandas as pd

# Marks of six students in three subjects; None marks a missing value
df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, None, 87],
                   ['Rita', 'F', 99, 78, None],
                   ['Sangeeta', 'F', None, 82, 71],
                   ['Raju', 'M', None, 75, 81],
                   ['Srinivas', 'M', 90, None, 76]])

df.columns = ['name', 'gender', 'mathematics', 'science', 'english']

# Show all six rows
df.head(6)

Fillna method for Replacing with Mean Value

Here is the code that uses the fillna method to fill the missing values in the feature columns with the mean. Passing the column means to fillna fills the missing values of every numerical feature column with that column's mean. The means 93.5, 81.0 and 79.8 are set in the mathematics, science and english columns respectively. Note that fillna returns a new DataFrame; df itself is unchanged unless the result is assigned back.

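# numeric_only=True skips the text columns (name, gender); recent pandas versions raise an error without it
df.fillna(df.mean(numeric_only=True))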
Fig 2. Replace missing values with mean values

Fillna method for Replacing with Median Value

Here is the code that uses the fillna method to fill the missing values in the feature columns with the median. As with the mean, passing the column medians to fillna fills the missing values of every numerical feature column with that column's median. The medians 92.5, 80.0 and 81.0 are set in the mathematics, science and english columns respectively.

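# numeric_only=True skips the text columns (name, gender); recent pandas versions raise an error without it
df.fillna(df.median(numeric_only=True))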
Fig 3. Replace missing values with median values
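The two calls above apply one measure to every numerical column. When different columns call for different measures, fillna also accepts a dictionary that maps column names to fill values. The following is a minimal sketch on the same sample data; the choice of measure per column and the names fill_values and filled are only illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'])

# Illustrative choice: mean for mathematics and english, median for science
fill_values = {
    'mathematics': df['mathematics'].mean(),   # 93.5
    'science': df['science'].median(),         # 80.0
    'english': df['english'].mean(),           # 79.8
}

# fillna returns a new DataFrame; df keeps its missing values
filled = df.fillna(value=fill_values)
print(filled)
```

Columns that are not listed in the dictionary, such as name and gender here, are left untouched.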

Fillna method for Replacing with Mode Value

Here is the code that uses the fillna method to fill the missing values in the feature columns with the mode. Unlike the mean and median, the mode of a column is not necessarily a single value, because several values can be tied as the most frequent. For that reason, the fillna method is applied to each column separately, using the first mode of that column. The mode of 90.0 is set for the mathematics column. The same needs to be done for the science and english columns as well.
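The per-column approach described above can be sketched as follows. This is an illustrative reconstruction on the sample data rather than the exact code from the original figure; the names marks and filled are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'])

marks = ['mathematics', 'science', 'english']  # hypothetical helper list
filled = df.copy()

for col in marks:
    # mode() returns a Series because several values can tie as the mode;
    # iloc[0] takes the first (smallest) of them
    filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(filled)
```

In this sample, only mathematics has a repeated mark (90). Every mark in science and english occurs once, so all values tie as modes and the smallest one (75.0 and 71.0 respectively) is used, which may not be a meaningful replacement for such columns.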

Fig 4. Replace missing values with mode values

Fillna method for Replacing with ffill

The fillna method has a parameter named method that accepts values such as ffill (forward fill). This fills each missing value with the last observed value along a row or column. With axis = 0, a missing value is replaced by the most recent non-missing value from the rows above in the same column; a missing value in the first row is left unchanged. With axis = 1, a missing value is replaced by the most recent non-missing value from the columns to the left in the same row; a missing value in the first column is left unchanged. Note that pandas 2.1 and later deprecate the method parameter of fillna in favor of the dedicated ffill method.
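A minimal sketch of forward fill on the sample data follows. It assumes the DataFrame.ffill method rather than fillna(method='ffill'), since the latter is deprecated in recent pandas releases; the names marks, filled_down and filled_across are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'])

# axis=0 (default): carry the last observed value down each column
filled_down = df.ffill()

# axis=1: carry the last observed value across each row.
# Only the mark columns are used so that values from the name and
# gender columns are not carried over into the marks.
marks = ['mathematics', 'science', 'english']
filled_across = df[marks].ffill(axis=1)

print(filled_down)
print(filled_across)
```

With axis=1, the missing mathematics marks of Sangeeta and Raju stay missing, because mathematics is the first of the selected columns and there is nothing to the left to carry forward.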

Fig 5. Replace missing values with method=’ffill’

Fillna method for Replacing with bfill

If the method parameter of the fillna method is set to bfill (backward fill), each missing value is filled with the next observed value along a row or column. With axis = 0, a missing value is replaced by the next non-missing value from the rows below in the same column; a missing value in the last row is left unchanged. With axis = 1, a missing value is replaced by the next non-missing value from the columns to the right in the same row; a missing value in the last column is left unchanged.
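Backward fill can be sketched in the same way. The example below assumes the DataFrame.bfill method, which recent pandas releases recommend over fillna(method='bfill'); the names marks, filled_up and filled_back are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'])

# axis=0 (default): pull the next observed value up each column
filled_up = df.bfill()

# axis=1: pull the next observed value from the right along each row,
# restricted to the mark columns
marks = ['mathematics', 'science', 'english']
filled_back = df[marks].bfill(axis=1)

print(filled_up)
print(filled_back)
```

With axis=0, the science mark of Srinivas stays missing because it sits in the last row. With axis=1, the english mark of Rita stays missing because english is the last column.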

Ajitesh Kumar
