Pandas – Fillna method for replacing missing values

In this post, you will learn how to use the fillna method to replace (impute) missing values in one or more feature columns of a Pandas DataFrame (Python) with a measure of central tendency. The measures of central tendency used to replace missing values are the mean, median and mode. Here is a detailed post on how, what and when of replacing missing values with mean, median or mode. This is helpful in the data preprocessing stage of building machine learning models. Two other techniques for filling missing values are backward fill (backfill or bfill) and forward fill (ffill).

Before going further with the fillna method, here is the sample Pandas DataFrame we will work with. It represents the marks scored in three subjects by the students in a class. Note that every subject column has at least one missing value, and that most of the students are missing a mark in one subject.

import pandas as pd

# Marks scored by six students; None marks a missing value
df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, None, 87],
                   ['Rita', 'F', 99, 78, None],
                   ['Sangeeta', 'F', None, 82, 71],
                   ['Raju', 'M', None, 75, 81],
                   ['Srinivas', 'M', 90, None, 76]])

df.columns = ['name', 'gender', 'mathematics', 'science', 'english']

df.head(6)
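Before imputing anything, it can help to confirm where the gaps are. The following is a minimal sketch that rebuilds the sample data above and counts the missing values per column; the variable name missing_per_column is only illustrative.

```python
import pandas as pd

# Sample data from this post; None marks a missing value
df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'],
)

# isna() flags each missing cell; sum() counts the flags per column
missing_per_column = df.isna().sum()
print(missing_per_column)
```

For this data, mathematics and science each have two missing values and english has one.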

Fillna method for Replacing with Mean Value

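Here is the code which uses the fillna method to fill the missing values in the feature columns with the mean value. Passing the column means to fillna fills the missing values of every numerical feature column with that column's mean. The means of 93.5, 81.0 and 79.8 are set in the mathematics, science and english columns respectively. The numeric_only=True argument restricts the calculation to the numerical columns; recent pandas versions (2.0 and later) raise an error without it because the name and gender columns hold strings.

df.fillna(df.mean(numeric_only=True))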
Fig 2. Replace missing values with mean values
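One detail worth keeping in mind is that fillna returns a new DataFrame and leaves the original unchanged unless the result is assigned. The sketch below, which assumes the sample data from this post, illustrates this; the names numeric_means and df_mean are only illustrative.

```python
import pandas as pd

# Sample data from this post; None marks a missing value
df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'],
)

# Column means of the numerical columns only (missing values are skipped)
numeric_means = df.mean(numeric_only=True)

# fillna matches the means to the columns by name and returns a new DataFrame
df_mean = df.fillna(numeric_means)

print(numeric_means)
print(df.isna().sum().sum())       # missing values still present in df
print(df_mean.isna().sum().sum())  # missing values left in df_mean
```

Assigning the result to a new name keeps the raw data available for comparison with other imputation strategies.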

Fillna method for Replacing with Median Value

Here is the code which uses the fillna method to fill the missing values in the feature columns with the median value. As with the mean, passing the column medians to fillna fills the missing values of every numerical feature column with that column's median. The medians of 92.5, 80.0 and 81.0 are set in the mathematics, science and english columns respectively. As before, numeric_only=True restricts the calculation to the numerical columns.

df.fillna(df.median(numeric_only=True))
Fig 3. Replace missing values with median values
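fillna also accepts a dictionary that maps column names to fill values, which is handy when only some columns should be imputed. The sketch below assumes the sample data from this post and fills science and english with their medians while leaving mathematics untouched; the name df_partial is only illustrative.

```python
import pandas as pd

# Sample data from this post; None marks a missing value
df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'],
)

# Map each column to be imputed to its own median
fill_values = {
    'science': df['science'].median(),   # 80.0
    'english': df['english'].median(),   # 81.0
}

# Columns that are not in the dictionary keep their missing values
df_partial = df.fillna(fill_values)
print(df_partial)
```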

Fillna method for Replacing with Mode Value

Here is the code which uses the fillna method to fill the missing values in a feature column with the mode value. Unlike the mean and median, the mode is simplest to fill one column at a time, because a column can have more than one mode and df.mode() therefore returns a DataFrame rather than a single value per column. The mode of 90.0 is set in the mathematics column. The same would need to be done for the science and english columns as well.
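The original code for this step is not reproduced here, so the following is a minimal sketch of one way to do it, assuming the sample data from this post. Series.mode() returns the modes in sorted order, and the sketch takes the first one with [0]; the name df_mode is only illustrative.

```python
import pandas as pd

# Sample data from this post; None marks a missing value
df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'],
)

df_mode = df.copy()  # keep the original data untouched

# mathematics has a single mode, 90.0 (it occurs twice)
maths_mode = df_mode['mathematics'].mode()[0]
df_mode['mathematics'] = df_mode['mathematics'].fillna(maths_mode)

# Repeat for the remaining subject columns, one column at a time
for column in ['science', 'english']:
    df_mode[column] = df_mode[column].fillna(df_mode[column].mode()[0])

print(df_mode)
```

In science and english every observed mark occurs exactly once, so every mark is a mode and mode()[0] is simply the smallest one (75.0 and 71.0). In a case like this the mode is not very informative, and the mean or median may be a better choice.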

Fig 4. Replace missing values with mode values

Fillna method for Replacing with ffill

The fillna method has a parameter named method which can be given the value ffill (forward fill). This fills each missing value with the last observed value along a row or column. With axis = 0, a missing value is replaced by the value in the previous row of the same column; a missing value in the first row has nothing before it and stays missing. With axis = 1, a missing value is replaced by the value in the previous column of the same row; a missing value in the first column stays missing. Note that pandas 2.1 and later deprecate the method parameter of fillna in favour of the dedicated df.ffill() method, which behaves the same way.
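The forward fill described above can be sketched as follows, assuming the sample data from this post. The sketch uses DataFrame.ffill(), which is the replacement for fillna(method='ffill') in newer pandas versions. The row-wise example is limited to the three subject columns so that names and genders are not carried into the marks; the variable names are only illustrative.

```python
import pandas as pd

# Sample data from this post; None marks a missing value
df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'],
)

# axis=0 (default): copy the value from the previous row of the same column
filled_down = df.ffill()
# mathematics becomes 95, 90, 99, 99, 99, 90

# axis=1: copy the value from the previous column of the same row
subjects = df[['mathematics', 'science', 'english']]
filled_across = subjects.ffill(axis=1)
# Sangeeta's and Raju's mathematics marks stay missing because
# mathematics is the first column and has nothing before it

print(filled_down)
print(filled_across)
```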

Fig 5. Replace missing values with method='ffill'

Fillna method for Replacing with bfill

If the method parameter of fillna is set to bfill (backward fill), each missing value is filled with the next observed value along a row or column. With axis = 0, a missing value is replaced by the value in the next row of the same column; a missing value in the last row has nothing after it and stays missing. With axis = 1, a missing value is replaced by the value in the next column of the same row; a missing value in the last column stays missing. In newer pandas versions the same result is available through df.bfill().
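The backward fill described above can be sketched in the same way, assuming the sample data from this post. The sketch uses DataFrame.bfill(), the counterpart of fillna(method='bfill') in newer pandas versions, and again limits the row-wise example to the subject columns; the variable names are only illustrative.

```python
import pandas as pd

# Sample data from this post; None marks a missing value
df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'],
)

# axis=0 (default): copy the value from the next row of the same column
filled_up = df.bfill()
# Srinivas's science mark stays missing because it is in the last row

# axis=1: copy the value from the next column of the same row
subjects = df[['mathematics', 'science', 'english']]
filled_back = subjects.bfill(axis=1)
# Rita's english mark stays missing because english is the last column

print(filled_up)
print(filled_back)
```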

Ajitesh Kumar

