Fillna method for replacing missing values
In this post, you will learn how to use the fillna method to replace (impute) missing values in one or more feature columns of a Pandas DataFrame (Python) with a measure of central tendency. The central tendency measures used to replace missing values are the mean, median and mode. Here is a detailed post on the how, what and when of replacing missing values with mean, median or mode. This is helpful in the data preprocessing stage of building machine learning models. Other techniques for filling missing values are backward fill (backfill or bfill) and forward fill (ffill).
Before learning about the fillna method, here is the sample Pandas dataframe we will work with. It represents the marks scored in three subjects by different students in a class. Note that each subject column has missing values for some of the students.
import pandas as pd

# Marks of six students in three subjects; None marks a missing value
df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, None, 87],
                   ['Rita', 'F', 99, 78, None],
                   ['Sangeeta', 'F', None, 82, 71],
                   ['Raju', 'M', None, 75, 81],
                   ['Srinivas', 'M', 90, None, 76]])
df.columns = ['name', 'gender', 'mathematics', 'science', 'english']
df.head(6)
Here is the code which uses the fillna method to fill the missing values in the feature columns with the mean. The fillna method fills the missing values of every numerical feature column with that column's mean. The means 93.5, 81.0 and 79.8 are set in the mathematics, science and english columns respectively.
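# numeric_only=True skips the name and gender columns (needed in pandas 2.0 and later)
df.fillna(df.mean(numeric_only=True))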
Here is the code which uses the fillna method to fill the missing values in the feature columns with the median. As with the mean, the fillna method fills the missing values of every numerical feature column with that column's median. The medians 92.5, 80.0 and 81.0 are set in the mathematics, science and english columns respectively.
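# numeric_only=True skips the name and gender columns (needed in pandas 2.0 and later)
df.fillna(df.median(numeric_only=True))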
The fillna method can also fill the missing values in the feature columns with the mode. For the mode, unlike the mean and median, you will need to apply fillna to each column separately, since mode() can return more than one value for a column. For the mathematics column the mode is 90.0, so its missing values are set to 90.0. The same would need to be done for the science and english columns as well.
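A minimal sketch of the mode-based fill for the mathematics column is shown here. It rebuilds only the three marks columns of the sample dataframe so that it runs on its own; the variable name math_mode is just illustrative.

```python
import pandas as pd

# Marks columns of the sample dataframe (name and gender left out for brevity)
df = pd.DataFrame({
    'mathematics': [95, 90, 99, None, None, 90],
    'science': [89, None, 78, 82, 75, None],
    'english': [84, 87, None, 71, 81, 76],
})

# mode() returns a Series because a column can have several modes;
# [0] picks the first (smallest) of them
math_mode = df['mathematics'].mode()[0]

# Fill only the mathematics column with its mode
df['mathematics'] = df['mathematics'].fillna(math_mode)
print(df['mathematics'].tolist())
```

In this dataset the science and english columns have no repeated marks, so mode() returns every observed value for them and [0] would simply pick the smallest one. For such columns the mean or median is probably a more sensible choice.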
The fillna method has a parameter named method which can be passed a value such as ffill (forward fill). This fills each missing value with the last observed value along a row or column. If axis = 0, the last observed value from an earlier row in the same column is used in place of the missing value. A missing value in the first row is left as it is, because there is nothing before it to copy. If axis = 1, the last observed value from an earlier column in the same row is used in place of the missing value. A missing value in the first column is left as it is.
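Forward fill can be sketched as follows. The sketch uses the DataFrame.ffill method rather than fillna(method='ffill'), on the assumption that you are on a recent pandas release: to my knowledge the method argument of fillna is deprecated from pandas 2.1 onwards, while ffill gives the same result. Only the marks columns are included, since filling across a row (axis=1) would otherwise copy the gender value into the mathematics column.

```python
import pandas as pd

# Marks columns of the sample dataframe (name and gender left out on purpose)
df = pd.DataFrame({
    'mathematics': [95, 90, 99, None, None, 90],
    'science': [89, None, 78, 82, 75, None],
    'english': [84, 87, None, 71, 81, 76],
})

# axis=0 (default): copy the last observed value down each column
filled_down = df.ffill()

# axis=1: copy the last observed value across each row, left to right
filled_across = df.ffill(axis=1)

print(filled_down)
print(filled_across)
```

With axis=0 the two missing mathematics marks both become 99, the last mark observed above them. With axis=1 those same two cells stay missing, because mathematics is the first column and there is nothing to its left.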
If the method parameter of the fillna method is set to bfill (backward fill), each missing value is filled with the next observed value along a row or column. If axis = 0, the next observed value from a later row in the same column is used in place of the missing value. A missing value in the last row is left as it is, because there is nothing after it to copy. If axis = 1, the next observed value from a later column in the same row is used in place of the missing value. A missing value in the last column is left as it is.
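Backward fill can be sketched in the same way. As an assumption about your pandas version, the sketch uses the DataFrame.bfill method instead of fillna(method='bfill'), which I believe is deprecated in recent releases. Again only the marks columns are used so that the axis=1 case stays numeric.

```python
import pandas as pd

# Marks columns of the sample dataframe (name and gender left out on purpose)
df = pd.DataFrame({
    'mathematics': [95, 90, 99, None, None, 90],
    'science': [89, None, 78, 82, 75, None],
    'english': [84, 87, None, 71, 81, 76],
})

# axis=0 (default): copy the next observed value up each column
filled_up = df.bfill()

# axis=1: copy the next observed value across each row, right to left
filled_back = df.bfill(axis=1)

print(filled_up)
print(filled_back)
```

With axis=0 the missing science mark in the last row stays missing, since no later row exists. With axis=1 the missing english mark in the third row stays missing, since english is the last column.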