In this post, you will learn how to use the fillna method to replace (impute) missing values in one or more feature columns of a Pandas DataFrame (Python) with a measure of central tendency: the mean, median or mode. There is also a detailed post on the how, what and when of replacing missing values with the mean, median or mode. This is useful in the data preprocessing stage of building machine learning models. Other techniques for filling missing values are backward fill (backfill or bfill) and forward fill (ffill).
Before going further with the fillna method, here is the sample Pandas DataFrame we will work with. It represents the marks scored by students in a class in three subjects. Note that each subject column has one or more missing values.
import pandas as pd

# Marks in three subjects; None marks a missing score
df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, None, 87],
                   ['Rita', 'F', 99, 78, None],
                   ['Sangeeta', 'F', None, 82, 71],
                   ['Raju', 'M', None, 75, 81],
                   ['Srinivas', 'M', 90, None, 76]])
df.columns = ['name', 'gender', 'mathematics', 'science', 'english']
df.head(6)
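Before imputing anything, it helps to confirm which columns actually have gaps. The sketch below rebuilds the same DataFrame (using np.nan as an explicit missing marker, an assumption of this sketch) and counts the missing cells per column with isna().sum(); the name missing_per_column is just illustrative. For this data it reports two missing values in mathematics, two in science and one in english.

```python
import numpy as np
import pandas as pd

# Same sample data; np.nan is used as an explicit missing marker
df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, np.nan, 87],
                   ['Rita', 'F', 99, 78, np.nan],
                   ['Sangeeta', 'F', np.nan, 82, 71],
                   ['Raju', 'M', np.nan, 75, 81],
                   ['Srinivas', 'M', 90, np.nan, 76]],
                  columns=['name', 'gender', 'mathematics', 'science', 'english'])

# isna() flags every missing cell; sum() counts the flags per column
missing_per_column = df.isna().sum()
print(missing_per_column)
```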
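The following code fills the missing values in each numeric feature column with that column's mean. df.mean(numeric_only=True) computes the mean of every numeric column, and fillna uses the resulting Series to fill each column by matching column labels. The numeric_only=True argument matters in pandas 2.0 and later, where calling mean() on a DataFrame that contains text columns such as name and gender raises an error. The means 93.5, 81.0 and 79.8 are filled into the mathematics, science and english columns respectively.
df.fillna(df.mean(numeric_only=True))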
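A minimal sketch that makes the intermediate means visible and keeps the result. Note that fillna returns a new DataFrame and leaves df unchanged unless you assign the result; the variable names means and df_mean_filled are illustrative, and np.nan is used as the missing marker.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, np.nan, 87],
                   ['Rita', 'F', 99, 78, np.nan],
                   ['Sangeeta', 'F', np.nan, 82, 71],
                   ['Raju', 'M', np.nan, 75, 81],
                   ['Srinivas', 'M', 90, np.nan, 76]],
                  columns=['name', 'gender', 'mathematics', 'science', 'english'])

# Per-column means of the numeric columns only (name and gender are skipped)
means = df.mean(numeric_only=True)
print(means)

# fillna returns a new DataFrame; df itself still contains the NaNs
df_mean_filled = df.fillna(means)
print(df_mean_filled)
```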
The following code fills the missing values in each numeric feature column with that column's median. As with the mean, numeric_only=True restricts the calculation to the numeric columns. The medians 92.5, 80.0 and 81.0 are filled into the mathematics, science and english columns respectively.
df.fillna(df.median(numeric_only=True))
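If you prefer to name the columns to impute explicitly rather than relying on numeric_only, one option is to select them first and write the median-filled values back into df. This is a sketch; score_cols is a hypothetical helper list, not something from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, np.nan, 87],
                   ['Rita', 'F', 99, 78, np.nan],
                   ['Sangeeta', 'F', np.nan, 82, 71],
                   ['Raju', 'M', np.nan, 75, 81],
                   ['Srinivas', 'M', 90, np.nan, 76]],
                  columns=['name', 'gender', 'mathematics', 'science', 'english'])

# Hypothetical list of columns to impute; text columns are never touched
score_cols = ['mathematics', 'science', 'english']

# Compute medians on the selected columns and assign the filled result back
df[score_cols] = df[score_cols].fillna(df[score_cols].median())
print(df)
```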
Filling with the mode works a little differently. Unlike mean() and median(), mode() can return more than one value per column (when several values are tied), so its result cannot be passed to fillna in the same way. The simplest approach is to call fillna on each column separately with that column's first mode. For the mathematics column the mode is 90.0; the same step is then repeated for the science and english columns.
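A minimal sketch of mode imputation, assuming that taking the first mode is acceptable when a column has ties. Series.mode() returns its modes in sorted order, so for science and english, where no value repeats, every value is a mode and [0] picks the smallest one (75.0 and 71.0). The second option uses DataFrame.mode().iloc[0], whose first row holds the first mode of every column, to fill all columns in a single fillna call. The names df_col_mode and df_all_mode are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, np.nan, 87],
                   ['Rita', 'F', 99, 78, np.nan],
                   ['Sangeeta', 'F', np.nan, 82, 71],
                   ['Raju', 'M', np.nan, 75, 81],
                   ['Srinivas', 'M', 90, np.nan, 76]],
                  columns=['name', 'gender', 'mathematics', 'science', 'english'])

# Option 1: fill each column with its own first mode, one column at a time
df_col_mode = df.copy()
for col in ['mathematics', 'science', 'english']:
    first_mode = df_col_mode[col].mode()[0]  # mode() ignores NaN by default
    df_col_mode[col] = df_col_mode[col].fillna(first_mode)

# Option 2: row 0 of DataFrame.mode() holds the first mode of every column
df_all_mode = df.fillna(df.mode().iloc[0])
print(df_col_mode)
```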
The fillna method also has a parameter named method that accepts values such as ffill (forward fill). With method='ffill', each missing value is replaced by the last valid value before it. With axis=0 (the default), that is the value in the previous row of the same column; a missing value in the first row has nothing before it and stays missing. With axis=1, it is the value in the previous column of the same row; a missing value in the first column stays missing. In pandas 2.1 and later the method parameter of fillna is deprecated in favor of the equivalent df.ffill() method.
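A sketch of forward filling along both axes. It calls DataFrame.ffill(), which performs the same fill as fillna(method='ffill'). For axis=1 the sketch restricts itself to the three score columns (an assumption on our part, via the illustrative variable scores) so that no mark is ever copied from the name or gender column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, np.nan, 87],
                   ['Rita', 'F', 99, 78, np.nan],
                   ['Sangeeta', 'F', np.nan, 82, 71],
                   ['Raju', 'M', np.nan, 75, 81],
                   ['Srinivas', 'M', 90, np.nan, 76]],
                  columns=['name', 'gender', 'mathematics', 'science', 'english'])

# axis=0 (default): a NaN takes the value from the row above in the same column
df_ffill_rows = df.ffill()

# axis=1: a NaN takes the value from the column to its left in the same row;
# only the numeric score columns are used here
scores = df[['mathematics', 'science', 'english']]
scores_ffill_cols = scores.ffill(axis=1)

print(df_ffill_rows)
print(scores_ffill_cols)
```

For this data, the two missing mathematics marks (Sangeeta and Raju) become 99.0 with axis=0, because Rita's 99 is the last value above them, while with axis=1 they stay missing because mathematics is the first score column.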
If the method parameter of fillna is set to bfill (backward fill) instead, each missing value is replaced by the next valid value after it. With axis=0, that is the value in the next row of the same column; a missing value in the last row has nothing after it and stays missing. With axis=1, it is the value in the next column of the same row; a missing value in the last column stays missing. The equivalent dedicated method is df.bfill().
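The backward-fill counterpart, under the same assumptions as a forward-fill sketch (np.nan as the missing marker, axis=1 applied only to the score columns), uses DataFrame.bfill(). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, np.nan, 87],
                   ['Rita', 'F', 99, 78, np.nan],
                   ['Sangeeta', 'F', np.nan, 82, 71],
                   ['Raju', 'M', np.nan, 75, 81],
                   ['Srinivas', 'M', 90, np.nan, 76]],
                  columns=['name', 'gender', 'mathematics', 'science', 'english'])

# axis=0 (default): a NaN takes the value from the row below in the same column
df_bfill_rows = df.bfill()

# axis=1: a NaN takes the value from the column to its right in the same row;
# only the numeric score columns are used here
scores_bfill_cols = df[['mathematics', 'science', 'english']].bfill(axis=1)

print(df_bfill_rows)
print(scores_bfill_cols)
```

Here Srinivas's missing science mark stays NaN with axis=0 because his is the last row, and Rita's missing english mark stays NaN with axis=1 because english is the last column.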