In this post, you will learn about how to use fillna method to replace or impute missing values of one or more feature column with central tendency measures in Pandas Dataframe (Python).The central tendency measures which are used to replace missing values are mean, median and mode. Here is a detailed post on how, what and when of replacing missing values with mean, median or mode. This will be helpful in the data preprocessing stage of building machine learning models. Other technique used for filling missing values is backfill or bfill and forward-fill or ffill.
Before going further and learn about fillna method, here is the Pandas sample dataframe we will work with. It represents marks in three different subjects scored by different students in a class. Note the missing value in each subject for each of the students.
df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
['Sanjiv', 'M', 90,None,87],
['Rita', 'F',99,78,None],
['Sangeeta', 'F',None,82,71],
['Raju','M',None,75,81],
['Srinivas','M',90,None,76]])
df.columns = ['name', 'gender', 'mathematics', 'science', 'english']
df.head(6)
Fillna method for Replacing with Mean Value
Here is the code which fills the missing values, using fillna method, in different feature columns with mean value. The fillna method fills missing value of all numerical feature columns with mean values. The mean of 93.5, 81.0 and 79.8 is set in three different feature columns such as mathematics, science and english respectively.
df.fillna(df.mean())
Fillna method for Replacing with Median Value
Here is the code which fills the missing values, using fillna method, in different feature columns with median value. As like mean value, fillna method fills missing value of all numerical feature columns with median values. The median of 92.5, 80.0 and 81.0 is set in three different feature columns such as mathematics, science and english respectively.
df.fillna(df.median())
Fillna method for Replacing with Mode Value
Here is the code which fills the missing values, using fillna method, in different feature columns with mode value. For mode value, unlike mean and median values, you will need to use fillna method for individual columns separately. The mode of 90.0 is set in for mathematics column separately. This would need to be done for science and english column as well.
Fillna method for Replacing with ffill
There is a parameter namely method in the fillna method which can be passed value such as ffill. This will result in filling missing values with the last observed value in row or column. If the axis = 0, the value in previous row in the same column is filled in place of missing value. If it is the first row being considered, nothing is done. If the axis = 1, the value in previous column in the same row is filled in place of missing value. If it is the first column being considered, nothing is done. Here is the code sample:
Fillna method for Replacing with bfill
If the value for method parameter in the fillna method is assigned as bfill, this will result in filling missing values with the next observed value in row or column. If the axis = 0, the value in next row in the same column is filled in place of missing value. If it is the last row being considered, nothing is done. If the axis = 1, the value in next column in the same row is filled in place of missing value. If it is the last column being considered, nothing is done.
- Agentic Reasoning Design Patterns in AI: Examples - October 18, 2024
- LLMs for Adaptive Learning & Personalized Education - October 8, 2024
- Sparse Mixture of Experts (MoE) Models: Examples - October 6, 2024
I found it very helpful. However the differences are not too understandable for me