Pandas – Fillna method for replacing missing values

In this post, you will learn how to use the fillna method to replace or impute missing values in one or more feature columns of a Pandas dataframe (Python) with measures of central tendency. The measures used to replace missing values are the mean, median and mode. Here is a detailed post on the how, what and when of replacing missing values with mean, median or mode. This is helpful in the data preprocessing stage of building machine learning models.

Before looking at the fillna method, here is the sample Pandas dataframe we will work with. It represents the marks scored in three different subjects by students in a class. Note that most of the students have a missing value (None) in one of the subjects.

import pandas as pd

df = pd.DataFrame([['Ajitesh', 'M', 95, 89, 84],
                   ['Sanjiv', 'M', 90, None, 87],
                   ['Rita', 'F', 99, 78, None],
                   ['Sangeeta', 'F', None, 82, 71],
                   ['Raju', 'M', None, 75, 81],
                   ['Srinivas', 'M', 90, None, 76]])

df.columns = ['name', 'gender', 'mathematics', 'science', 'english']

df.head(6)
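
Before filling anything, it can help to confirm where the gaps are. The following is a minimal sketch that rebuilds the same dataframe and counts the missing values per column with isna; the variable name missing_counts is chosen here for illustration only.

```python
import pandas as pd

# Same sample data as above; None marks a missing mark.
df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'])

# isna() flags each missing cell; sum() counts the flags column by column.
missing_counts = df.isna().sum()
print(missing_counts)
```

For this data, mathematics and science each have two missing marks and english has one.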

Fillna method for Replacing with Mean Value

Here is the code that uses the fillna method to fill the missing values in the feature columns with the mean value. When fillna is given the per-column means, it fills the missing values of every numerical feature column with that column's mean. The means 93.5, 81.0 and 79.8 are set in the mathematics, science and english columns respectively.

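# numeric_only=True skips the text columns (name, gender); pandas 2.0 and later raise a TypeError without it
df.fillna(df.mean(numeric_only=True))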
Fig 2. Replace missing values with mean values
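
The same idea can be limited to chosen feature columns. The sketch below is one possible way to do that, assuming you want to keep the original dataframe untouched; the names subject_cols and filled are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'])

# Hypothetical list of the columns we want to impute.
subject_cols = ['mathematics', 'science', 'english']

# fillna returns a new object, so work on a copy and assign the result back.
filled = df.copy()
filled[subject_cols] = df[subject_cols].fillna(df[subject_cols].mean())

print(filled)
```

Because fillna returns a new dataframe by default, df itself still contains the missing values after this runs; only filled has them replaced.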

Fillna method for Replacing with Median Value

Here is the code that uses the fillna method to fill the missing values in the feature columns with the median value. As with the mean, fillna fills the missing values of every numerical feature column with that column's median. The medians 92.5, 80.0 and 81.0 are set in the mathematics, science and english columns respectively.

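# numeric_only=True skips the text columns (name, gender); pandas 2.0 and later raise a TypeError without it
df.fillna(df.median(numeric_only=True))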
Fig 3. Replace missing values with median values
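
If only one feature column should be imputed, fillna also accepts a dictionary that maps column names to fill values. The following is a small sketch of that form, assuming only the mathematics column is to be filled; median_maths and filled are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'])

# Median of the observed mathematics marks (missing values are ignored).
median_maths = df['mathematics'].median()

# Only the columns named in the dict are filled; the others keep their gaps.
filled = df.fillna({'mathematics': median_maths})

print(filled)
```

The science and english columns are left as they were, which is useful when different columns call for different imputation strategies.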

Fillna method for Replacing with Mode Value

Here is the code that uses the fillna method to fill the missing values in a feature column with the mode value. Unlike the mean and median, the mode is not always a single value: a column can have several modes, so mode() returns a collection of values rather than one number. For this reason, the fillna method is applied to individual columns separately. The mode of 90.0 is set for the mathematics column. The same would need to be done for the science and english columns as well.
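
The per-column approach described above might look like the following. This is a sketch under the assumption that the first mode is an acceptable choice when a column has several; the name maths_mode is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['Ajitesh', 'M', 95, 89, 84],
     ['Sanjiv', 'M', 90, None, 87],
     ['Rita', 'F', 99, 78, None],
     ['Sangeeta', 'F', None, 82, 71],
     ['Raju', 'M', None, 75, 81],
     ['Srinivas', 'M', 90, None, 76]],
    columns=['name', 'gender', 'mathematics', 'science', 'english'])

# mode() returns a Series because ties are possible; take its first entry.
maths_mode = df['mathematics'].mode()[0]

# Fill the missing mathematics marks with that single value.
df['mathematics'] = df['mathematics'].fillna(maths_mode)

print(df)
```

Note that every observed value in the science column occurs only once, so mode() returns all of them for that column and taking the first entry simply picks the smallest. In a case like that, the mean or median may be a more meaningful fill value.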

Fig 4. Replace missing values with mode values
Ajitesh Kumar