Last updated: 12th Dec, 2023
Pandas is a popular data manipulation library in Python, widely used for data analysis and data science tasks. A Pandas Dataframe is a two-dimensional labeled data structure with columns of potentially different types, similar to a spreadsheet or SQL table. One of the most common tasks when working with Pandas is adding new rows and columns to an existing or empty dataframe. It might seem like a trivial task, but the method you choose to add a row or a column can significantly affect the performance and efficiency of your code.
In this blog, we will explore how to add data to a Pandas dataframe, including adding new rows and columns. We will look at the different options available, such as .loc, .insert, pd.concat, the deprecated .append, and more. While working on a data project in Python, there are many scenarios in which you'll need to add new rows and columns to a Dataframe, so as a data scientist or data analyst you need a good understanding of how to add a row or a column to an existing or empty Dataframe.
In this post, we will work with the following Pandas dataframe to learn about how to add new rows and columns to an existing dataframe.
import pandas as pd
df = pd.DataFrame({
"Mathematics": [95, 99],
"Science": [98, 94]
}, index=["Aiyana", "Anisha"])
df
There are multiple ways to add rows to a Pandas Dataframe. You can use Dataframe.loc, the deprecated Dataframe.append method, or pandas.concat to add a row to the end of a Dataframe. Note that Dataframe.insert() inserts columns, not rows; to place a row at a specific position you have to split the Dataframe and recombine the pieces. Let's see these methods one by one with an example.
In this method, we use Dataframe.loc to add a row to a dataframe. Dataframe.loc accesses a group of rows and columns by label(s). It lets you select subsets of a DataFrame based on specific row and column labels. The name loc stands for "location", and the indexer filters data by the row and column labels you specify.
If you want to add a row to a Dataframe, you can use the .loc[] indexer. It accesses Dataframe elements by label, and it also supports adding rows: assigning to a label that does not exist yet creates a new row with that label. The following code shows how to add a list as a row at the end of a Pandas dataframe.
# Use loc method to add a new row with label
#
df.loc["Saanvi"] = [96, 90]
Adding a row to a Dataframe using .loc[index]: In the above Python code, the row was added using a label as the index. You can also use loc to add rows to a dataframe that does not have labels defined as indices. Let's explore two scenarios to understand how this works for an empty and a non-empty dataframe.
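One detail worth seeing in code is what happens when the label already exists. The following is a minimal sketch, reusing the sample dataframe from this post; the replacement marks and the use of a Series for the last row are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Mathematics": [95, 99], "Science": [98, 94]},
    index=["Aiyana", "Anisha"],
)

# A label that does not exist yet: .loc adds a new row at the end.
df.loc["Saanvi"] = [96, 90]

# A label that already exists: .loc overwrites that row instead of adding one.
df.loc["Aiyana"] = [97, 91]

# A Series is matched to columns by name, so the order of values does not matter.
df.loc["Snehal"] = pd.Series({"Science": 95, "Mathematics": 92})

print(df)
```

So before adding a row with .loc, it can be worth checking whether the label is already in df.index if overwriting is not what you want.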
Scenario 1: Adding a Row to a DataFrame (Empty)
When you have an empty DataFrame, you can add a row using loc by specifying the index label and the column values. If the DataFrame has predefined columns, the values you add must match these columns in number and order. In the following example, we add rows to the Dataframe one at a time. To add the marks of five students, we create a list of lists in which each inner list holds the marks of one student, and then loop through this list to add each student's marks to the DataFrame.
import pandas as pd
# Create an empty DataFrame with predefined subject columns
df = pd.DataFrame(columns=['Mathematics', 'Science', 'English'])
# Sample marks for five students
students_marks = [
    [78, 82, 88],  # Marks for student 1
    [85, 79, 91],  # Marks for student 2
    [92, 87, 73],  # Marks for student 3
    [69, 74, 84],  # Marks for student 4
    [76, 88, 80]   # Marks for student 5
]
# Loop to add each student's marks
for i, marks in enumerate(students_marks):
    df.loc[i] = marks
# Display the updated DataFrame
print(df)
Scenario 2: Adding a Row to a DataFrame (Non-empty)
To add a new row to the end of a non-empty DataFrame in a generic way, you can use DataFrame.loc with an index value that is one greater than the current maximum index. This assumes the DataFrame has an integer index (such as the default 0, 1, 2, ...). Because the new label is not already in use, the row is added at the end regardless of the DataFrame's current size.
import pandas as pd
# Example non-empty DataFrame with initial student marks
df = pd.DataFrame({'Mathematics': [88, 92], 'Science': [93, 85], 'English': [78, 80]})
# Determine the index for the new row
# It's one more than the current maximum index
new_index = df.index.max() + 1
# Add a new row with marks for another student at the end
df.loc[new_index] = [75, 80, 70]  # New student's marks
# Display the updated DataFrame
print(df)
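The reason for taking the maximum index rather than the row count shows up once rows have been removed. Here is a small sketch under the assumption of an integer index; the marks and the name label_from_len are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Mathematics": [88, 92, 75],
    "Science": [93, 85, 80],
    "English": [78, 80, 70],
})

# Drop the first row; the remaining labels are 1 and 2.
df = df.drop(index=0)

# len(df) is 2, which is already a label, so df.loc[len(df)] would overwrite a row.
label_from_len = len(df)

# max() + 1 gives 3, a label that is not in use yet.
new_index = df.index.max() + 1
df.loc[new_index] = [81, 77, 90]

print(df)
```

If the index is not numeric at all, this approach does not apply, and a label of your own choosing (or concat with ignore_index=True) is the safer route.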
In this method, we use the Dataframe.append() method. Dataframe.append() appends the rows of another Dataframe to the end of the calling Dataframe and returns a new object; the original is not modified. Rows are added at the bottom and keep their own index labels (duplicates included) unless you pass ignore_index=True. The code below shows this:
# Append one or more rows of another dataframe
#
df1 = pd.DataFrame({
"Mathematics": [92],
"Science": [95]
}, index=["Snehal"])
#
# Append a dataframe (works only in pandas < 2.0, where DataFrame.append still exists)
#
df = df.append(df1)
Update [14 May, 2023]: While executing append method in Google Colab, I got the following alert:
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. df = df.append(df1)
The append method was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the code above raises an AttributeError. It was dropped because every call to append built a new DataFrame and copied all the data, which made repeated appends in a loop slow, and because its name suggested an in-place operation like list.append, which it never was.
The concat function, shown in the next section, is the recommended way to append DataFrames in Pandas. It has two advantages over append. First, it is more efficient when you add many rows. concat also returns a new DataFrame rather than modifying its inputs, but it accepts a whole list of objects, so many DataFrames can be combined in a single call instead of copying the data once per appended row. This can be a significant performance improvement for large DataFrames. Second, concat is more flexible. The axis argument controls whether the inputs are stacked along the rows or along the columns, and the join argument controls how labels that do not match on the other axis are handled.
The concat function can be used to add rows to a Pandas Dataframe. It takes an iterable of Series or Dataframe objects and concatenates them into a single object. You can use it to combine two or more Dataframes into a single Dataframe, or to combine a Series and a Dataframe. The following code shows one way to add one or more rows to the end of a Dataframe.
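To make the efficiency point concrete, here is a minimal sketch of collecting new rows in a list and combining them with a single concat call. The names new_rows and combined are illustrative, and the marks are sample values.

```python
import pandas as pd

df = pd.DataFrame(
    {"Mathematics": [95, 99], "Science": [98, 94]},
    index=["Aiyana", "Anisha"],
)

# Collect the rows to add as small DataFrames (these could be built in a loop).
new_rows = [
    pd.DataFrame({"Mathematics": [96], "Science": [90]}, index=["Saanvi"]),
    pd.DataFrame({"Mathematics": [92], "Science": [95]}, index=["Snehal"]),
]

# One concat call over the whole list instead of one append per row.
combined = pd.concat([df, *new_rows])

print(combined)
print(len(df))  # the original dataframe is left unchanged
```

Note that concat returns a new object, so the result has to be assigned to a variable to be kept.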
import pandas as pd
#
# Create a dataframe
#
data = {"Mathematics": [95, 90, 99],
        "Science": [99, 95, 92]}
df1 = pd.DataFrame(data, index=["Aiyana", "Anisha", "Saanvi"])
#
# Create another dataframe
#
df2 = pd.DataFrame({"Mathematics": [96],
"Science": [99]},
index=["Snehal"])
#
# Concat dataframes
#
pd.concat([df1, df2])
If you don't want to keep index labels (such as "Snehal" in the above example), here is the updated code. Note the argument ignore_index=True passed to concat: it discards the labels of all input dataframes and numbers the rows of the result 0, 1, 2, and so on.
import pandas as pd
#
# Create a dataframe
#
data = {"Mathematics": [95, 90, 99],
        "Science": [99, 95, 92]}
df1 = pd.DataFrame(data, index=["Aiyana", "Anisha", "Saanvi"])
#
# Create another dataframe
#
df2 = pd.DataFrame({"Mathematics": [96],
"Science": [99]})
#
# Concat dataframes
#
pd.concat([df1, df2], ignore_index=True)
The choice between using .loc, .append, or .concat to add rows to a Pandas dataframe depends on the specific use case and desired outcome.
Can Dataframe.insert() be used to add a row? No. The Dataframe.insert() method in Pandas is specifically designed for inserting a column into a DataFrame, not a row. It inserts a new column at a specific column position with a given value or set of values. This is demonstrated later in this post.
Can Dataframe.iloc be used to add a row? No, you cannot directly use Dataframe.iloc to add a row to an existing DataFrame in Pandas. iloc is designed for integer-location based indexing, that is, selection by position. It lets you select rows and columns by their integer position, but it cannot enlarge the DataFrame, so assigning to a position that does not exist raises an error.
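Since neither insert() nor iloc can add a row, placing a row at a specific position is usually done by slicing and recombining. Below is a minimal sketch of that idea; insert_row_at is a hypothetical helper written for this post, not a Pandas function.

```python
import pandas as pd


def insert_row_at(df, position, row):
    """Hypothetical helper: return a new DataFrame with `row` placed at `position`."""
    # Rows before the position, then the new row, then the remaining rows.
    return pd.concat([df.iloc[:position], row, df.iloc[position:]])


df = pd.DataFrame(
    {"Mathematics": [95, 99, 96], "Science": [98, 94, 90]},
    index=["Aiyana", "Anisha", "Saanvi"],
)
new_row = pd.DataFrame({"Mathematics": [92], "Science": [95]}, index=["Snehal"])

# Place the new row at position 1, between Aiyana and Anisha.
result = insert_row_at(df, 1, new_row)
print(result)
```

Here iloc is used only for selecting the two slices, which is what it is designed for; concat does the actual combining and returns a new dataframe.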
There are multiple ways of adding columns to a Dataframe. You can use Dataframe.loc or the bracket method to add a new column at the end of the Dataframe, Dataframe.insert() to add it at a specific position, or concat to attach columns from another Dataframe. Let's see all these methods one by one with an example.
If you want to add a single column to your Dataframe, you can use the .loc[] indexer. It accesses Dataframe elements by label, and it also supports adding columns: assigning to a column label that does not exist yet creates that column. The following code shows how to add a Dataframe column at the end.
# Adding a new column using loc method
#
df.loc[:, "English"] = [85, 92, 79, 87]
In this method, we use brackets on the dataframe object to add a new column. The column is added after all the existing columns. The following code shows how to add columns using brackets.
# Adding a new column using brackets
#
df["Hindi"] = [81, 79, 72, 76]
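The bracket method behaves slightly differently depending on what is assigned. The sketch below uses a smaller three-row version of the sample dataframe; the "English" and "Total" columns and their values are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Mathematics": [95, 99, 96], "Science": [98, 94, 90]},
    index=["Aiyana", "Anisha", "Saanvi"],
)

# A list is assigned by position and must have one value per row.
df["Hindi"] = [81, 79, 72]

# A Series is assigned by index label; rows without a matching label get NaN.
df["English"] = pd.Series({"Saanvi": 79, "Aiyana": 85})

# A column can also be computed from existing columns.
df["Total"] = df["Mathematics"] + df["Science"]

print(df)
```

Assigning a Series is convenient when the new values come from another source that shares the same labels, since the order of the values then does not matter.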
In this method, we use the Dataframe.insert() method. Dataframe.insert() inserts a column into a Dataframe at a specified location. The location is a zero-based column position, so 1 places the new column second. The following code shows how to add a Dataframe column using Dataframe.insert():
# Adding a column at a specified position using insert method
#
df.insert(1, "Social Science", [86, 78, 82, 80])
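Two properties of insert() are easy to miss: it changes the dataframe in place, and it refuses duplicate column names by default. Here is a self-contained sketch on a two-row version of the sample dataframe; the variable names result and duplicate_rejected are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Mathematics": [95, 99], "Science": [98, 94]},
    index=["Aiyana", "Anisha"],
)

# insert() modifies df in place and returns None, so do not reassign its result.
result = df.insert(1, "Social Science", [86, 78])
print(result)
print(list(df.columns))

# Inserting a column name that already exists raises ValueError by default.
duplicate_rejected = False
try:
    df.insert(0, "Science", [1, 2])
except ValueError as err:
    duplicate_rejected = True
    print("insert failed:", err)
```

Writing df = df.insert(...) would therefore replace the dataframe with None, which is a common mistake with this method.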
The following code demonstrates that you can also use the concat function with axis="columns" to add a new column to an existing dataframe. Here is the code:
import pandas as pd
#
# Create a new dataframe
#
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 30, 35]}
df = pd.DataFrame(data)
#
# Define a new column dataframe
#
new_column_df = pd.DataFrame({'Salary': [5000, 6000, 7000]})
#
# Add new column using concat method
#
df = pd.concat([df, new_column_df], axis="columns")
df
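One thing to keep in mind with axis="columns" is that concat lines rows up by index label, not by position. The following sketch shows what happens when the labels differ; the names shifted, misaligned, and aligned are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Alice", "Bob"], "Age": [25, 30, 35]})

# The new column is labeled 1, 2, 3 instead of 0, 1, 2.
shifted = pd.DataFrame({"Salary": [5000, 6000, 7000]}, index=[1, 2, 3])

# Rows are matched by label, so the result has labels 0 to 3 with NaN in the gaps.
misaligned = pd.concat([df, shifted], axis="columns")
print(misaligned)

# Resetting the index first makes the rows line up by position.
aligned = pd.concat([df, shifted.reset_index(drop=True)], axis="columns")
print(aligned)
```

In the example above the two dataframes happen to share the default index 0, 1, 2, which is why the columns line up without any extra step.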
The choice between using .loc, .insert, or adding at the end of columns to add one or more columns to a Pandas dataframe depends on the specific use case and desired outcome. As a guideline, use brackets or .loc when the new column can simply go at the end, use .insert() when the column must sit at a specific position, and use concat with axis="columns" when the new columns already live in another dataframe.
In conclusion, managing and manipulating data in Pandas Dataframes is a fundamental skill in data analysis and Python programming. We've explored various ways to add rows and columns to a Pandas dataframe, each with its own approach and application. The .loc indexer is versatile for adding rows, whether by label or by integer index, while .append() and pd.concat() offer alternative ways to add rows, with .append() deprecated (and removed in pandas 2.0) in favor of the more efficient pd.concat().
For adding columns, .loc again proves useful, as does the straightforward approach of using brackets. The .insert() method offers precise control over column placement, and pd.concat() is ideal for merging dataframes. Remember, the choice of method depends on your specific data manipulation needs and the structure of your dataframe. Whether you are adding single or multiple rows or columns, understanding these methods ensures you can effectively organize and present your data for analysis.