Pandas: Creating a MultiIndex Dataframe from Product or Tuples

MultiIndex is a powerful tool that lets us work with higher-dimensional data in a two-dimensional dataframe, but building one with the from_tuples and from_product methods in Pandas can be tricky at first. In this blog post, we will look at how to create a MultiIndex dataframe using the MultiIndex.from_tuples and MultiIndex.from_product methods.

What is a MultiIndex?

MultiIndex is a Pandas index type that gives a dataframe multiple levels of indexing, which is why such dataframes are often called MultiIndexed dataframes. It is useful when your data can be naturally grouped by more than one category. For example, you might have data on individual employees that can be grouped by both their department and their job title. MultiIndex makes this kind of data easy to work with because you can select rows using either level or both levels together.

A MultiIndex can be created with the from_tuples() or from_product() method. The from_tuples method builds a MultiIndex from a list of tuples, where each tuple holds the labels for one row, for example values taken from several columns of data. The from_product method builds a MultiIndex from every combination of the values in several iterables. In the sample spreadsheet that accompanies this post, the red border marks the MultiIndex data.
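
To make the employee example concrete, here is a minimal sketch. The department names, job titles, and salary figures are made up for illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical employee data: every label and number here is illustrative.
index = pd.MultiIndex.from_tuples(
    [
        ("Engineering", "Developer"),
        ("Engineering", "Manager"),
        ("Sales", "Executive"),
        ("Sales", "Manager"),
    ],
    names=["Department", "Job title"],
)
employees = pd.DataFrame({"Salary": [100, 120, 80, 110]}, index=index)

# Select by the outer level only: all rows for one department.
engineering = employees.loc["Engineering"]

# Select by both levels at once: a single value.
sales_manager_salary = employees.loc[("Sales", "Manager"), "Salary"]

print(engineering)
print(sales_manager_salary)
```

Selecting with a single label returns every row under that department, while a tuple of labels narrows the selection to one row.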

Create MultiIndex Dataframe using the from_tuples method

We’ll start by creating the labels and then turn them into a list of tuples using list and zip. Each tuple holds the index labels for one row of our dataframe. Then we’ll pass the tuples to the from_tuples method to create our MultiIndex. The from_tuples method returns a MultiIndex instance. That index object, the data, and the column names are then used to create the dataframe.

import pandas as pd
#
# Create iterables
#
labels = [
    ["Aiyana", "Aiyana", "Anisha", "Anisha"],
    ["Mathematics", "Science", "Mathematics", "Science"]
]
#
# Create index using MultiIndex from_tuples method
#
tuples = list(zip(*labels))
index = pd.MultiIndex.from_tuples(tuples, names=["Students", "Subjects"])
#
# Create dataframe
#
df = pd.DataFrame([
    [98, 95, 99],
    [95, 93, 96], 
    [92, 99, 95],
    [99, 95, 97]
], index=index, columns=["1st term", "2nd term", "Final"])
#
# Print dataframe
#
print(df.head())
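
The following sketch rebuilds the same dataframe and shows what the zip step produces and how the finished MultiIndex can be used for selection. The variable names aiyana, anisha_science_final, and science are introduced here only for illustration.

```python
import pandas as pd

labels = [
    ["Aiyana", "Aiyana", "Anisha", "Anisha"],
    ["Mathematics", "Science", "Mathematics", "Science"],
]

# zip(*labels) pairs the i-th student with the i-th subject,
# giving one (student, subject) tuple per row.
tuples = list(zip(*labels))

index = pd.MultiIndex.from_tuples(tuples, names=["Students", "Subjects"])
df = pd.DataFrame(
    [[98, 95, 99], [95, 93, 96], [92, 99, 95], [99, 95, 97]],
    index=index,
    columns=["1st term", "2nd term", "Final"],
)

# All rows for one student (outer level).
aiyana = df.loc["Aiyana"]

# One cell, addressed by both index levels and a column name.
anisha_science_final = df.loc[("Anisha", "Science"), "Final"]

# All rows for one subject (inner level), across students.
science = df.xs("Science", level="Subjects")

print(tuples)
print(aiyana)
print(anisha_science_final)
print(science)
```

The xs method is used for the inner level because plain df.loc with a single label only matches the outer level.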

Create MultiIndex Dataframe using the from_product method

We can also create MultiIndex dataframes using the from_product method in Pandas. This method takes an iterable of iterables and returns a MultiIndex built from their Cartesian product, that is, every combination of one value from each iterable. The resulting MultiIndex has one level for each iterable in the input. For example, if we have a list of products and a list of countries, we can use from_product to create a MultiIndex with products and countries as the levels and one entry for every product and country pair. This is useful when the data covers every combination of the categories. from_product is just one of several ways to create a MultiIndex, but when every combination is needed it is often the simplest and most convenient. The code below creates a MultiIndex with two levels, where the first level is ['A', 'B'] and the second level is ['D', 'E']. You can then use this MultiIndex as the index of your dataframe. Here, “pd” is the conventional alias for the pandas module (import pandas as pd).

pd.MultiIndex.from_product([('A', 'B'), ('D', 'E')])
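
To illustrate the products and countries case mentioned above, here is a small sketch. The product and country names are hypothetical and chosen only to show how the combinations are generated.

```python
import pandas as pd

# Hypothetical labels, used only for illustration.
products = ["Laptop", "Phone"]
countries = ["India", "USA", "Japan"]

index = pd.MultiIndex.from_product(
    [products, countries], names=["Product", "Country"]
)

# One entry per (product, country) pair: 2 * 3 = 6 entries in total.
pairs = index.tolist()

print(len(index))
print(pairs)
```

The first iterable varies slowest, so all countries are listed for the first product before the second product starts.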

The code below is a fuller example that creates a MultiIndex dataframe using the from_product method. Four students and three subjects give twelve index entries, so the data needs twelve rows.

import pandas as pd
#
# Create iterables
#
iterables = [["Aiyana", "Ajitesh", "Sumit", "Saanvi"], ["Mathematics", "Science", "English"]]
#
# Create index using MultiIndex from_product method
#
index = pd.MultiIndex.from_product(iterables, names=["Students", "Subjects"])
#
# Create dataframe
#
df = pd.DataFrame([
    [98, 96], [96, 97], [85, 89], 
    [92, 95], [99, 94], [87, 93], 
    [91, 89], [90, 93], [82, 87],
    [97, 90], [95, 91], [84, 91] 
], index=index, columns=["1st Term", "2nd Term"])
#
# Print dataframe
#
print(df)
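
As a rough check on how the two methods relate, the sketch below builds the same index both ways. It uses itertools.product from the Python standard library to generate the tuples. That is one possible way to do it, not something the example above requires.

```python
from itertools import product

import pandas as pd

students = ["Aiyana", "Ajitesh", "Sumit", "Saanvi"]
subjects = ["Mathematics", "Science", "English"]
names = ["Students", "Subjects"]

# Build the index directly from the two lists.
from_product_index = pd.MultiIndex.from_product([students, subjects], names=names)

# Build the same combinations by hand and pass them to from_tuples.
all_pairs = list(product(students, subjects))
from_tuples_index = pd.MultiIndex.from_tuples(all_pairs, names=names)

# 4 students * 3 subjects = 12 entries in either index.
print(len(from_product_index))
print(from_product_index.equals(from_tuples_index))
```

In other words, from_product saves you from writing out every combination yourself, while from_tuples remains the better fit when only some combinations exist in the data.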

Conclusion

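Creating a MultiIndex with the from_tuples or from_product method in Pandas is a convenient way to organize data that is grouped by more than one category. Use from_tuples when you already have the combination of labels for each row, and from_product when you need every combination of several lists of labels. Both methods are easy to use and well suited to data where each row is identified by multiple index values. Thanks for reading!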

Ajitesh Kumar