
Pandas: Creating a MultiIndex DataFrame from Product or Tuples

MultiIndex is a powerful tool for working with higher-dimensional data in Pandas, but creating a MultiIndex DataFrame can be tricky at first. In this blog post, we will discuss how to create a MultiIndex DataFrame using the MultiIndex.from_tuples and MultiIndex.from_product methods.

What is a MultiIndex?

MultiIndex is a Pandas index type that holds multiple levels of labels, which lets us create MultiIndexed DataFrames (i.e., DataFrames with multiple levels of indexing). MultiIndex is useful when you have data that can be naturally grouped by more than one category. For example, you might have data on individual employees that can be grouped by both their department and their job title. MultiIndex makes it easy to work with this type of data by allowing you to index the data using both levels.

A MultiIndex can be created with the from_tuples() or from_product() method, among others. With from_tuples, each tuple holds the labels for one row, one label per level, so the method is a good fit when the label combinations already exist, for example after zipping several columns of data together. With from_product, you pass one list of labels per level, and the index contains every combination of those labels.

[Figure: sample spreadsheet of MultiIndex data; the red border marks the MultiIndex data.]
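To make the department and job title example concrete, here is a minimal sketch. The employee data, the "Headcount" column, and all variable names are hypothetical and invented only for illustration.

```python
import pandas as pd

# Hypothetical employee data, invented purely for illustration.
index = pd.MultiIndex.from_tuples(
    [
        ("Engineering", "Developer"),
        ("Engineering", "Tester"),
        ("Sales", "Manager"),
        ("Sales", "Representative"),
    ],
    names=["Department", "Job title"],
)
headcount = pd.DataFrame({"Headcount": [12, 4, 2, 9]}, index=index)

# Select every row of one department by giving only the outer-level label.
engineering = headcount.loc["Engineering"]

# Select a single value by giving a label for both levels plus a column.
sales_managers = headcount.loc[("Sales", "Manager"), "Headcount"]

print(engineering)
print(sales_managers)
```

Selecting by the outer level alone returns a smaller DataFrame indexed by the remaining level, while a full (department, job title) key addresses exactly one row.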

Create a MultiIndex DataFrame using the from_tuples method

We'll start by creating the label lists, and then build a list of tuples from them using list and zip. Each tuple is the index label for one row of our DataFrame. Next, we'll pass the tuples to the from_tuples method, which returns a MultiIndex instance. Finally, the index object, the data, and the column names are passed to pd.DataFrame to create the DataFrame.

import pandas as pd
#
# Create iterables
#
labels = [
    ["Aiyana", "Aiyana", "Anisha", "Anisha"],
    ["Mathematics", "Science", "Mathematics", "Science"]
]
#
# Create index using MultiIndex from_tuples method
#
tuples = list(zip(*labels))
index = pd.MultiIndex.from_tuples(tuples, names=["Students", "Subjects"])
#
# Create dataframe
#
df = pd.DataFrame([
    [98, 95, 99],
    [95, 93, 96], 
    [92, 99, 95],
    [99, 95, 97]
], index=index, columns=["1st term", "2nd term", "Final"])
#
# Print dataframe
#
df.head()
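As a rough follow-up sketch, the snippet below rebuilds the same DataFrame, shows what zip(*labels) produces, and then selects data using one or both index levels. The variable names aiyana and anisha_science_final are introduced here only for illustration.

```python
import pandas as pd

labels = [
    ["Aiyana", "Aiyana", "Anisha", "Anisha"],
    ["Mathematics", "Science", "Mathematics", "Science"],
]

# zip(*labels) pairs the first student with the first subject, and so on.
tuples = list(zip(*labels))
print(tuples)

index = pd.MultiIndex.from_tuples(tuples, names=["Students", "Subjects"])
df = pd.DataFrame(
    [[98, 95, 99], [95, 93, 96], [92, 99, 95], [99, 95, 97]],
    index=index,
    columns=["1st term", "2nd term", "Final"],
)

# All rows for one student (outer level only).
aiyana = df.loc["Aiyana"]

# One cell, addressed by a full (student, subject) key and a column name.
anisha_science_final = df.loc[("Anisha", "Science"), "Final"]

print(aiyana)
print(anisha_science_final)
```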

Create a MultiIndex DataFrame using the from_product method

We can also create MultiIndex DataFrames with the help of the from_product method in Pandas. This method takes an iterable of iterables and builds a MultiIndex from their Cartesian product, that is, from every combination of one label from each iterable. The resulting MultiIndex has one level for each iterable in the input. For example, if we have a list of products and a list of countries, we can use from_product to create a MultiIndex with products and countries as the levels, with one entry for every product and country pair. from_product is just one of several ways to create a MultiIndex; it is often the most convenient one when the index should contain every combination of the level values. The code below creates a MultiIndex with two levels, where the first level is ['A', 'B'] and the second level is ['D', 'E']. You can then use this MultiIndex as the index of your DataFrame. Here, "pd" is the conventional alias for the pandas module (import pandas as pd).

pd.MultiIndex.from_product([('A', 'B'), ('D', 'E')])
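To see which labels that one-liner generates, the short sketch below lists the entries of the resulting index. The variable name pairs is only illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_product([("A", "B"), ("D", "E")])

# Iterating over a MultiIndex yields one tuple of labels per entry.
pairs = list(index)
print(pairs)
# [('A', 'D'), ('A', 'E'), ('B', 'D'), ('B', 'E')]
```

The first iterable varies slowest, so all entries for 'A' come before the entries for 'B'.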
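The code given below is a fuller example of how to create a MultiIndex DataFrame using the from_product method. Note that the data must contain one row for every student and subject combination (4 × 3 = 12 rows here).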
import pandas as pd
#
# Create iterables
#
iterables = [["Aiyana", "Ajitesh", "Sumit", "Saanvi"], ["Mathematics", "Science", "English"]]
#
# Create index using MultiIndex from_product method
#
index = pd.MultiIndex.from_product(iterables, names=["Students", "Subjects"])
#
# Create dataframe
#
df = pd.DataFrame([
    [98, 96], [96, 97], [85, 89], 
    [92, 95], [99, 94], [87, 93], 
    [91, 89], [90, 93], [82, 87],
    [97, 90], [95, 91], [84, 91] 
], index=index, columns=["1st Term", "2nd Term"])
#
# Print dataframe
#
df
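The two methods are closely related. As a sketch of that relationship, the snippet below builds the same index twice: once with from_product, and once by generating the combinations with itertools.product from the standard library and passing them to from_tuples. The variable names are illustrative only.

```python
import itertools

import pandas as pd

students = ["Aiyana", "Ajitesh", "Sumit", "Saanvi"]
subjects = ["Mathematics", "Science", "English"]
names = ["Students", "Subjects"]

# Let Pandas build every (student, subject) combination.
from_product = pd.MultiIndex.from_product([students, subjects], names=names)

# Build the same combinations by hand and pass them as tuples.
from_tuples = pd.MultiIndex.from_tuples(
    list(itertools.product(students, subjects)), names=names
)

# Both approaches should yield the same index.
same_index = from_product.equals(from_tuples)

# The DataFrame data needs exactly this many rows.
expected_rows = len(students) * len(subjects)

print(same_index, expected_rows)
```

In practice, from_product saves you from writing out the combinations yourself, while from_tuples remains the better fit when only some combinations exist in the data.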

Conclusion

Creating a MultiIndex using the from_tuples or from_product method in Pandas is a convenient way to label each row of a DataFrame with more than one key. Use from_tuples when you already have the exact label combinations, and from_product when the index should contain every combination of the level values. Thanks for reading!

Ajitesh Kumar

