Pandas: Creating Multiindex Dataframe from Product or Tuples

A MultiIndex lets us work with higher-dimensional data in a two-dimensional DataFrame, but building one with the from_tuples and from_product methods in Pandas can be tricky at first. In this blog post, we will discuss how to create a MultiIndex dataframe using the MultiIndex.from_tuples and MultiIndex.from_product methods.

What is a MultiIndex?

MultiIndex is a Pandas index class that supports multiple levels of indexing (a hierarchical index) on a DataFrame or Series. It is useful when your data can be naturally grouped by more than one category. For example, you might have data on individual employees that can be grouped by both their department and their job title. A MultiIndex lets you label and select the data using both levels.

Two common ways to create a MultiIndex are the from_tuples() and from_product() methods. from_tuples() builds the index from a list of tuples, where each tuple holds the labels of one row, one label per level. from_product() builds the index from the Cartesian product of several iterables, that is, every combination of their values. The following is a sample of MultiIndex data in a sample spreadsheet. The red border represents the MultiIndex data.

Sample Multiindex Data
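
To make the employee example above concrete, here is a minimal sketch of selecting rows by one level or by both levels. The department names, job titles and headcount values are made up for illustration and are not taken from the spreadsheet.

```python
import pandas as pd

# Hypothetical employee data, invented for illustration only.
index = pd.MultiIndex.from_tuples(
    [
        ("Engineering", "Developer"),
        ("Engineering", "Tester"),
        ("Sales", "Manager"),
    ],
    names=["Department", "Job title"],
)
employees = pd.DataFrame({"Headcount": [12, 4, 2]}, index=index)

# Select every row in one department (outer level only).
engineering = employees.loc["Engineering"]

# Select a single value by supplying both levels as a tuple.
developers = employees.loc[("Engineering", "Developer"), "Headcount"]

print(employees.index.nlevels)
print(engineering)
print(developers)
```

Selecting on the outer level alone returns a smaller dataframe indexed by the remaining level, while a full tuple pins down a single row.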

Create MultiIndex Dataframe using from_tuples method

We’ll start by creating the labels, then build a list of tuples from them using the built-in zip and list functions. Each tuple represents the index labels of one row in our Dataframe. Then, we’ll use the from_tuples method to create our MultiIndex. The from_tuples method returns a MultiIndex instance. The index object, the data and the column names are then used to create the dataframe.

import pandas as pd
#
# Create iterables
#
labels = [
    ["Aiyana", "Aiyana", "Anisha", "Anisha"],
    ["Mathematics", "Science", "Mathematics", "Science"]
]
#
# Create index using MultiIndex from_tuples method
#
tuples = list(zip(*labels))
index = pd.MultiIndex.from_tuples(tuples, names=["Students", "Subjects"])
#
# Create dataframe
#
df = pd.DataFrame([
    [98, 95, 99],
    [95, 93, 96], 
    [92, 99, 95],
    [99, 95, 97]
], index=index, columns=["1st term", "2nd term", "Final"])
#
# Print the first rows of the dataframe
#
print(df.head())
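
The following sketch rebuilds the same data to show what the zip step produces and how the resulting index can be used for selection. The variable names scores, aiyana and science are chosen here for illustration only.

```python
import pandas as pd

labels = [
    ["Aiyana", "Aiyana", "Anisha", "Anisha"],
    ["Mathematics", "Science", "Mathematics", "Science"],
]

# zip(*labels) pairs the i-th student with the i-th subject.
tuples = list(zip(*labels))
print(tuples)

index = pd.MultiIndex.from_tuples(tuples, names=["Students", "Subjects"])
scores = pd.DataFrame(
    [[98, 95, 99], [95, 93, 96], [92, 99, 95], [99, 95, 97]],
    index=index,
    columns=["1st term", "2nd term", "Final"],
)

# All subjects for one student (selects on the outer level).
aiyana = scores.loc["Aiyana"]

# One subject across all students (selects on the inner level).
science = scores.xs("Science", level="Subjects")

print(aiyana)
print(science)
```

Using .loc with a single label selects on the outer level, while xs with the level argument is one way to select on an inner level without writing out every tuple.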

Create MultiIndex Dataframe using from_product method

We can also create MultiIndex Dataframes using the from_product method in Pandas. This method takes an iterable of iterables and returns a MultiIndex built from their Cartesian product, i.e., every combination of their values. The resulting MultiIndex has one level for each iterable in the input. For example, if we have a list of products and a list of countries, we can use from_product to create a MultiIndex with products and countries as the levels, with one entry for each product-country pair. This is useful when every combination of the level values should appear in the data. from_product is just one of several ways to create a MultiIndex; when the index covers all combinations, it is often the simplest and most convenient. The code below creates a MultiIndex with two levels, where the first level is ['A', 'B'] and the second level is ['D', 'E']. You can then use this MultiIndex as the index of your dataframe. Here, “pd” is the conventional alias for the pandas module (import pandas as pd).

pd.MultiIndex.from_product([('A', 'B'), ('D', 'E')])
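
As a quick check of what this call produces, the sketch below lists the entries of the resulting index. It only inspects the index and does not build a dataframe.

```python
import pandas as pd

index = pd.MultiIndex.from_product([("A", "B"), ("D", "E")])

# Every first-level value is paired with every second-level value.
print(list(index))

# Two levels, and 2 x 2 = 4 entries in total.
print(index.nlevels, len(index))
```

The entries come out with the first iterable varying slowest, so all pairs for 'A' are listed before the pairs for 'B'.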
 
The code below is a complete example of creating a MultiIndex Dataframe using the from_product method. Because there are four students and three subjects, the index has 12 entries, so the data must contain 12 rows.

import pandas as pd
#
# Create iterables
#
iterables = [["Aiyana", "Ajitesh", "Sumit", "Saanvi"], ["Mathematics", "Science", "English"]]
#
# Create index using MultiIndex from_product method
#
index = pd.MultiIndex.from_product(iterables, names=["Students", "Subjects"])
#
# Create dataframe
#
df = pd.DataFrame([
    [98, 96], [96, 97], [85, 89], 
    [92, 95], [99, 94], [87, 93], 
    [91, 89], [90, 93], [82, 87],
    [97, 90], [95, 91], [84, 91] 
], index=index, columns=["1st Term", "2nd Term"])
#
#
# Print dataframe
#
print(df)
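
To relate the two approaches, here is a small sketch suggesting that from_product gives the same index as calling from_tuples on the output of itertools.product. Using itertools here is an illustrative choice, not something the earlier examples rely on.

```python
import itertools

import pandas as pd

iterables = [
    ["Aiyana", "Ajitesh", "Sumit", "Saanvi"],
    ["Mathematics", "Science", "English"],
]
names = ["Students", "Subjects"]

# Index built directly from the Cartesian product of the iterables.
from_product = pd.MultiIndex.from_product(iterables, names=names)

# The same combinations, spelled out as tuples first.
all_pairs = list(itertools.product(*iterables))
from_tuples = pd.MultiIndex.from_tuples(all_pairs, names=names)

print(from_product.equals(from_tuples))
print(len(from_product))
```

In practice, from_product saves you from writing out the combinations yourself, while from_tuples remains the better fit when only some combinations occur in the data.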

Conclusion

Creating a MultiIndex with the from_tuples or from_product method in Pandas is a convenient way to organize data that is grouped by more than one category. Use from_tuples when you already have the label combination for each row, and from_product when the index should cover every combination of the level values. Thanks for reading!

Ajitesh Kumar