MultiIndex lets us work with higher-dimensional data in Pandas, but building a MultiIndex DataFrame with the from_tuples and from_product methods can be tricky. In this blog post, we will look at how to create a MultiIndex DataFrame using each of these two methods.
What is a MultiIndex?
MultiIndex is a Pandas index type that supports multiple levels of indexing on a DataFrame. It is useful when your data can be naturally grouped by more than one category. For example, you might have data on individual employees that can be grouped by both their department and their job title. A MultiIndex lets you index the data using both levels. Two common ways to create one are the class methods MultiIndex.from_tuples() and MultiIndex.from_product(). from_tuples builds the index from a list of tuples, where each tuple holds the labels for one row, one label per level. from_product builds the index from the Cartesian product of several iterables. The following is a sample of MultiIndex data in a sample spreadsheet. The red border represents MultiIndex data.
Create MultiIndex DataFrame using from_tuples
We’ll start by creating the labels, then build a list of tuples from them using list and zip. Each tuple represents the index entry for one row of our DataFrame. Then, we’ll use the from_tuples method to create our MultiIndex. The from_tuples method returns a MultiIndex object. That index object, the data and the column names are then used to create the DataFrame.
import pandas as pd
#
# Create iterables
#
labels = [
    ["Aiyana", "Aiyana", "Anisha", "Anisha"],
    ["Mathematics", "Science", "Mathematics", "Science"]
]
#
# Create index using MultiIndex from_tuples method
#
tuples = list(zip(*labels))
index = pd.MultiIndex.from_tuples(tuples, names=["Students", "Subjects"])
#
# Create dataframe
#
df = pd.DataFrame([
    [98, 95, 99],
    [95, 93, 96],
    [92, 99, 95],
    [99, 95, 97]
], index=index, columns=["1st term", "2nd term", "Final"])
#
# Print dataframe
#
df.head()
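To make the zip step and the resulting index more concrete, here is a minimal sketch that reuses the same labels and scores as above and then looks up rows by index level. The variable names aiyana_rows and anisha_science_final are made up for illustration and do not come from the original example.

```python
import pandas as pd

labels = [
    ["Aiyana", "Aiyana", "Anisha", "Anisha"],
    ["Mathematics", "Science", "Mathematics", "Science"],
]

# zip(*labels) pairs the i-th student with the i-th subject
tuples = list(zip(*labels))
print(tuples)

index = pd.MultiIndex.from_tuples(tuples, names=["Students", "Subjects"])
# from_tuples returns a MultiIndex with one level per tuple position
print(type(index).__name__, index.nlevels, list(index.names))

df = pd.DataFrame(
    [[98, 95, 99], [95, 93, 96], [92, 99, 95], [99, 95, 97]],
    index=index,
    columns=["1st term", "2nd term", "Final"],
)

# Selecting on the first level returns all rows for that student
aiyana_rows = df.loc["Aiyana"]
print(aiyana_rows)

# A full tuple key plus a column name selects a single value
anisha_science_final = df.loc[("Anisha", "Science"), "Final"]
print(anisha_science_final)
```

Selecting with a partial key (just the student) drops that level from the result, while a full tuple key identifies exactly one row.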
Create MultiIndex DataFrame using from_product
We can create MultiIndex DataFrames using the from_product method in Pandas. This method takes an iterable of iterables and returns a MultiIndex containing every combination of their elements (the Cartesian product). The resulting index has one level for each iterable in the input. For example, if we have a list of products and a list of countries, we can use from_product to create a MultiIndex with products and countries as the levels, and then pass it to a DataFrame. This can be very useful when working with data that contains multiple levels of information. from_product is just one of several ways to create a MultiIndex; it is often the most convenient when every combination of labels should appear. The code below creates a MultiIndex with two levels, where the first level holds the four student names and the second level holds the three subjects, giving 12 rows in total. You can then use this MultiIndex as the index of your DataFrame. Here, “pd” is the conventional alias for the imported pandas module.
import pandas as pd
#
# Create iterables
#
iterables = [["Aiyana", "Ajitesh", "Sumit", "Saanvi"],
             ["Mathematics", "Science", "English"]]
#
# Create index using MultiIndex from_product method
#
index = pd.MultiIndex.from_product(iterables, names=["Students", "Subjects"])
#
# Create dataframe
#
df = pd.DataFrame([
    [98, 96], [96, 97], [85, 89],
    [92, 95], [99, 94], [87, 93],
    [91, 89], [90, 93], [82, 87],
    [97, 90], [95, 91], [84, 91]
], index=index, columns=["1st Term", "2nd Term"])
#
# Print dataframe
#
df
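One way to check what from_product is doing is to compare it with from_tuples fed by itertools.product. The sketch below assumes the same iterables as above; manual_index is an illustrative name, not part of the original example.

```python
import itertools

import pandas as pd

iterables = [
    ["Aiyana", "Ajitesh", "Sumit", "Saanvi"],
    ["Mathematics", "Science", "English"],
]
names = ["Students", "Subjects"]

index = pd.MultiIndex.from_product(iterables, names=names)

# Build the same pairs by hand: every student combined with every subject
manual_index = pd.MultiIndex.from_tuples(
    list(itertools.product(*iterables)), names=names
)

print(len(index))                  # 4 students x 3 subjects
print(index.equals(manual_index))  # whether both indexes hold the same entries
print(index[:3].tolist())          # first student paired with each subject
```

The first iterable varies slowest, so all subjects for one student appear before the next student starts. This is why the data rows passed to the DataFrame need to follow that same order.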
Creating a MultiIndex with the from_tuples or from_product method is a convenient way to organize data that is naturally grouped by more than one key. from_tuples suits cases where you list the labels for each row explicitly, while from_product suits cases where every combination of labels is present. Thanks for reading!