Pandas: How to Create a Dataframe – Examples

One of the most popular libraries for working with data in Python is Pandas. Pandas provides data structures and operations for working with structured data. A key concept in Pandas is the dataframe. Learning how to create and use dataframes is an important skill for anyone working with data in Python, including data analysts and data scientists. In this post, you will learn how to create a Pandas dataframe with some sample data.

What is a Pandas Dataframe?

A Pandas dataframe is a two-dimensional data structure, like a table in a spreadsheet, with rows and columns of data. It is analogous to a table in SQL and, loosely, to a matrix in MATLAB, although unlike a matrix each column of a dataframe can hold a different data type. It stores tabular data in labeled rows and columns. A dataframe is created by calling the pandas.DataFrame() constructor and can be filled with data either by entering it manually or by importing it from a file, for example with pandas.read_csv() for text or csv files. Columns can hold different kinds of data, including text (strings), numbers, or categorical values. The data in a dataframe can be manipulated and analyzed using the functions and methods provided by the Pandas library.
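
As a rough illustration of filling a dataframe from csv data, the sketch below reads a small csv string with pandas.read_csv(). The csv text and the variable names (csv_text, df_csv) are made up for this example; with a real file, one would pass the file path instead of the io.StringIO object.

```python
import io

import pandas as pd

# Hypothetical csv content standing in for a file on disk
csv_text = """name,weight,height
Ajitesh,84,183
Shailesh,79,186
"""

# read_csv accepts a file path or a file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)    # (number of rows, number of columns)
print(df_csv.dtypes)   # one data type per column
```

The header line of the csv supplies the column labels, and each column gets its own data type.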

Some of the common operations on a Pandas dataframe df are:

  • print(df) to print the dataframe to the console
  • df.head(n) and df.tail(n) to view the first or last n rows of the dataframe, respectively (n defaults to 5)
  • df.iloc[] to select rows and columns of a dataframe by integer position
  • df.loc[] to select rows and columns of a dataframe by label
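
The operations listed above can be sketched as follows. Note that iloc and loc are indexers used with square brackets rather than called like functions. The small dataframe and the variable names here are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ajitesh", "Shailesh", "Seema", "Nidhi"],
     "weight": [84, 79, 67, 52]}
)

first_two = df.head(2)         # first 2 rows
last_one = df.tail(1)          # last row only

# iloc selects by integer position: row 0, column 1 ("weight")
by_position = df.iloc[0, 1]

# loc selects by label: row label 2, column label "name"
by_label = df.loc[2, "name"]

print(first_two)
print(last_one)
print(by_position, by_label)
```

Because this dataframe uses the default row labels 0, 1, 2, 3, row label 2 in loc happens to coincide with position 2; with custom row labels the two would differ.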

Create Dataframe using list of lists

Here is the Python code for creating a Pandas dataframe using a list of lists. As a first step, one creates a list of lists, in which each inner list becomes one row of the dataframe. One can then assign column names. Both steps are shown in the code below.

import pandas as pd
#
# Create dataframe using a list of lists
#
df = pd.DataFrame([['Ajitesh', 84, 183, 'no'], 
                   ['Shailesh', 79, 186, 'yes'], 
                   ['Seema', 67, 158, 'yes'], 
                   ['Nidhi', 52, 155, 'no']])
#
# Assign column names
#
df.columns = ['name', 'weight', 'height', 'smoke_or_not']
#
# Print dataframe
#
df
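
As an alternative to assigning df.columns afterwards, the column names can be passed to the constructor through its columns parameter. The sketch below uses two of the rows from above; the variable names rows, df_default and df_named are just for this example.

```python
import pandas as pd

rows = [['Ajitesh', 84, 183, 'no'],
        ['Shailesh', 79, 186, 'yes']]

# Without column names, pandas labels the columns 0, 1, 2, ...
df_default = pd.DataFrame(rows)
print(list(df_default.columns))

# Column names supplied at construction time
df_named = pd.DataFrame(rows, columns=['name', 'weight', 'height', 'smoke_or_not'])
print(df_named)
```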

Create Dataframe using Dictionary of Lists

Here is the code for creating a dataframe using a dictionary of lists. Each key becomes a column name, and each list supplies the values of that column.

import pandas as pd
#
# Creating dictionary
#
dict_lists = {"name":["Ajitesh", "Shailesh", "Seema", "Nidhi"], 
               "weight":[84, 79, 67, 52], 
               "height":[183, 186, 158, 155], 
               "smoke_or_not":["no", "yes", "no", "no"]}
#
# Create dataframe
#
df = pd.DataFrame(dict_lists)
#
# Print dataframe
#
df

The following will be printed:

Fig 1. Pandas Dataframe using Array
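
One thing to keep in mind with a dictionary of lists is that all lists need to have the same length. The sketch below, with made-up variable names, shows a valid dictionary and one with a short list, for which pandas raises a ValueError.

```python
import pandas as pd

# Lists of equal length: keys become the column names
data = {"name": ["Ajitesh", "Shailesh"], "weight": [84, 79]}
df_ok = pd.DataFrame(data)
print(list(df_ok.columns))

# Lists of unequal length cannot be combined into columns
bad = {"name": ["Ajitesh", "Shailesh"], "weight": [84]}
try:
    pd.DataFrame(bad)
    raised = False
except ValueError as err:
    raised = True
    print("Could not build dataframe:", err)
```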

Create Dataframe using Dictionary of Series

One can create a dataframe using a dictionary of Series objects as well. Each key becomes a column name, and the index of the Series supplies the row labels. Here is the code:

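import pandas as pd
# Row labels shared by all Series
index = ["Mathematics", "Science"]
# Dictionary of Series: each key becomes a column
dict_series = {"Aiyana": pd.Series([95, 99], index=index),
               "Saanvi": pd.Series([96, 94], index=index),
               "Snehal": pd.Series([99, 92], index=index),
               "Anisha": pd.Series([98, 93], index=index)}
# Create and print dataframe
df = pd.DataFrame(dict_series)
df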
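
When the Series do not all share the same index, pandas aligns them on their labels and fills the gaps with NaN. The sketch below illustrates this with a hypothetical variation of the data above, in which the Science score for Saanvi is left out on purpose.

```python
import pandas as pd

scores = {
    "Aiyana": pd.Series([95, 99], index=["Mathematics", "Science"]),
    # Hypothetical: only a Mathematics score is available here
    "Saanvi": pd.Series([96], index=["Mathematics"]),
}
df_scores = pd.DataFrame(scores)
print(df_scores)

# Row labels come from the union of the Series indexes
math_aiyana = df_scores.loc["Mathematics", "Aiyana"]

# Count of missing values in the "Saanvi" column
missing = df_scores["Saanvi"].isna().sum()
print(math_aiyana, missing)
```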

Author: Ajitesh Kumar