Have you ever told a story to someone, but they just didn’t seem to understand it? They might have been confused about the plot or why the characters acted in certain ways. If this has happened to you before, then you are not alone. Many people struggle with data storytelling because they do not know how to communicate their data effectively.
In this blog post, you will learn some of the key concepts of data storytelling and why data scientists and data analysts should acquire this skill. Data storytelling is one of the key skills data scientists need in order to do a great job of presenting data as a story. Data scientists often present multiple plots with the sole aim of showing their logic and reasoning. However, it is equally important to present the data as a story, because a story creates an emotional connection with stakeholders and helps them make decisions. Thus, data scientists must acquire data storytelling skills to do a great job.
What’s Data Storytelling?
Data storytelling is a way to convey insights from data as stories, which enhances engagement and stimulates curiosity among viewers. Various data visualization tools allow data storytellers to turn seemingly static data into eye-catching infographics, which can improve understanding and spark more discussion across social media channels. This is particularly helpful for organizations with complex datasets that require careful exploration and more time than is traditionally allocated in a presentation or meeting.
Data storytelling refers to the methods of extracting useful information, knowledge, and insights from data and presenting them as a compelling story to a specific audience. From a business standpoint, the primary goal of data storytelling is to extract actionable insights from the data in order to identify hidden business opportunities. The following are the key aspects of data storytelling:
- Data preparation
- Data visualization
- Storytelling (communicating the story)
Let’s understand each of these aspects with an example.
Data Preparation – A Great Story Requires the Right Dataset!
First and foremost, it is important to gather the right kind of data from various sources and prepare it appropriately for further analysis. Here is a quote I came across that captures the link between stories and data:
“Maybe stories are just data with a soul.” – Brené Brown
It is very important to understand what kind of data can create actionable insights. After that, it is equally important to identify reliable data sources. Once the data is gathered, cleaned, and prepared, the next step is to understand its different aspects in relation to the business domain, that is, to perform data analysis. A diligent data analysis is an important step before moving on to the next one: visualization.
In this post, the example is an informed decision on whether to continue investing money in MS Dhoni for the upcoming 2020 IPL season. What is needed is an actionable insight, supported by a story, that can help with this decision. Thus, as a first step, it is important to identify what kind of data will help in making this decision and where to source it. In this post, I used MS Dhoni’s IPL batting averages for the last 10 seasons (2010-2019), taken from the IPL website.
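The preparation step can be sketched in a few lines. This is a minimal sketch, assuming the ten season averages have been transcribed from the IPL website as text; the `raw_averages` dictionary and the `prepare_scores` helper are hypothetical names, not part of the original post. It converts the values to numbers, checks that no season between 2010 and 2019 is missing, and prints a few summary statistics as a first pass at data analysis.

```python
import numpy as np

# Hypothetical raw input: season -> batting average as text, as it might look
# after being copied from a web table. The numbers are the ones used in this post.
raw_averages = {
    "2010": "31.88", "2011": "43.55", "2012": "29.83", "2013": "41.90",
    "2014": "74.20", "2015": "31.00", "2016": "40.57", "2017": "26.36",
    "2018": "75.83", "2019": "83.20",
}


def prepare_scores(raw, first_season=2010, last_season=2019):
    """Hypothetical helper: convert text values to numeric arrays and
    fail loudly if any season in the expected range is missing."""
    seasons = np.arange(first_season, last_season + 1)
    missing = [int(s) for s in seasons if str(s) not in raw]
    if missing:
        raise ValueError(f"Missing seasons: {missing}")
    averages = np.array([float(raw[str(s)]) for s in seasons])
    return seasons, averages


X, ms_dhoni = prepare_scores(raw_averages)

# A quick first look at the data before choosing a visualization.
print(f"Seasons: {X[0]}-{X[-1]} ({len(X)} seasons)")
print(f"Mean average: {ms_dhoni.mean():.2f}")
print(f"Best season: {X[ms_dhoni.argmax()]} ({ms_dhoni.max():.2f})")
print(f"Worst season: {X[ms_dhoni.argmin()]} ({ms_dhoni.min():.2f})")
```

With the figures used in this post, it reports a mean of 47.83 across the ten seasons, with the best season in 2019 (83.20) and the worst in 2017 (26.36).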
Data Visualization – An Important Part of Data Storytelling
One of the most important aspects of data storytelling is choosing the right kind of visualization. The primary goal is to come up with actionable insights, supported by a story that decision-makers can connect with. If the right plots are not used, it is difficult to extract information, knowledge, or actionable insights from the data. Let’s understand this with the example of MS Dhoni’s IPL batting averages for the last 10 seasons (2010-2019).
Here are the IPL batting averages for the last 10 seasons (2010-2019) of Mahendra Singh Dhoni, one of the greatest captains of the Indian cricket team of all time.
```python
import numpy as np

# MS Dhoni IPL batting average scores across seasons (2010-2019)
X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])
```
Could you make anything out of this data? In other words, were you able to extract any information out of the above data?
Alright! Let’s try a little harder: draw scatter and line plots and see whether we can extract information from the data that can be used for storytelling. Here is the Python code for drawing scatter and line plots of the above data.
```python
import matplotlib.pyplot as plt
import numpy as np

# MS Dhoni IPL batting average scores across seasons (2010-2019)
X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])

# One row, two panels. With a 1x2 grid, plt.subplots returns an array of
# two Axes objects, so each panel is addressed by index.
fig, ax = plt.subplots(1, 2, figsize=(13, 6))
ax[0].scatter(X, ms_dhoni)  # left panel: scatter plot
ax[1].plot(X, ms_dhoni)     # right panel: line plot

# Axis labels shared by both panels
fig.text(0.5, 0.04, 'Years', ha='center', fontsize=18)
fig.text(0.04, 0.5, 'Average Scores in IPL Seasons', va='center', rotation='vertical', fontsize=18)
plt.show()
```
The above Python code produces a scatter plot (left) and a line plot (right). Can you extract a story from these plots? Can you extract any information from them?
I don’t think we can extract any useful information from these plots: the averages rise and fall from one season to the next without an obvious pattern. Now, let’s add a trend line to the line chart. Here is the Python code to draw a line chart with a trend line.
```python
import matplotlib.pyplot as plt
import numpy as np

# MS Dhoni IPL batting average scores across seasons (2010-2019)
X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])

fig, ax = plt.subplots(1, 1, figsize=(10, 8))

# Fit a straight line (degree-1 polynomial) by least squares and wrap
# the coefficients in a callable polynomial p(x)
z = np.polyfit(X, ms_dhoni, 1)
p = np.poly1d(z)

plt.plot(X, p(X), "r--")  # trend line (red, dashed)
plt.plot(X, ms_dhoni)     # actual season averages
plt.title('MS Dhoni IPL Batting Average Scores', fontsize=16)
plt.xlabel('Years', fontsize=16)
plt.ylabel('Average Scores in IPL Seasons', fontsize=16)
plt.show()
```
Executing the above code produces a plot showing the season-by-season averages as a solid line and the fitted trend as a dashed red line. Can we extract a story or information from this plot?
Yes, we can extract information from this plot. The trend line slopes upward, which suggests that Dhoni has been playing well.
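The visual impression of an upward trend can also be checked numerically. The sketch below reuses the `np.polyfit` call behind the red trend line, confirms its slope with the closed-form least-squares formula, and computes R² as a rough measure of how closely the seasons follow the line. Names other than `X` and `ms_dhoni` are illustrative, not from the original post.

```python
import numpy as np

X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])

# Degree-1 least-squares fit: the same call used to draw the trend line.
slope, intercept = np.polyfit(X, ms_dhoni, 1)

# The same slope from the closed-form formula:
# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
dx = X - X.mean()
slope_manual = np.sum(dx * (ms_dhoni - ms_dhoni.mean())) / np.sum(dx ** 2)

# R^2: share of the season-to-season variation explained by the straight line.
fitted = np.polyval([slope, intercept], X)
ss_res = np.sum((ms_dhoni - fitted) ** 2)
ss_tot = np.sum((ms_dhoni - ms_dhoni.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"Trend: {slope:+.2f} points of batting average per season")
print(f"Closed-form slope: {slope_manual:.2f}")
print(f"R^2 of the linear fit: {r_squared:.2f}")
```

For this data, the slope is about +3.78 per season and R² is roughly 0.28. The positive slope supports the upward trend, while the modest R² is a reminder that individual seasons scatter widely around the line (26.36 in 2017, 75.83 in 2018), so the line summarizes a tendency across the decade rather than a season-by-season pattern.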
Thus, it is important to choose the right kind of visualization to represent the story behind the data. In other words, choose the plot that conveys the most information about the data.
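One further view, not used in the original post, is a rolling average, which smooths out single-season spikes so the longer-run movement is easier to see. The sketch below assumes a 3-season window, an arbitrary choice for illustration, and computes the rolling mean with `np.convolve`.

```python
import numpy as np

X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])

window = 3  # assumed window length, chosen only for illustration
kernel = np.ones(window) / window

# mode="valid" keeps only positions where the whole window fits,
# so the result has len(ms_dhoni) - window + 1 values.
rolling_avg = np.convolve(ms_dhoni, kernel, mode="valid")

# Label each value with the last season in its window (2012 covers 2010-2012).
rolling_seasons = X[window - 1:]

for season, value in zip(rolling_seasons, rolling_avg):
    print(f"{season - window + 1}-{season}: {value:.2f}")
```

The resulting values can be drawn next to the raw series, for example with `ax.plot(rolling_seasons, rolling_avg)`, so the reader sees both the season-level detail and the smoother pattern.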
Storytelling – Communicating the Story
Now that we have the appropriate visualization ready, the next important part is to communicate the story in a way that lets actionable insights be derived. The ultimate goal is to help decision-makers make the decision.
So, what story can be communicated using the visualization plot shown in the previous section?
The story is this: Dhoni looks to be playing well! His batting average shows an upward trend, which suggests he can be trusted to play well in the upcoming season. Thus, an informed decision can be made to invest in him.
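A chart can carry this story more directly if the message is written onto it. The sketch below is one way to do that with matplotlib, not the author’s original figure: it states the takeaway in the title, keeps the dashed trend line, and annotates the 2019 season. The output filename `dhoni_story.png` and the `Agg` backend (used so the script runs without a display) are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])

# Linear trend, as in the trend-line example above
slope, intercept = np.polyfit(X, ms_dhoni, 1)
trend = np.polyval([slope, intercept], X)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(X, ms_dhoni, marker="o", label="Batting average")
ax.plot(X, trend, "r--", label="Linear trend")

# Put the takeaway in the title so the chart states the story up front.
ax.set_title("MS Dhoni's IPL batting average has trended upward since 2010", fontsize=14)
ax.set_xlabel("Season")
ax.set_ylabel("Batting average")

# Call out the most recent season, the one closest to the 2020 decision.
ax.annotate(
    f"2019: {ms_dhoni[-1]:.2f}",
    xy=(X[-1], ms_dhoni[-1]),
    xytext=(X[-1] - 3, ms_dhoni[-1] - 5),
    arrowprops={"arrowstyle": "->"},
)
ax.legend(loc="upper left")
fig.savefig("dhoni_story.png", dpi=100, bbox_inches="tight")  # hypothetical filename
```

Compared with the earlier plots, the reader no longer has to infer the message: the title states it, and the annotation points to the evidence behind it.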
Here is a summary of what you learned in this post about data storytelling:
- Data storytelling is one of the most important skills data scientists must acquire to do a great job when building machine learning models.
- The key aspects of data storytelling are data preparation, data visualization, and storytelling with the help of data visualization.
- One of the primary goals of data storytelling is to extract useful information and actionable insights from the data and present them as a compelling story.