In this post, you will learn some of the key concepts of data storytelling and why data scientists and data analysts should acquire this skill. Data storytelling is the skill of presenting data as a story rather than as a set of disconnected charts. Too often, data scientists present multiple plots with the sole aim of showing their logic and reasoning. However, it is equally important to present the data as a story, because a story creates an emotional connection with stakeholders and helps them make decisions.
What’s Data Storytelling?
Data storytelling is the practice of extracting useful information, knowledge, and insights from data and presenting them as a compelling story to a specific audience. From a business standpoint, the primary goal is to extract actionable insights from the data in order to identify hidden business opportunities. The following are the key aspects of data storytelling:
- Data preparation
- Data visualization
- Storytelling (communicating the story)
Let’s understand these aspects with an example.
Data Preparation – A Great Story Requires the Right Data Set!
First and foremost, you need to gather the right kind of data from the relevant sources and prepare it appropriately for further analysis. Here is a fitting quote about stories and data.
“Maybe stories are just data with a soul!” – Brené Brown
It is very important to understand which kind of data can produce actionable insights. After that, it is equally important to identify a reliable data source. Once the data is gathered, cleaned, and prepared, the next step is to analyze it in the context of the related business domain. A diligent data analysis is what makes the transition to the next step, visualization, possible.
The example in this post is the need to make an informed decision on whether to continue investing money in MS Dhoni for the upcoming IPL 2020 season. What is needed is an actionable insight, supported by a story, that can help in making that decision. Thus, as a first step, it is important to identify what kind of data will help with this decision and where to get it. In this post, I took MS Dhoni's IPL batting averages for the last 10 seasons (2010-2019), and the data source is the IPL website.
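The preparation and analysis step described above can be sketched in a few lines. This is a minimal sketch, not the method used to build the original data set: it assumes the ten season averages listed in this post have already been collected, and the function name `check_and_summarize` and the particular checks are illustrative choices.

```python
import numpy as np

# Seasons and batting averages as listed in this post (2010-2019).
seasons = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
averages = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])


def check_and_summarize(seasons, averages):
    """Hypothetical helper: run basic sanity checks, then return summary statistics."""
    # One average per season, no missing values, no negative averages.
    if seasons.shape != averages.shape:
        raise ValueError("seasons and averages must have the same length")
    if np.isnan(averages).any():
        raise ValueError("averages contain missing values")
    if (averages < 0).any():
        raise ValueError("a batting average cannot be negative")
    # Seasons should be strictly increasing so that plots read left to right.
    if not np.all(np.diff(seasons) > 0):
        raise ValueError("seasons must be sorted and unique")

    return {
        "n_seasons": int(seasons.size),
        "mean": float(averages.mean()),
        "median": float(np.median(averages)),
        "best_season": int(seasons[averages.argmax()]),
        "worst_season": int(seasons[averages.argmin()]),
    }


summary = check_and_summarize(seasons, averages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Even this small summary gives the analysis a starting point, for example which seasons were the best and the worst, before any plot is drawn.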
Data Visualization – Important Part of Data Storytelling
One of the most important aspects of data storytelling is using the right kind of visualization. The primary goal is to come up with actionable insights, supported by a story that decision makers can connect with. If the right plots are not used, it will be difficult to extract information from the data. Let’s understand this with an example using MS Dhoni's IPL batting averages for the last 10 seasons (2010-2019).
Here are the IPL batting averages for the last 10 seasons (2010-2019) of Mahendra Singh Dhoni, one of the greatest captains of the Indian cricket team of all time.
    #
    # MS Dhoni IPL Batting Average Scores Across Seasons (2010-2019)
    #
    X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
    ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])
Could you make anything out of this data? In other words, were you able to extract any information out of the above data?
Alright! Let’s try a little harder and draw a scatter plot and a line plot to see if we can extract some information that can be told as a story. Here is the Python code for drawing scatter and line plots of the above data.
    import matplotlib.pyplot as plt
    import numpy as np

    # MS Dhoni IPL Batting Average Scores Across Seasons (2010-2019)
    X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
    ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])

    # One row, two columns: plt.subplots returns an array of two axes
    fig, ax = plt.subplots(1, 2, figsize=(13, 6))
    ax[0].scatter(X, ms_dhoni)  # scatter plot on the left
    ax[1].plot(X, ms_dhoni)     # line plot on the right

    # Axis labels shared by both panels
    fig.text(0.5, 0.04, 'Years', ha='center', fontsize=18)
    fig.text(0.04, 0.5, 'Average Scores in IPL Seasons', va='center', rotation='vertical', fontsize=18)
    plt.show()
The above Python code results in the following scatter and line plots. Can you extract a story, or any information at all, out of these plots?
I don’t think we can extract any useful information out of these plots. Now, let’s add a trend line to the line chart. Here is the Python code to draw the line chart with a trend line.
    import matplotlib.pyplot as plt
    import numpy as np

    # MS Dhoni IPL Batting Average Scores Across Seasons (2010-2019)
    X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
    ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])

    fig, ax = plt.subplots(1, 1, figsize=(10, 8))

    # Fit a straight line (degree-1 polynomial) to the averages
    z = np.polyfit(X, ms_dhoni, 1)
    p = np.poly1d(z)

    plt.plot(X, p(X), "r--")   # trend line (red, dashed)
    plt.plot(X, ms_dhoni)      # season-by-season averages
    plt.title('MS Dhoni IPL Batting Average Scores', fontsize=16)
    plt.xlabel('Years', fontsize=16)
    plt.ylabel('Average Scores in IPL Seasons', fontsize=16)
    plt.show()
Executing the above code produces the following plot. Can we extract a story or some information out of this plot?
Yes, we can. The trend line slopes upward, which suggests that Dhoni's batting average has been improving over these seasons.
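To put a number on the trend, the slope of the fitted line can be read off directly. The following is a minimal sketch that goes slightly beyond the plot above: the R² check and the centering of the years are additions for illustration, not part of the original analysis.

```python
import numpy as np

# MS Dhoni IPL batting averages across seasons (2010-2019), as listed in this post.
X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])

# Center the years so the fit is numerically well behaved; the slope is unchanged.
years_centered = X - X.mean()
slope, intercept = np.polyfit(years_centered, ms_dhoni, 1)

# R^2: the share of season-to-season variation that the straight line accounts for.
fitted = slope * years_centered + intercept
ss_res = np.sum((ms_dhoni - fitted) ** 2)
ss_tot = np.sum((ms_dhoni - ms_dhoni.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"Slope: {slope:.2f} points of batting average per season")
print(f"R^2: {r_squared:.2f}")
```

A positive slope backs up the upward trend seen in the plot, while R² indicates how much weight the straight line can bear: with only ten seasons and large swings between them, it is worth checking before relying on the trend alone.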
Thus, it is important to choose the right kind of visualization to tell the story in the data. In other words, choose the plot that conveys the most information about the data.
Storytelling – Communicating Story
Now that we have the appropriate visualization ready, the next step is to communicate the story in such a way that actionable insights can be derived from it. The ultimate goal is to help decision makers take the decision.
So, what story can be communicated using the plot shown in the previous section?
The story is this – Dhoni looks to be playing well! His batting average shows an upward trend, which suggests that he can be trusted to play well in the upcoming season. Thus, an informed decision can be taken to invest in him.
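One way to carry this story into the chart itself is to state the takeaway in the title and point at the season that supports it. The sketch below is only one possible presentation: the function name `plot_story`, the wording of the title, the annotation placement, and the output file name are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# MS Dhoni IPL batting averages across seasons (2010-2019), as listed in this post.
X = np.array([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
ms_dhoni = np.array([31.88, 43.55, 29.83, 41.90, 74.20, 31.00, 40.57, 26.36, 75.83, 83.20])


def plot_story(years, averages):
    """Hypothetical helper: line chart with a trend line, a takeaway title, and one annotation."""
    fig, ax = plt.subplots(figsize=(10, 6))

    # Linear trend, fitted on centered years for numerical stability.
    centered = years - years.mean()
    slope, intercept = np.polyfit(centered, averages, 1)
    ax.plot(years, averages, marker="o", label="Batting average")
    ax.plot(years, slope * centered + intercept, "r--", label="Linear trend")

    # Point at the best season so the reader's eye lands on the evidence.
    best = averages.argmax()
    ax.annotate(
        f"Best season: {averages[best]:.2f}",
        xy=(years[best], averages[best]),
        xytext=(years[best] - 3, averages[best]),
        arrowprops=dict(arrowstyle="->"),
    )

    # The title states the takeaway instead of merely naming the data.
    direction = "up" if slope > 0 else "down"
    ax.set_title(f"Dhoni's IPL batting average is trending {direction} "
                 f"(about {slope:+.1f} per season)", fontsize=14)
    ax.set_xlabel("Year")
    ax.set_ylabel("IPL batting average")
    ax.set_xticks(years)
    ax.legend()
    return fig, ax


fig, ax = plot_story(X, ms_dhoni)
fig.savefig("dhoni_story.png", dpi=150)  # hypothetical output file name
```

In a notebook, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving the figure.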
Here is a summary of what you learned in this post about data storytelling:
- Data storytelling is one of the most important skills data scientists must acquire to do a great job, including when building machine learning models.
- The key aspects of data storytelling are data preparation, data visualization, and communicating the story with the help of that visualization.
- One of the primary goals of data storytelling is to extract useful information and actionable insights from the data and present them as a compelling story.