This article covers the following aspects of being a data scientist:
- Key skills of a data scientist
- Key roles & responsibilities of a data scientist
- What would it take for me to become a data scientist?
- What would I create as a Data Scientist?
Key Skills of a Data Scientist
- Mathematics & Statistics Knowledge: A data scientist needs a strong background in mathematics and statistics. This background is the foundation for performing data analytics.
- Strong Computing Skills: A data scientist needs to be good at data munging, meaning scraping, parsing, and processing data. This is useful in preparing the data for analysis. Data scientists are also good at taking data and producing consumable data-driven apps or data products.
- Data Visualization Skills: Once the data has been prepared and analyzed, the data scientist needs to be good at communicating the results to the business teams. For this purpose, he or she needs to use one or more visualization techniques to explain the results of the data analysis.
- Data Preservation: A data scientist needs to know how to store and manage the data after the analytics and visualization are done.
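As a rough illustration of the parsing and processing part of data munging, here is a minimal Python sketch. The raw text, the column names, and the `munge` function are all hypothetical and invented for this example. Scraping is left out, so the input is assumed to be already downloaded.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, mixed case, one missing value.
RAW = """city, temperature_c
 Boston , 21.5
boston, 19.0
Austin, 
AUSTIN, 30.5
"""

def munge(raw_text):
    """Parse raw CSV text and return the mean temperature per city."""
    reader = csv.DictReader(io.StringIO(raw_text), skipinitialspace=True)
    readings = {}
    for row in reader:
        # Normalize the city name so "boston" and " Boston " are the same key.
        city = row["city"].strip().title()
        value = row["temperature_c"].strip()
        if not value:
            continue  # skip rows with a missing measurement
        readings.setdefault(city, []).append(float(value))
    return {city: sum(vals) / len(vals) for city, vals in readings.items()}

cleaned = munge(RAW)
```

Real-world munging usually involves many more cases (bad encodings, duplicate records, inconsistent units), but the shape is the same: parse, normalize, drop or repair bad rows, then reshape for analysis.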
Key Roles & Responsibilities of a Data Scientist
Data scientists play several of the following roles, each matching an existing IT professional role. This is why it is tricky for a person to become a data scientist, or for a company to hire one: the person may need to be skilled in more than one of these areas. On the other hand, if you already work in one of the following roles, it should be easier to get started on the journey of data science.
- Business intelligence (BI) professionals: A BI professional works with data warehouses and visualization dashboards. BI professionals need to get good at consuming data and coming up with data products. The key is that data changes very rapidly, and they need to accommodate that.
- Database administration (DBA): DBAs would be required to work with different forms of data (primarily unstructured data), and not only the kind that can be stored in databases.
- Statisticians: Statisticians would be required to get good at working with large data sets.
- Visualization experts
- Machine learning: This role is very close to that of a data scientist. A person who is good at machine learning would be required to get good at data munging (data preparation).
Looking at the list above, solving a data science problem may seem to require hiring a team, since it may not be feasible for one person to acquire all of these skills.
Following are some of the key activities (responsibilities) that a data scientist performs:
- Prepare data for statistical analysis: This involves tasks such as gathering, cleaning, restructuring, transforming, combining, merging, verifying, and extracting data. Here, programming skills come in handy.
- Run the statistical analysis: Here, mathematical and statistical knowledge comes in handy.
- Interpret and communicate the results: Here, visualization skills come in handy.
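The three activities above can be sketched end to end in a few lines of Python. Everything here is an assumption made for illustration: the two data sources, the store ids, the function names, and the 0.7 cutoff used to call a correlation "strong". With only three matched records, the result is a toy, not a real finding.

```python
import math
import statistics

# Hypothetical data from two separate sources, keyed by store id.
ad_spend = {"s1": 10.0, "s2": 20.0, "s3": 30.0, "s4": 40.0}
sales = {"s1": 25.0, "s2": 45.0, "s3": 65.0, "s5": 99.0}

def prepare(left, right):
    """Step 1: merge the two sources on the ids they share."""
    shared = sorted(left.keys() & right.keys())
    return [(left[k], right[k]) for k in shared]

def pearson(pairs):
    """Step 2: compute the Pearson correlation coefficient of the pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def interpret(r, n):
    """Step 3: turn the number into a sentence a business team can read."""
    # The 0.7 threshold is an arbitrary choice for this sketch.
    strength = "strong" if abs(r) >= 0.7 else "weak or moderate"
    return (f"Across {n} stores, ad spend and sales show a {strength} "
            f"linear relationship (r = {r:.2f}).")

pairs = prepare(ad_spend, sales)
r = pearson(pairs)
message = interpret(r, len(pairs))
```

In practice the last step would usually be a chart rather than a sentence, but the division of labor is the same: programming for preparation, statistics for analysis, and communication for the result.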
In addition, to do a great job with the above, a data scientist needs to understand the business domain associated with the data (represented by Substantive Expertise in the diagram below). The following diagram represents the key aspects of a data scientist.
What would it take for me to become a Data Scientist?
If you are in one of the following roles, read on to understand what you may need to learn to become a data scientist:
- Software Engineer/Web Developer: If you are a web developer or a programmer, this may be a fresh start for you. You would be required to learn the fundamentals of some of the following to start the journey of data science:
  - Mathematics and statistics
  - Machine learning
  - Data visualization
- Statistician: If you are a statistician, you may need to learn the different aspects of data munging (scraping, parsing, processing).
- Business analysts: Business analysts may need to learn one or more data analysis algorithms to be able to do a good job.
- DBAs: DBAs need to learn how to work with unstructured data.
What would I create as a Data Scientist?
Data science is primarily about creating data products that others can use for their own analysis or visualizations. Data products help communicate the results to others. Following are some examples:
- Data-driven apps such as a spell-checker
- Interactive visualizations
- Online databases
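To make the idea of a data-driven app concrete, here is a minimal Python sketch of a spell-checker whose behavior comes entirely from data: it suggests the most frequent known word within one edit of the input. The tiny corpus and the function names are hypothetical. A usable checker would learn word counts from a much larger body of text.

```python
import re
from collections import Counter

# Hypothetical tiny corpus; a real checker would use far more text.
CORPUS = "the data scientist cleans the data and the analyst reads the report"
WORD_COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, else the input."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word
    # Sort first so ties between equally frequent words resolve consistently.
    candidates = sorted(w for w in edits1(word) if w in WORD_COUNTS)
    if not candidates:
        return word
    return max(candidates, key=WORD_COUNTS.__getitem__)
```

For example, `correct("teh")` returns `"the"` with this corpus, while an unknown word with no close match, such as `"xyz"`, is returned unchanged. Swapping in a different corpus changes the suggestions without changing the code, which is what makes the app data-driven.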