Based on my experience learning data science, one thing I have become sure of is that data science best suits courageous and hungry souls. Those who lack some or most of the following characteristics are likely to get off track sooner rather than later in their quest to become a data scientist. Later in this article, I describe each of the following key should-have skills of a data scientist (some of them are must-haves):
- Mathematical Bent (Must-have)
- Computing/Programming Skills (Must-have)
- Academic or Research Bent
- Teach-ability (Teaching Skills)
Before I share my opinion on each of the above, I want to mention that data science has the following four key components:
- Mathematics (algebra, probability and statistics, derivatives, etc.)
- Machine Learning
- Business domain knowledge
- General computer programming skills
Mathematical Bent
As one starts learning machine learning algorithms, one needs to understand the following three key aspects, all of which rely on mathematical formulas:
- The machine learning model, represented as a mathematical formula
- The optimization techniques that enable the algorithm to provide the most appropriate estimates of the response (dependent) variable
- Statistical evaluation of the estimated values
In all of the above cases, one has to deal with a lot of mathematical concepts such as linear algebra, vector algebra, matrix algebra, polynomials, probability and statistics, derivatives, etc.
Simply put, anyone who does not have a good grasp of these concepts will soon hit a wall and lose interest in learning any further. I would therefore suggest becoming thoroughly familiar with at least some of the concepts mentioned above before getting started with machine learning algorithms.
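To make the three aspects listed earlier a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the data points, the function names, the learning rate and the number of steps are all made up for this example. It shows a model written as a formula (y = w * x + b), an optimization technique (gradient descent on the mean squared error) and a statistical evaluation of the estimates (R-squared).

```python
# Illustrative sketch only: the data and all names below are hypothetical.

def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Estimate w and b of the model y = w * x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Error of the current model on every data point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def r_squared(w, b, xs, ys):
    """Share of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_gradient_descent(xs, ys)
score = r_squared(w, b, xs, ys)
print(round(w, 3), round(b, 3), round(score, 3))
```

Even this tiny example touches derivatives (the gradient), algebra (the model formula) and statistics (the variance behind R-squared), which is why I consider the mathematical bent a must-have.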
Computing/Programming Skills
If you are very strong in mathematics and want to become an expert in data science, the other key skill you need is computing or programming. After learning a machine learning algorithm, one has to choose one or more statistical programming languages or tools such as R, MATLAB, Python, Octave, etc. to start testing the algorithm on different data sets and consolidate the learning. Note that these statistical languages and tools come with software packages that help implement different machine learning algorithms and test the models. In addition, one also needs to get one's hands dirty with software frameworks and libraries such as Apache Mahout, MLlib, etc. I must frankly confess that it does not end here. One may also need to learn the fundamentals of Hadoop ecosystem tools such as Hive and Pig, which assist in data exploration and analysis.
Learn-ability
If you are fresh out of college, learn-ability may not be much of a problem, as the mathematical and statistical concepts, and perhaps the machine learning algorithms too, are still fresh in your mind. However, if you have been in the industry for some time, working on general programming projects, you will need higher learn-ability because there is so much to learn. One of the key challenges related to learn-ability is finding the right trainer or teacher and the right learning resources such as books, videos, slides, docs, etc. Given that the web is flooded with all of these, it becomes much more challenging to plan your learning so that you learn enough to get started. The following are some of the areas to learn in order to get on board with a data scientist career:
- Mathematical concepts such as linear algebra, vector algebra, matrix algebra, polynomials, probability and statistics, derivatives, etc.
- Machine learning algorithms
- At least one statistical language such as R, MATLAB, Python, etc.
- Hadoop data exploration and data analysis tools
- Familiarity with at least one general-purpose programming language such as Java (this one is optional)
Creativity
In addition to having high learn-ability, one also needs to be creative enough to extract one or more use cases from the business data one is working on. At times, you will have to work with business users and ask them to explain the business. From their explanation, you may have to identify one or more predictive use cases that could have a positive impact on the business. Alternatively, based on your own knowledge of the business, you may come up with creative ideas and propose them to the business for implementation.
Patience
Patience is definitely one of the key traits of a data scientist. If you are the impatient sort, you may want to do something about it. Some of the primary reasons are the following:
- One needs to learn many different concepts, which takes a lot of time and therefore a good amount of patience.
- One may need to test a model with different features and different data sets. This leads to the creation of a large number of models, and selecting the most appropriate one requires enough patience to test them all.
- One may need to constantly read and learn from different research papers.
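The model-testing point in the list above can be sketched in a few lines of plain Python. This is only a toy illustration under my own assumptions: the feature names, the numbers and the train/validation split are invented, and a simple least-squares line stands in for whatever model you are actually testing. The idea is to fit one model per candidate feature and keep the one with the lowest error on data it has not seen.

```python
# Illustrative sketch only: feature names, data and split are hypothetical.

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and actual values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Two candidate features; the target roughly follows 2 * feature_a + 1.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "feature_b": [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0],
}
target = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9]

train = slice(0, 6)   # first six rows are used for fitting
valid = slice(6, 8)   # last two rows are held out for validation

scores = {}
for name, values in features.items():
    model = fit_line(values[train], target[train])
    scores[name] = mean_squared_error(model, values[valid], target[valid])

# The most appropriate model is the one with the lowest validation error.
best_feature = min(scores, key=scores.get)
print(best_feature, scores)
```

With real data there are many more features, data sets and model types to compare, which is exactly where the patience comes in.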
Academic or Research Bent
One needs to regularly read research papers in areas related to machine learning. For example, deep learning is the most talked-about topic these days, so one has to stay up to date with deep learning concepts. There is also research going on at universities across the world, which one would want to check on regularly in order to stay current.
Teach-ability (Teaching Skills)
Given that there is a shortage of people with data science skills, it seems to become our moral responsibility, once we become data scientists, to educate and train those who want to pursue a career in data science. This not only helps us contribute and give back to society in general, but also enables us to learn different aspects of data science much better. As the saying goes, “teaching is the best form of learning.”