- Why can’t I use Java/C etc for data analysis?
- Key Aspects of Data Analysis vis-a-vis R Language
- Why R Fundamentally?
- Advantages & Disadvantages of R
Why can’t I use Java/C etc for data analysis?
I have worked a lot with Java, C, PHP, C++ and similar languages in my career. From what I have learned about R so far, I can confidently say that I am thankful we have it. The last thing I want when analyzing data is to first write programs and then run the analysis through them. What a mess that would be, and what a motivation killer! When I am analyzing data, I need to do at least the following very quickly:
- Clean up the data
- Transform the data into different formats as part of the analysis, and access the results
- Create charts to visualize the different transformations
- Model the data
To do all of the above efficiently, I do not want to write a program and then change and rerun it for every small modification. Ideally, I would have a console where one or more commands achieve each of these objectives quickly. This is where R comes in very handy. With a few commands, you can access the data, clean it up, visualize it and model it.
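As a rough illustration of what "a few commands" can look like, here is a minimal sketch that uses only base R and the `airquality` data set that ships with R. The choice of data set, the derived `TempC` column and the variable names (`clean`, `monthly`, `fit`) are assumptions made for this example; treat it as one possible session, not a recommended recipe.

```r
# Access: airquality is a built-in data frame; it stands in for data
# you would normally load with something like read.csv().
data(airquality)

# Clean up: drop rows that contain missing values.
clean <- na.omit(airquality)

# Transform: add a derived column and summarise ozone by month.
clean$TempC <- (clean$Temp - 32) * 5 / 9
monthly <- aggregate(Ozone ~ Month, data = clean, FUN = mean)
print(monthly)

# Visualize: scatter plot of temperature against ozone.
plot(clean$Temp, clean$Ozone,
     xlab = "Temperature (F)", ylab = "Ozone (ppb)")

# Model: simple linear regression of ozone on temperature.
fit <- lm(Ozone ~ Temp, data = clean)
print(coef(fit))
```

Each step is a line or two typed at the console, and any of them can be changed and rerun on its own without rebuilding a program.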
Key Aspects of Data Analysis vis-a-vis R Language
The following are some of the key aspects, or phases, of data analysis that need to be performed in order to extract actionable knowledge or insight from data:
- Data access
- Data cleaning/wrangling
- Data transformation
- Data visualizations
- Data modeling
Most of the above aspects involve applying mathematical and statistical knowledge along with machine learning algorithms. What is needed, then, is a programming language or platform that supports all of these phases and is also well suited to statistics. R fits this role very well.
Why R Fundamentally?
I was watching a Stanford seminar on data analysis with R, in which the following were presented as the key steps in solving a data analysis problem, along with how R fits in:
- Thinking about the problem
- Coding the solution
- Computer running the solution
When doing data analysis, a lot of time is spent thinking about the problem. What is needed, therefore, is a programming language or platform that helps you think about and express the problem, and quickly arrive at a solution. This is where R fits in very well. Once the solution is identified, one can then use programming languages such as C, Java or Python to code it and meet requirements related to performance and scalability.
Advantages & Disadvantages of R
The following are some of the disadvantages of the R language:
- It has a steep learning curve. After spending some time with R, I can say that if patience is not one of your virtues, you may well get frustrated and give up after a short while.
- Some say that it is slower than programming languages such as C or Java. For data analysis, though, raw speed matters less than having a language that aids the analysis itself, and R does that very well.
- By default, the entire data set has to fit in memory for you to work with it. On a laptop, you can therefore only work with a limited amount of data. For Big Data, however, you would be expected to have a system with a higher configuration (more RAM, and so on).
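To get a feel for the memory point above, the following sketch measures how much memory a numeric vector takes and makes a back-of-the-envelope estimate for a larger table. The helper `estimate_numeric_mb` is hypothetical, written for this example, and it assumes every column is stored as 8-byte doubles; real data frames with strings or factors will differ.

```r
# One million double-precision numbers; each double takes 8 bytes.
x <- numeric(1e6)

# Memory used by the vector itself (about 8 million bytes plus a small header).
print(object.size(x), units = "MB")

# Hypothetical helper: rough in-memory size, in MB, of an all-numeric table.
estimate_numeric_mb <- function(n_rows, n_cols) {
  n_rows * n_cols * 8 / 1024^2
}

# A table with 10 million rows and 20 numeric columns.
print(estimate_numeric_mb(1e7, 20))
```

Under these assumptions, the 10-million-row table comes to about 1.5 GB before any copies made during analysis, which is why laptop RAM becomes the limiting factor fairly quickly.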
The following are some of the advantages of R:
- It is open-source and free.
- It has a large community for support.
- It runs on multiple platforms, such as Unix and Windows.
- If you need functionality that is not available as a package, you can build it yourself.
- It supports visualizations.
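On the point about building missing functionality yourself: base R's `mode()` reports an object's storage mode, not the most frequent value, so a statistical mode is a small example of something you might write on your own. The function name `stat_mode` and its tie-breaking rule (the first most frequent value wins) are assumptions made for this sketch.

```r
# Hypothetical helper: most frequent value in a vector, ignoring NAs.
# If several values tie, the one that appears first is returned.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  # Count how often each unique value occurs, then pick the largest count.
  ux[which.max(tabulate(match(x, ux)))]
}

print(stat_mode(c(1, 2, 2, 3, NA)))
```

Once defined, the function can be used at the console like any built-in, for example on a column of a data frame.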