Category Archives: Big Data

Learn R – When to use Histogram, Scatterplot & Boxplot – Code Example

This article presents some facts on when to use which kind of plot, with code examples and plots, when working with the R programming language. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos. The following key plots are described later in this article: histogram, scatterplot, and boxplot. What follows is a description of these plots along with code examples based on the base R package. Note that each of these plots could be produced with different commands when using the ggplot2 package. Histogram: A histogram is one of the best forms of visualization when working with a single continuous variable. It plots the relative …
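
As a minimal sketch of the three plot types named above, the base R calls below use the built-in mtcars dataset. The dataset and the chosen columns are assumptions for illustration and are not necessarily what the full article uses.

```r
# Illustrative only: mtcars ships with base R; the full article may use other data.
data(mtcars)

# Histogram: distribution of a single continuous variable
h <- hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Scatterplot: relationship between two continuous variables
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")

# Boxplot: a continuous variable summarised for each group of a categorical one
b <- boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "mpg")
```

Both hist() and boxplot() also return the computed statistics (bin counts, group names, quartiles), which can be handy when you want the numbers behind the picture.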

Data Science – 6 Steps to Perform Data Analysis using R

This article presents steps that one could take to perform data analysis on available datasets using data science (machine learning algorithms) with the help of the R programming language. The objective of this article is to introduce an approach that data science beginners can use to get started with data analysis. However, as you gain experience, you could adopt your own techniques that work for you. These are just my thoughts, and there could be better ways of approaching data analysis. The following key steps could be taken as a blueprint …

Big Data – Team to Hire for Big Data Practice

This article presents thoughts on Big Data team composition and the considerations involved in hiring and building an effective Big Data team. A Big Data team needs to cover the following two key areas to be ready to deliver on key Big Data initiatives: data engineering and data science. Data Engineering Team: You would want to build a team that plays a key role in some of the following areas: data processing (Hadoop MapReduce), data storage (HDFS/HBase), data coordination (ZooKeeper), and data monitoring/management. For the above skills, …

Learn R – Different Data Types with Code Examples

This article presents quick concepts on key data types in the R programming language, along with code examples and some good go-to links for further reading. For those new to R, I would like to quickly reiterate that the R programming language helps in performing data analysis and is an integral part of data science as a practice. In other words, it is one of the go-to languages/platforms for data scientists working with data. The following data types in R are discussed in this article: Vector, List, Factor …
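
The three data types named above can be sketched in a few lines of base R. The variable names and values here are made up for illustration and do not come from the full article.

```r
# Illustrative values only; names and data are hypothetical.

# Vector: an ordered collection of values of one type
v <- c(10, 20, 30)

# List: an ordered collection whose elements may have different types
l <- list(name = "Ann", scores = v)

# Factor: a categorical variable stored as a fixed set of levels
f <- factor(c("low", "high", "low"))

str(v)      # compact view of each structure
str(l)
str(f)
levels(f)   # levels are sorted alphabetically by default
```

str() is a quick way to see what kind of object you are holding before you start analysing it.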

Big Data – Functional & Technology Architecture for Beginners

This article presents a view associating the functional and technology elements of a Big Data reference architecture. The objective is to present a view relating key functional areas in Big Data to the relevant technologies. The diagram and related description could be of use to Big Data beginners (developers, architects, business analysts, etc.) wanting a high-level view of the functional and technology aspects of Big Data. The following diagram represents the functional and technology landscape view of Big Data. The objective of the diagram below is the following: Associate functional areas …

Learn R – What are Vectors – Code Examples

This article presents high-level concepts related to the vector data type in the R programming language, along with code samples. For those new to R, it should be noted that R provides a console-based platform for performing analysis on data. R can be seen as a programming language for data scientists. The following key points are described later in this article: What are vectors? Vectors: code examples. What are Vectors? A vector, in R, can be defined as a collection of things of the same data type. Simply speaking, it …
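
The "same data type" part of the definition above is easy to see at the R console. The following is a minimal sketch with made-up values, not the article's own code samples.

```r
# Illustrative values only.

# c() combines values into a vector; every element shares one type
nums  <- c(1.5, 2, 3.25)     # numeric vector
words <- c("a", "b", "c")    # character vector

# Mixing types does not raise an error: R coerces everything to a common type
mixed <- c(1, "two", TRUE)

class(nums)
class(words)
class(mixed)                 # all elements have become character strings

# Arithmetic is applied element by element
doubled <- nums * 2
```

The silent coercion in mixed is worth remembering, since a single stray string can turn a whole numeric column into text.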

Big Data – Top 6 Frameworks Required to Get Started

This article presents the top 6 software frameworks (or tools) for getting started with Big Data POC projects. It may be of interest to those who are beginning with Big Data and want to understand the tools/frameworks required to get started with their Big Data POC projects. The article presents only the bare minimum set of frameworks required to get started. I am sure there could be more to this list; however, my objective is to cover only the minimum set. The following are key functional areas in Big Data …

Data Science – Commonly Used Plot Parameters in R Programming

This article presents some of the commonly used plot parameters across different plot commands, for when you are working with different kinds of plots in R. The following key points are described later in this article: What are some of the common plots (commands) in R? Commonly used plot parameters. What are some of the common plots (commands) in R? The following are some of the plots (commands) used in the R language for different purposes. I shall be writing separate blog posts on the different use cases where one should use one or more …

Data Science – Why Learn R?

This article presents thoughts on why it is OK to learn yet another programming language, R, for doing data analysis. The following are some of the key points described later in this article: Why can’t I use Java/C etc. for data analysis? Key aspects of data analysis vis-a-vis the R language. Why R, fundamentally? Advantages and disadvantages of R. Why can’t I use Java/C etc. for data analysis? I have worked a lot with Java/C/PHP/C++ etc. in my career. From whatever I have learned about R so far, …

How Can I Become A Data Scientist?

This article presents thoughts, primarily, on how to become a data scientist. The following key points, related to different aspects of being a data scientist, are described later in this article: key skills of a data scientist; key roles and responsibilities of a data scientist; what it would take for me to become a data scientist; and what I would create as a data scientist. Key Skills of a Data Scientist. Mathematics & Statistics Knowledge: A data scientist would do a great job if he/she has a strong background in mathematics and statistics. …

Big Data – How to Get Started with Data Science

This article presents my opinion on what it would take to get started with Data Science. As I started exploring Big Data, one thing that became clear is that I may not be successful with Big Data unless I have learned and applied Data Science to make sense of Big Data (data with the 3 Vs: Volume, Velocity, Variety). This is where I started to find out how to get started with Data Science. The following key points are described later in this article: Data Science is NOT Easy …

Big Data – Top 8 Use Cases for Beginners

This article presents the top 8 Big Data use cases that beginners could start with, creating one or more proof-of-concept (POC) projects around them. I compiled the list after digging through various places on the web, videos, webinars, etc. The use cases mentioned below are only briefly discussed; each of them shall be explained later in separate articles. The following use cases are described later in this article: social sentiment analysis, customer interaction analysis, pattern matching, publicly available data analysis, web pages data analysis, clinical data …

How to Start a Big Data Practice

This article presents key aspects of starting up a Big Data practice in your organization. I have recently started working in this area, and this blog post is the result of my research. Hope you find it useful. Big Data Center of Excellence (COE): It may be a good idea to plan around setting up a Big Data Center of Excellence (COE) whose main objective would be to take a holistic approach towards the following two key aspects of Big Data from different perspectives, such as setting up a team, evaluating tools & frameworks, …

ShriGB – A Semantic Financial Search Engine

ShriGB, as the name suggests, is about extracting valuable insights (“Shri”: respect) from large/big data (“GB”). The project aims to leverage semantic web and big data technologies to extract meaningful insights from unstructured financial data lying across the web. The data is mostly present in raw form and is useful to some sections of society, although it can be used by different groups of people for different reasons. Let’s take a look at the following example: “Dabur to set up manufacturing units in Uttaranchal”. The above data can mean some of the following: More jobs are going to be created in the Uttaranchal region. This may lead to a boost in …

Posted in Big Data, Semantic Web.

Big Data & Predictive Modelling

Talk about big data, and the first thing that comes to an engineer’s mind is Hadoop and related technology. The key thing that gets missed time and again by many developers working on Big Data is a sense of reading, understanding, and learning the data and designing algorithms to achieve different objectives such as derivations, predictions, etc. One of the key aspects of data science, which is also key to Big Data, is Predictive Modelling. I wanted to do some quick research and develop an understanding of this topic. However, while researching, I found that the topic includes some complex underlying mathematical models which will surely be very hard to …

Key to Big Data: Data Science & Data Framework

Good familiarity with data science is key to getting on board with Big Data implementations. Almost every software services provider has added a link for Big Data to its service offerings. Most of them assume that a Hadoop team, made up of technical staff familiar with the Hadoop technology stack, will be able to successfully implement a Big Data project. However, this is far from reality. One of the keys to successful Big Data implementation projects is “Data Science”. Another is “Data Framework”. Together, the two enable a team to deliver a successful Big Data implementation. What is Data Science? Data Science, simply speaking, is understanding meta-data …
