A good grasp of data science is key to succeeding with Big Data implementations. Almost every software services provider has added Big Data to its service offerings. Most of them assume that a Hadoop team, made up of engineers familiar with the Hadoop technology stack, will be able to implement a Big Data project successfully. However, this is far from reality.
One key to a successful Big Data implementation is “Data Science”; another is the “Data Framework”. Addressed together, the two enable a team to deliver a successful Big Data implementation.
What is Data Science?
Simply put, data science is about understanding the metadata of the data and the relationships within it. The key to data science is “Ontology”. An ontology defines the concepts in a set of data and the relationships between them. For example, when you read a paragraph, you try to capture its key concepts in a few words or terms and relate them in your mind for better understanding and future reference. This is what ontology deals with.
Another concept is the Resource Description Framework (RDF). RDF expresses concepts and their relationships (the ontology) as triples of the form subject, predicate, object, and helps build a taxonomy (a hierarchy of relationships) over the data.
The data scientist's challenge is to relate the ontology to business objectives and to guide software engineers in planning the development (for example, the map-reduce algorithms) accordingly. The data scientist therefore has to work with both business analysts and software engineers.
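As a minimal sketch of how triples can carry a taxonomy, the Python below stores each fact as a plain (subject, predicate, object) tuple and follows a hierarchical predicate upward. The class names (`Garment manufacturer`, `Manufacturer`, `Company`) and the predicate strings `subClassOf` and `type` are illustrative assumptions, loosely modelled on RDF Schema's `rdfs:subClassOf` and `rdf:type`; the helpers `build_taxonomy`, `ancestors` and `classes_of` are hypothetical. A real project would more likely use an RDF library such as rdflib.

```python
from collections import defaultdict

# A triple is (subject, predicate, object), mirroring the RDF data model.
Triple = tuple[str, str, str]

# Hypothetical example data. "subClassOf" and "type" are plain strings here,
# standing in for rdfs:subClassOf and rdf:type.
TRIPLES: list[Triple] = [
    ("Garment manufacturer", "subClassOf", "Manufacturer"),
    ("Manufacturer", "subClassOf", "Company"),
    ("Scott garments", "type", "Garment manufacturer"),
]


def build_taxonomy(triples: list[Triple]) -> dict[str, list[str]]:
    """Map each concept to its direct parent concepts."""
    parents: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        if p == "subClassOf":
            parents[s].append(o)
    return dict(parents)


def ancestors(concept: str, taxonomy: dict[str, list[str]]) -> list[str]:
    """Walk up the hierarchy breadth-first, returning every broader concept."""
    seen: list[str] = []
    queue = list(taxonomy.get(concept, []))
    while queue:
        c = queue.pop(0)
        if c not in seen:
            seen.append(c)
            queue.extend(taxonomy.get(c, []))
    return seen


def classes_of(instance: str, triples: list[Triple]) -> list[str]:
    """Direct types of an instance plus every broader concept they imply."""
    taxonomy = build_taxonomy(triples)
    result: list[str] = []
    for s, p, o in triples:
        if s == instance and p == "type":
            for c in [o, *ancestors(o, taxonomy)]:
                if c not in result:
                    result.append(c)
    return result
```

Here `classes_of("Scott garments", TRIPLES)` yields `["Garment manufacturer", "Manufacturer", "Company"]`: only the direct type is stated, and the broader concepts are inferred by walking the hierarchy.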
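To make the map-reduce step more concrete, here is a single-machine sketch in plain Python that simulates the map, shuffle and reduce stages over triples, grouping every fact about a subject together. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run`) are hypothetical, and lower-casing the key is an assumed normalization choice.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Triple = tuple[str, str, str]


def map_phase(triple: Triple) -> Iterator[tuple[str, tuple[str, str]]]:
    """Emit one key/value pair per triple, keyed by the (normalized) subject."""
    subject, predicate, obj = triple
    yield subject.lower(), (predicate, obj)


def shuffle(
    pairs: Iterable[tuple[str, tuple[str, str]]],
) -> dict[str, list[tuple[str, str]]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(
    key: str, values: list[tuple[str, str]]
) -> tuple[str, dict[str, list[str]]]:
    """Collapse all facts about one subject into predicate -> objects."""
    facts: dict[str, list[str]] = defaultdict(list)
    for predicate, obj in values:
        facts[predicate].append(obj)
    return key, dict(facts)


def run(triples: Iterable[Triple]) -> dict[str, dict[str, list[str]]]:
    """Chain the three stages sequentially."""
    pairs = (pair for t in triples for pair in map_phase(t))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a Hadoop cluster the same split applies: mappers emit key/value pairs in parallel, the framework groups them by key, and each reducer receives all the values for one key. Which key to group on is exactly the kind of decision that follows from the ontology.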
Take a look at the following statement:
The following triples can be derived from the above statement:
Subject: Scott garments, Predicate: to set up, Object: new units
Subject: Scott garments, Predicate: to invest in, Object: Karnataka, Maha
Subject: New units, Predicate: to be set up in, Object: Karnataka, Maha
Data like this can be useful to many kinds of people, such as students, recruitment consultants and investors. However, deriving it from a statement requires an understanding of concepts such as Ontology, RDF and Data Taxonomy.
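A short sketch of how such triples might answer the questions different readers ask: the triples listed above are stored as tuples, and a pattern-matching helper with `None` as a wildcard stands in, very loosely, for a SPARQL query. The helpers `match` and `companies_expanding_in` are hypothetical, as are two data choices: "Karnataka, Maha" is split into one triple per state, since an RDF triple carries a single object, and "New units" is lower-cased so that it joins with the object of the first triple.

```python
from typing import Optional

Triple = tuple[str, str, str]

# The triples above. Each multi-valued object is split into one triple per
# value, and "New units" is normalized to "new units" so the join below works.
TRIPLES: list[Triple] = [
    ("Scott garments", "to set up", "new units"),
    ("Scott garments", "to invest in", "Karnataka"),
    ("Scott garments", "to invest in", "Maha"),
    ("new units", "to be set up in", "Karnataka"),
    ("new units", "to be set up in", "Maha"),
]


def match(
    triples: list[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def companies_expanding_in(state: str, triples: list[Triple]) -> list[str]:
    """Join two patterns: who sets up something that is set up in `state`?"""
    units = {s for s, _, _ in match(triples, predicate="to be set up in", obj=state)}
    return sorted({s for s, _, o in match(triples, predicate="to set up") if o in units})
```

An investor's question, where is Scott garments investing, becomes `match(TRIPLES, "Scott garments", "to invest in")`. A recruitment consultant's question, who is expanding in Karnataka, needs a join across two triples, which `companies_expanding_in("Karnataka", TRIPLES)` answers with `["Scott garments"]`.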