Key to Big Data: Data Science & Data Framework

A good grounding in data science is key to getting on board with Big Data implementations. Almost every software services provider has added Big Data to its service offerings. Most of them assume that a team of engineers familiar with the Hadoop technology stack can successfully implement a Big Data project. However, this is far from reality.

One key to a successful Big Data implementation is “Data Science”. Another is a “Data Framework”. Applied together, the two help a team deliver a successful Big Data implementation.

What is Data Science?

Data Science, simply put, is about understanding the metadata of a data set and the relationships within it. The key to data science is “Ontology”. An ontology defines the concepts found in a set of data and the relationships between them. For example, when you read a paragraph, you try to capture its key concepts in a few words or terms and relate them in your mind for better understanding and future reference. That activity is what ontology deals with.

Another concept is the Resource Description Framework (RDF). RDF expresses concepts and their relationships (the ontology) as triples, each made up of a subject, a predicate, and an object, and helps develop a taxonomy (a hierarchical relationship) over the data.
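To make the idea of a taxonomy built from triples more concrete, here is a minimal sketch in plain Python. The concept names and the "is a kind of" predicate are hypothetical, chosen only for illustration; in RDF Schema the equivalent relationship is usually expressed with rdfs:subClassOf.

```python
# Hypothetical taxonomy triples: (subject, predicate, object).
# "is a kind of" stands in for what RDF Schema calls rdfs:subClassOf.
taxonomy = [
    ("Garment manufacturer", "is a kind of", "Manufacturer"),
    ("Manufacturer", "is a kind of", "Company"),
    ("Company", "is a kind of", "Organization"),
]


def ancestors(concept, triples):
    """Walk up the hierarchy and return every broader concept, nearest first."""
    parents = {s: o for s, p, o in triples if p == "is a kind of"}
    chain = []
    # Stop at the top of the hierarchy, or if the data accidentally loops.
    while concept in parents and parents[concept] not in chain:
        concept = parents[concept]
        chain.append(concept)
    return chain


print(ancestors("Garment manufacturer", taxonomy))
```

The sketch assumes each concept has at most one parent; a real taxonomy may allow several, in which case the lookup would need to return a set of parents rather than a single value.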

The challenge for the data scientist is to relate the ontology to business objectives and advise software engineers so that they can plan development (for example, map-reduce algorithms) appropriately. The data scientist thus has to work with both business analysts and software engineers.

Take a look at the following statement:

Scotts garments to set up new units in Karnataka, Maha

Following are some triples that can be derived from the above statement:

Subject: Scotts garments, Predicate: to set up, Object: new units

Subject: Scotts garments, Predicate: to invest in, Object: Karnataka, Maha

Subject: New units, Predicate: to be set up in, Object: Karnataka, Maha
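The triples above can be sketched as a small data structure. This is only an illustration in plain Python, not an RDF library: the `Triple` type and the `objects_for` helper are hypothetical names, and the company name is spelled as it appears in the headline.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# The three triples derived from the headline.
triples = [
    Triple("Scotts garments", "to set up", "new units"),
    Triple("Scotts garments", "to invest in", "Karnataka, Maha"),
    Triple("New units", "to be set up in", "Karnataka, Maha"),
]


def objects_for(subject, triples):
    """Return the objects of every triple about the given subject (case-insensitive)."""
    return [t.obj for t in triples if t.subject.lower() == subject.lower()]


def subjects_linked_to(obj, triples):
    """Return (subject, predicate) pairs that point at the given object."""
    return [(t.subject, t.predicate) for t in triples if t.obj == obj]


print(objects_for("Scotts garments", triples))
print(subjects_linked_to("Karnataka, Maha", triples))
```

Once a statement is broken down this way, a question such as "what is connected to Karnataka?" becomes a simple filter over the triples, which is the kind of lookup an investor or recruitment consultant might want.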

The above data can be useful to many categories of people, such as students, recruitment consultants, and investors. However, deriving it from a statement requires an understanding of concepts such as ontology, RDF, and data taxonomy.

 

Ajitesh Kumar


I have been recently working in the area of Data analytics including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies including programming languages such as Java/JEE, Javascript, Python, R, Julia, etc, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. I would love to connect with you on Linkedin. Check out my latest book titled as First Principles Thinking: Building winning products using first principles thinking.