Key to Big Data: Data Science & Data Framework

Good familiarity with data science is key to getting on board with Big Data implementations. Almost every software services provider has added Big Data to its service offerings. Most of them assume that a technical team familiar with the Hadoop technology stack will be able to implement a Big Data project successfully. However, this is far from reality.

One key to successful Big Data implementation projects is “Data Science”. Another is “Data Framework”. A team that works on both together is well placed to deliver a successful Big Data implementation.

What is Data Science?

Data Science, simply put, is about understanding the metadata of a data set and the relationships within it. The key to data science is “Ontology”. An ontology defines the concepts found in a set of data and the relationships between them. When you read a paragraph, for example, you pick out the key concepts as a few words or terms and relate them in your mind for better understanding and future reference. This is what ontology deals with.

Another concept is the Resource Description Framework (RDF). RDF helps express the concepts and relationships (the ontology) as triples of subject, predicate and object, and helps develop a taxonomy (a hierarchical relationship) over the data.
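As a rough illustration of how triples can carry a taxonomy, the sketch below stores a few made-up concepts as (subject, predicate, object) triples in plain Python and walks the hierarchy upward. The concept names and the `ancestors` helper are assumptions invented for this example, and the `subClassOf` label is borrowed loosely from RDF Schema's `rdfs:subClassOf`. A real project would more likely use an RDF library and a published vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical taxonomy data, invented for illustration only.
TRIPLES = [
    Triple("GarmentCompany", "subClassOf", "Company"),
    Triple("Company", "subClassOf", "Organization"),
    Triple("ManufacturingUnit", "subClassOf", "Facility"),
]


def ancestors(concept, triples):
    """Follow 'subClassOf' links upward and return the broader concepts in order.

    Assumes each concept has at most one direct parent.
    """
    parent_of = {t.subject: t.obj for t in triples if t.predicate == "subClassOf"}
    chain = []
    seen = {concept}
    while concept in parent_of:
        concept = parent_of[concept]
        if concept in seen:  # guard against accidental cycles in the data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


print(ancestors("GarmentCompany", TRIPLES))
```

The hierarchy is not stored separately here. It falls out of the triples themselves, which is the sense in which RDF helps build a taxonomy.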

The challenge for a data scientist is to relate the ontology to business objectives and to advise software engineers on how to plan development (for example, the map-reduce algorithms) accordingly. The data scientist thus has to work with both business analysts and software engineers.

Take a look at the following statement:

Scotts garments to set up new units in Karnataka, Maha

Following are some triples that can be derived from the above statement:

Subject: Scotts garments, Predicate: to set up, Object: new units

Subject: Scotts garments, Predicate: to invest in, Object: Karnataka, Maha

Subject: New units, Predicate: to be set up in, Object: Karnataka, Maha
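To make the triples above a little more concrete, here is a minimal sketch that holds them as plain Python tuples and filters them by any combination of subject, predicate and object. The `match` helper and its parameter names are hypothetical, and exact string comparison stands in for the pattern matching a real triple store would provide.

```python
# The three triples from the example above, as (subject, predicate, object) tuples.
# The company name follows the spelling used in the headline.
TRIPLES = [
    ("Scotts garments", "to set up", "new units"),
    ("Scotts garments", "to invest in", "Karnataka, Maha"),
    ("New units", "to be set up in", "Karnataka, Maha"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field given; None matches anything."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Everything the headline says about the two states.
for s, p, o in match(TRIPLES, obj="Karnataka, Maha"):
    print(f"{s} | {p} | {o}")
```

A query by object answers a location-centred question (what is happening in these states), while a query by subject answers a company-centred one, using the same three facts.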

The above information can be useful to many categories of people, such as students, recruitment consultants and investors. However, deriving it from a statement requires an understanding of concepts such as ontology, RDF and data taxonomy.

Ajitesh Kumar

Ajitesh has recently been working in the area of AI and machine learning. His current research area is Safe & Quality AI. He is also passionate about a range of technologies, including programming languages such as Java/JEE and JavaScript, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms and big data.

He has also authored the book, Building Web Apps with Spring 5 and Angular.