Key to Big Data: Data Science & Data Framework

A good familiarity with data science is key to getting on board with Big Data implementations. Almost every software services provider has added Big Data to its service offerings. Most of them assume that a technical team familiar with the Hadoop technology stack will be able to implement a Big Data project successfully. However, this is far from reality.

One of the keys to a successful Big Data implementation is “Data Science”. Another is “Data Framework”. Applied together, the two help a team deliver a successful Big Data implementation.

What is Data Science?

Data Science, simply speaking, is about understanding the metadata of a data set and the relationships within it. The key to data science is “Ontology”. Ontology is simply the definition of concepts and their relationships, derived from a set of data. For instance, when you read a paragraph, you try to capture its key concepts in a few words or terms and relate them in your mind for better understanding and future reference. This is what Ontology deals with.

Another concept is the Resource Description Framework (RDF). RDF helps define concepts and their relationships (Ontology) in the form of triples and helps develop a taxonomy (hierarchical relationships) across the data.
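As a minimal sketch of how triples can carry a taxonomy, the snippet below stores (subject, predicate, object) tuples in plain Python and follows a hypothetical `is_a` predicate upward to list a concept's broader categories. The concept names, the `is_a` predicate, and the `ancestors` helper are all illustrative assumptions; they are not part of RDF itself or of any RDF library.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical taxonomy facts, invented purely for illustration.
TAXONOMY = [
    Triple("Garment Manufacturer", "is_a", "Manufacturer"),
    Triple("Manufacturer", "is_a", "Company"),
    Triple("Company", "is_a", "Organization"),
]


def ancestors(concept: str, triples: List[Triple]) -> List[str]:
    """Follow is_a links upward and return the chain of broader concepts."""
    # Assumes each concept has at most one parent (a simple tree taxonomy).
    parents = {t.subject: t.obj for t in triples if t.predicate == "is_a"}
    chain = []
    seen = {concept}
    while concept in parents:
        concept = parents[concept]
        if concept in seen:  # guard against accidental cycles
            break
        seen.add(concept)
        chain.append(concept)
    return chain


print(ancestors("Garment Manufacturer", TAXONOMY))
```

The hierarchy here is just another set of triples, which is why the same structure can describe both individual facts and the taxonomy that organizes them.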

The challenge for a data scientist is to relate the ontology to business objectives and to advise software engineers on how to plan development (for example, the map-reduce algorithms) accordingly. The data scientist, thus, has to work with both business analysts and software engineers.

Take a look at the following statement:

Scotts garments to set up new units in Karnataka, Maha

Following are some triples that can be derived from the above statement:

Subject: Scotts garments, Predicate: to set up, Object: new units

Subject: Scotts garments, Predicate: to invest in, Object: Karnataka, Maha

Subject: New units, Predicate: to be set up in, Object: Karnataka, Maha
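The triples above can be held in a very simple data structure. The following is a minimal sketch in plain Python rather than an RDF library; the `Triple` type and the `find` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The three triples derived from the statement in the text.
TRIPLES = [
    Triple("Scotts garments", "to set up", "new units"),
    Triple("Scotts garments", "to invest in", "Karnataka, Maha"),
    Triple("New units", "to be set up in", "Karnataka, Maha"),
]


def find(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Example query: everything that is said about "Karnataka, Maha".
for t in find(TRIPLES, obj="Karnataka, Maha"):
    print(f"{t.subject} -> {t.predicate} -> {t.obj}")
```

Once statements are stored this way, different readers can ask different questions of the same data, for example by filtering on a region as the object or on a company as the subject.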

The above information can be useful to many categories of people, such as students, recruitment consultants, and investors. However, deriving it from a statement requires an understanding of concepts such as Ontology, RDF, and Data Taxonomy.

 

Ajitesh Kumar
