
Big Data – Functional & Technology Architecture for Beginners

This article presents a view that relates the key functional areas of a Big Data reference architecture to the relevant technologies. The diagram and accompanying description may be useful to Big Data beginners (developers, architects, business analysts, etc.) who want a high-level view of the functional and technology aspects of Big Data. Please feel free to comment or suggest changes if I have missed any important points.
The following diagram represents the functional and technology landscape of Big Data. Its objectives are to:
  • Associate functional areas with technologies
  • Reflect demand versus supply: boxes/text in green show what is readily available, while boxes/text in red show what is difficult to find
  • Rate functional/technology areas as easy, intermediate, or difficult (green/amber/red)

Figure: Big Data Functional Technology Architecture

The diagram above represents the following two core areas of Big Data. We look at the technologies as well as the people aspect of each area in detail later in this article.

  • Data Engineering
  • Data Science

 

Data Engineering

Data engineering includes the following key functional areas, listed alongside the key technologies for each:

  • Collect Data: This is about gathering data from different data sources, internal or external. For example, data could be collected from one or more RDBMS databases, or it could be streaming data such as log files. Technologies such as the following can be used to collect data:
    • Sqoop
    • Flume
    • Scribe
    • Storm
  • Store Data: Once data is collected, it needs to be stored for further processing. Technologies (frameworks) such as the following can be used for data storage:
    • HDFS (Hadoop Distributed File System)
    • HBase (NoSQL datastore)
    • MongoDB (NoSQL datastore)
    • Cassandra (NoSQL datastore)
    • CouchDB (NoSQL datastore)
  • Transform, Simplify and Analyze Data: Once gathered and stored, the data needs to be transformed into forms suitable for analytics. Hadoop Map/Reduce jobs are run on the stored data, and their output is stored in datastores such as HBase. From there, the data analysis phase starts, in which tools such as the following come into the picture:
    • Hive
    • Pig
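
To make the Map/Reduce step above more concrete, here is a minimal sketch of the map, shuffle and reduce phases in plain Python. It does not use Hadoop itself; the log lines, the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the log format are all assumptions made for illustration, and a real job would run distributed over data stored in HDFS.

```python
from collections import defaultdict

# Hypothetical log lines, standing in for data gathered by a collector
# such as Flume. The "date level message" format is an assumption.
LOG_LINES = [
    "2024-01-01 ERROR disk full",
    "2024-01-01 INFO job started",
    "2024-01-02 ERROR timeout",
    "2024-01-02 INFO job finished",
    "2024-01-02 INFO job started",
]


def map_phase(line):
    """Emit one (key, value) pair per log line, keyed by log level."""
    _date, level, *_message = line.split()
    yield level, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(LOG_LINES)
print(counts)  # {'ERROR': 2, 'INFO': 3}
```

The same count-per-key pattern is roughly what a simple GROUP BY query in Hive or a GROUP ... FOREACH script in Pig expresses at a higher level.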

All of the above tasks may require a data engineer with good knowledge of the Hadoop technology stack. Note that this part is comparatively easier than data science.

 

Data Science

Once the data engineering phases are done, the data analysis phase starts, in which some of the following technologies (frameworks) come in very handy:

  • R programming language
  • Mahout
  • Pig
  • Hive
  • Java/Python libraries

The person working in the data analysis phase needs to be strong in the following skills:

  • Machine learning algorithms
  • Mathematics & Statistics knowledge

This person can also be called a “Data Scientist” and is very much in demand, as finding a person with the above skills is difficult.
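
As a small taste of the statistics and machine learning skills mentioned above, the following sketch fits a straight line to a handful of points using ordinary least squares, one of the simplest learning algorithms. The data (`days`, `errors`) and the helper name `fit_line` are made up for illustration; in practice a data scientist would more likely reach for R or a Python library than hand-written code.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: number of errors observed on each of five days.
days = [1, 2, 3, 4, 5]
errors = [3, 5, 7, 9, 11]

slope, intercept = fit_line(days, errors)
prediction = slope * 6 + intercept  # estimate for day 6

print(slope, intercept, prediction)
```

Because the sample points lie exactly on the line y = 2x + 1, the fitted slope and intercept recover those values; real data would be noisy, and the fit would only approximate the trend.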

 

Ajitesh Kumar

