
HBase Architecture Components for Beginners

This post covers the high-level concepts behind the HBase architecture components.

HBase Architecture Components – Key Building Blocks

The following diagram shows the key building blocks:

Figure 1. HBase Architectural Building Blocks

Note the following about the components in the diagram above:

  • HMaster: Coordinates the region servers. It assigns regions on startup and during recovery, and monitors the region servers through Zookeeper.
  • Region Servers: Each region server manages one or more regions.
  • Zookeeper: A distributed coordination service that maintains the server state of the cluster.
  • Regions: HBase tables are split horizontally by row key range, and each of these splits is called a region. A region contains all rows in the table between the region’s start key and end key. A region server can serve as many as 1,000 regions.
  • HFile: An HFile is a file that holds data flushed from the MemStore. HFiles are stored in HDFS, which means that multiple copies of each HFile are maintained through HDFS replication.
  • MemStore: The MemStore is an in-memory write cache. After a write is recorded in the WAL, it is written to the MemStore, which holds sorted key-value pairs. Once the MemStore is full, its contents are flushed to an HFile.
  • WAL (write-ahead log): Every write to HBase is first recorded in the WAL.
  • BlockCache: The read cache of HBase. It keeps frequently read data in memory and evicts the least recently used data when it is full.
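
The write and read paths described in the list above can be sketched with a small in-memory model. This is only an illustration, not the HBase API: the class `ToyRegion`, its method names, and the limits `memstore_limit` and `cache_capacity` are all hypothetical. The cache here works per key for brevity, whereas a real read cache would hold larger chunks of file data.

```python
from collections import OrderedDict


class ToyRegion:
    """Toy model of one region's write and read path (not the HBase API)."""

    def __init__(self, memstore_limit=3, cache_capacity=2):
        self.wal = []                     # append-only log of every write
        self.memstore = {}                # in-memory write cache
        self.hfiles = []                  # flushed, sorted files (newest last)
        self.block_cache = OrderedDict()  # read cache with LRU eviction
        self.memstore_limit = memstore_limit
        self.cache_capacity = cache_capacity

    def put(self, key, value):
        self.wal.append((key, value))     # 1. record the write in the WAL
        self.memstore[key] = value        # 2. then apply it to the MemStore
        self.block_cache.pop(key, None)   # drop any stale cached value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()                  # 3. a full MemStore is flushed

    def flush(self):
        # An HFile holds the MemStore contents as sorted key-value pairs.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        if key in self.memstore:          # unflushed writes are the newest
            return self.memstore[key]
        if key in self.block_cache:       # cache hit: mark as recently used
            self.block_cache.move_to_end(key)
            return self.block_cache[key]
        for hfile in reversed(self.hfiles):  # the newest file wins
            for k, v in hfile:
                if k == key:
                    self._cache(key, v)
                    return v
        return None

    def _cache(self, key, value):
        self.block_cache[key] = value
        if len(self.block_cache) > self.cache_capacity:
            self.block_cache.popitem(last=False)  # evict least recently used


region = ToyRegion()
for row_key, value in [("row3", "c"), ("row1", "a"), ("row2", "b")]:
    region.put(row_key, value)
print(region.hfiles)  # the third put fills the MemStore and triggers a flush
```

With the assumed limit of three entries, the third `put` triggers a flush, so the MemStore ends up empty and a single sorted HFile holds all three rows.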

HBase – Different Kinds of Servers

An HBase setup consists of three kinds of servers:

  • HMaster server
  • A set of region servers
  • Distributed Zookeeper servers (an ensemble of 3 or 5 servers, one of which acts as the leader)
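
To make the division of work among region servers concrete, here is a minimal sketch of how a row key could be mapped to a region, and from there to the region server that manages it. The start keys, the server names, and the `locate` function are assumptions made up for this illustration. The sketch also assumes that a region's start key is inclusive and its end key is exclusive.

```python
import bisect

# Hypothetical layout: each region begins at its start key and ends just
# before the next region's start key. "" marks the start of the table.
REGION_START_KEYS = ["", "g", "n", "t"]

# Hypothetical assignment of regions (by start key) to region servers.
REGION_TO_SERVER = {
    "": "regionserver-1",
    "g": "regionserver-2",
    "n": "regionserver-1",
    "t": "regionserver-3",
}


def locate(row_key):
    """Return (region start key, region server) for the given row key."""
    # Find the last region whose start key is <= row_key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    start_key = REGION_START_KEYS[index]
    return start_key, REGION_TO_SERVER[start_key]


print(locate("apple"))
print(locate("zebra"))
```

Because the start keys are kept sorted, a binary search is enough to find the owning region, and one server can own several regions, as `regionserver-1` does here.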

How is HBase related to HDFS?

HFiles and the WAL (write-ahead log) are stored on HDFS data nodes. The following scenarios show the benefits of keeping HFiles and the WAL in HDFS:

  • What if a region server goes down before its MemStore data is flushed? Because copies of the WAL are maintained on at least two other nodes, the data in the WAL is loaded into the MemStore of other region servers and then flushed as HFiles, which are in turn replicated across HDFS data nodes.
  • What if a region server goes down with data written only to the WAL? As above, a replica of the WAL is loaded into the MemStore of another region server, then flushed as an HFile and replicated appropriately.
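
The recovery scenarios above can be sketched as a replay of WAL entries into a fresh MemStore. This is a simplified model rather than HBase's actual recovery code: the function names, the tuple layout of a WAL entry, and the `last_flushed_seq` parameter are assumptions made for illustration.

```python
def recover_memstore(wal_entries, last_flushed_seq):
    """Rebuild MemStore contents from WAL entries that were never flushed.

    wal_entries: list of (sequence_number, row_key, value) in write order.
    last_flushed_seq: highest sequence number already stored in an HFile.
    """
    memstore = {}
    for seq, row_key, value in wal_entries:
        if seq > last_flushed_seq:     # skip edits already in an HFile
            memstore[row_key] = value  # later edits overwrite earlier ones
    return memstore


def flush_to_hfile(memstore):
    """Return the sorted key-value pairs that a flush would write out."""
    return sorted(memstore.items())


# Hypothetical WAL of a failed region server, read from an HDFS replica.
wal = [
    (1, "row1", "a"),
    (2, "row2", "b"),
    (3, "row1", "c"),
    (4, "row3", "d"),
]
recovered = recover_memstore(wal, last_flushed_seq=2)
print(flush_to_hfile(recovered))
```

Only the edits after the last flush are replayed, so the recovering region server ends up holding exactly the data that existed only in the failed server's memory.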

How is HBase related to Zookeeper?

HBase needs Zookeeper primarily for coordination.

Ajitesh Kumar

