HBase Architecture Components for Beginners

This post gives a high-level overview of the components that make up the HBase architecture.

HBase Architecture Components – Key Building Blocks

The following diagram shows the key building blocks:

Figure 1. HBase Architectural Building Blocks

Note the following about the components in the diagram above:

  • HMaster: Coordinates the region servers. It assigns regions on startup and during recovery, and monitors the region servers through ZooKeeper.
  • Region Servers: Each region server manages one or more regions.
  • ZooKeeper: A distributed coordination service that maintains the server state of the cluster.
  • Regions: HBase tables are split horizontally by row-key range, and each split is called a region. A region contains all rows in the table between the region’s start key and end key. A region server can serve up to about 1,000 regions.
  • HFile: The file that holds data flushed from the MemStore. HFiles are written to HDFS, so HDFS replication keeps multiple copies of each HFile.
  • MemStore: An in-memory write cache. Once a write has been recorded in the WAL, it is written to the MemStore, which holds sorted key-value pairs. When the MemStore fills up, its contents are flushed to an HFile.
  • WAL (write-ahead log): Every write to HBase is first written to the WAL.
  • BlockCache: The read cache of HBase. It keeps frequently read data in memory and evicts the least recently used data when it is full.
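To make the interplay of these components concrete, here is a minimal sketch of a single region in plain Python. It is a toy model and not HBase's implementation: the class name ToyRegion, the tiny memstore_limit and cache_size values, the half-open key range, and the per-row cache are all assumptions made for illustration.

```python
from collections import OrderedDict


class ToyRegion:
    """Toy model of one region: WAL -> MemStore -> HFile, plus an LRU read cache."""

    def __init__(self, start_key, end_key, memstore_limit=2, cache_size=2):
        # Assumption of this sketch: the region serves keys in [start_key, end_key).
        self.start_key, self.end_key = start_key, end_key
        self.wal = []                     # append-only log of every write
        self.memstore = {}                # in-memory write cache
        self.hfiles = []                  # immutable sorted "files", oldest first
        self.block_cache = OrderedDict()  # LRU read cache
        self.memstore_limit = memstore_limit
        self.cache_size = cache_size

    def owns(self, row_key):
        return self.start_key <= row_key < self.end_key

    def put(self, row_key, value):
        if not self.owns(row_key):
            raise KeyError(f"{row_key!r} is outside this region's key range")
        self.wal.append((row_key, value))    # 1. record the write in the log
        self.memstore[row_key] = value       # 2. then update the write cache
        self.block_cache.pop(row_key, None)  # drop any stale cached read
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Persist the MemStore as one sorted, immutable "HFile".
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, row_key):
        if row_key in self.memstore:         # unflushed, newest data
            return self.memstore[row_key]
        if row_key in self.block_cache:      # recently read data
            self.block_cache.move_to_end(row_key)
            return self.block_cache[row_key]
        for hfile in reversed(self.hfiles):  # newest file first
            for key, value in hfile:
                if key == row_key:
                    self._cache(row_key, value)
                    return value
        return None

    def _cache(self, row_key, value):
        self.block_cache[row_key] = value
        if len(self.block_cache) > self.cache_size:
            self.block_cache.popitem(last=False)  # evict least recently used


region = ToyRegion("a", "m")
region.put("apple", 1)
region.put("banana", 2)  # MemStore reaches its limit and is flushed
region.put("cherry", 3)  # stays in the MemStore for now
print(len(region.hfiles), region.get("apple"), region.get("cherry"))
```

In this sketch "apple" is served from the flushed file (and then cached), while "cherry" is still served from the MemStore. A real region server works on blocks, column families, and files in HDFS rather than Python dictionaries.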

HBase – Different Kinds of Servers

An HBase setup consists of three kinds of servers:

  • HMaster server
  • A set of region servers
  • Distributed ZooKeeper servers (an ensemble of 3 or 5 servers with one leader)
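The ensemble sizes of 3 or 5 follow from the fact that ZooKeeper stays available only while a majority of its servers is up. The short sketch below works out that arithmetic; the function names are hypothetical helpers, not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

An ensemble of 4 tolerates no more failures than an ensemble of 3, which is why odd sizes are the usual choice.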

How is HBase related to HDFS?

HFiles and the WAL (write-ahead log) are stored in HDFS, on its data nodes. The following scenarios show the benefit of storing them there:

  • What if a region server goes down before its MemStore data has been flushed? Because copies of the WAL are kept on at least two other nodes, the data in the WAL is loaded into the MemStore of another region server and then flushed as an HFile, which in turn is replicated across HDFS data nodes.
  • What if a region server goes down with data written only to the WAL? As above, a replica of the WAL is loaded into the MemStore on another region server, then flushed as an HFile and replicated accordingly.
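Both scenarios come down to replaying the WAL on another server. The sketch below illustrates the idea under stated assumptions: each WAL entry carries an increasing sequence number, and the recovering server knows the highest sequence number that was already flushed. The function replay_wal and its parameters are hypothetical, not HBase's API.

```python
def replay_wal(wal_entries, flushed_up_to):
    """Rebuild the unflushed MemStore contents from a WAL replica.

    wal_entries: list of (sequence_id, row_key, value) in write order.
    flushed_up_to: highest sequence id already persisted in an HFile.
    """
    memstore = {}
    for seq_id, row_key, value in wal_entries:
        if seq_id > flushed_up_to:    # skip edits that were already flushed
            memstore[row_key] = value  # later edits overwrite earlier ones
    return memstore


wal = [(1, "row1", "a"), (2, "row2", "b"), (3, "row1", "c")]
recovered = replay_wal(wal, flushed_up_to=1)
hfile = sorted(recovered.items())  # flush the recovered MemStore as a sorted file
print(hfile)
```

Only the edits with sequence numbers 2 and 3 are replayed here, because the first one was already safe in an HFile before the crash.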

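How is HBase related to ZooKeeper?

HBase relies on ZooKeeper primarily for coordination: ZooKeeper maintains the server state of the cluster, and HMaster uses it to monitor the region servers.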
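One way to picture this kind of coordination is a registry in which each server's entry lives only as long as the server keeps sending heartbeats. The sketch below is a simplified illustration loosely modelled on session-bound (ephemeral) entries in ZooKeeper. The class ToyCoordinator, its methods, and the explicit clock values are assumptions for illustration, not the ZooKeeper client API.

```python
class ToyCoordinator:
    """Tracks live servers; an entry disappears when its session times out."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # server name -> time of last heartbeat

    def heartbeat(self, server, now):
        self.last_heartbeat[server] = now

    def expire(self, now):
        """Remove servers whose session timed out and return their names."""
        dead = [server for server, seen in self.last_heartbeat.items()
                if now - seen > self.session_timeout]
        for server in dead:
            del self.last_heartbeat[server]
        return sorted(dead)

    def live_servers(self):
        return sorted(self.last_heartbeat)


coord = ToyCoordinator(session_timeout=10)
coord.heartbeat("rs1", now=0)
coord.heartbeat("rs2", now=0)
coord.heartbeat("rs1", now=8)  # rs2 stops sending heartbeats
failed = coord.expire(now=12)
print(failed, coord.live_servers())
```

In this toy run "rs2" is reported as failed, which is the point at which a master could reassign its regions to the remaining servers.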

Ajitesh Kumar
