
HBase Architecture Components for Beginners

This post covers the high-level concepts behind the HBase architecture components.

HBase Architecture Components – Key Building Blocks

The following diagram shows the key building blocks:

Figure 1. HBase Architectural Building Blocks

Note the following about the components shown in the diagram above:

  • HMaster: Coordinates the region servers. It assigns regions on startup and during recovery, and monitors the region servers through Zookeeper.
  • Region Servers: Each region server manages one or more regions.
  • Zookeeper: A distributed coordination service that maintains the server state of the cluster.
  • Regions: HBase tables are split horizontally by row key range, and each split is called a region. A region contains all rows in the table between the region’s start key and end key. A region server can serve as many as 1,000 regions.
  • HFile: The file that holds the data flushed from the MemStore. HFiles are written to HDFS, so multiple copies of each HFile are maintained through HDFS replication.
  • MemStore: A write cache maintained in memory. After an edit is written to the WAL, it is written to the MemStore, which holds sorted key-value pairs. Once the MemStore is full, its contents are flushed to a new HFile.
  • WAL (write-ahead log): Every write to HBase is first appended to the WAL, so that edits not yet flushed to an HFile can be recovered after a failure.
  • BlockCache: The read cache of HBase. It keeps frequently read data in memory and evicts the least recently used data when it is full.
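The way the WAL, MemStore, HFile and BlockCache fit together can be sketched with a toy, in-memory model. This is only an illustration under simplifying assumptions, not the HBase API: the class name `ToyRegionServer`, its methods, and the `memstore_limit` and `cache_size` parameters are all hypothetical, the "files" are plain Python lists, and the cache holds single key-value pairs rather than file blocks.

```python
from collections import OrderedDict


class ToyRegionServer:
    """Toy model of the write and read paths (hypothetical, not the HBase API)."""

    def __init__(self, memstore_limit=3, cache_size=2):
        self.wal = []                      # append-only log of (key, value) edits
        self.memstore = {}                 # in-memory write cache
        self.hfiles = []                   # each flush adds one sorted, immutable "file"
        self.block_cache = OrderedDict()   # read cache, least recently used entry first
        self.memstore_limit = memstore_limit
        self.cache_size = cache_size

    def put(self, key, value):
        self.wal.append((key, value))      # 1. record the edit in the WAL first
        self.memstore[key] = value         # 2. then apply it to the MemStore
        self.block_cache.pop(key, None)    # toy-only: drop any stale cached value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()                   # 3. a full MemStore is flushed to an HFile

    def flush(self):
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:           # the newest edits live in the MemStore
            return self.memstore[key]
        if key in self.block_cache:        # next, try the read cache
            self.block_cache.move_to_end(key)
            return self.block_cache[key]
        for hfile in reversed(self.hfiles):  # finally, scan files, newest first
            for stored_key, value in hfile:
                if stored_key == key:
                    self._cache(key, value)
                    return value
        return None

    def _cache(self, key, value):
        self.block_cache[key] = value
        if len(self.block_cache) > self.cache_size:
            self.block_cache.popitem(last=False)  # evict the least recently used entry


server = ToyRegionServer(memstore_limit=3, cache_size=2)
for row in ["row3", "row1", "row2"]:
    server.put(row, "v-" + row)

# The third put filled the MemStore, so it was flushed as one sorted file.
print(server.hfiles)
print(server.get("row1"))
```

Reading `row1`, `row2` and `row3` in turn with `cache_size=2` leaves only the last two rows in the toy cache, which mirrors the least-recently-used eviction described above.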

HBase – Different Kinds of Servers

An HBase setup consists of three kinds of servers:

  • HMaster server
  • A set of region servers
  • Distributed Zookeeper servers (an ensemble of 3 or 5 servers, one of which acts as the leader)
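Zookeeper ensembles are usually sized at 3 or 5 because the ensemble stays available only while a majority of its servers can reach each other. The arithmetic can be sketched as follows; the function names are illustrative only.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

An ensemble of 4 tolerates one failure, the same as an ensemble of 3, which is why odd sizes are the usual choice.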

How is HBase related to HDFS?

HFiles and the WAL (write-ahead log) are stored on HDFS data nodes. The following scenarios show the benefit of storing them in HDFS:

  • What if a region server goes down before its MemStore data has been flushed? Because HDFS keeps copies of the WAL on at least two other nodes, the WAL is replayed into the MemStore of another region server and then flushed as an HFile, which in turn is replicated across HDFS data nodes.
  • What if a region server goes down with data written only to the WAL? As above, a replica of the WAL is replayed into the MemStore of another region server, flushed as an HFile, and replicated accordingly.
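The recovery described in both scenarios amounts to replaying the surviving WAL on another server. Below is a minimal sketch, assuming each WAL entry carries a sequence number and that the number of the last flushed edit is known. The names `replay_wal`, `flush_to_hfile` and `last_flushed_seq` are hypothetical, and the WAL is a plain list standing in for a file read back from HDFS.

```python
def replay_wal(wal_entries, last_flushed_seq):
    """Rebuild a MemStore from the WAL edits that never reached an HFile."""
    memstore = {}
    for seq, key, value in wal_entries:
        if seq > last_flushed_seq:   # skip edits already persisted in an HFile
            memstore[key] = value    # later edits overwrite earlier ones
    return memstore


def flush_to_hfile(memstore):
    """Write the recovered edits out as one sorted, immutable 'file'."""
    return sorted(memstore.items())


# WAL copy read from another HDFS node after the original server failed.
wal = [(1, "row1", "a"), (2, "row2", "b"), (3, "row1", "c"), (4, "row3", "d")]

recovered = replay_wal(wal, last_flushed_seq=2)
new_hfile = flush_to_hfile(recovered)
print(recovered)
print(new_hfile)
```

Only edits 3 and 4 are replayed here, because edits 1 and 2 are assumed to have been flushed before the failure.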

How is HBase related to Zookeeper?

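HBase needs Zookeeper primarily for coordination: Zookeeper maintains the server state of the cluster, and the HMaster uses it to monitor the region servers.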
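One common way for a coordination service to track which servers are alive is heartbeats with a session timeout. The toy model below sketches that idea only. It is not the Zookeeper client API, and the class `ToyCoordinator`, its methods, and the timeout value are assumptions made for illustration.

```python
class ToyCoordinator:
    """Toy liveness tracker based on heartbeats (hypothetical, not Zookeeper's API)."""

    def __init__(self, session_timeout=3):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}   # server name -> time of its last heartbeat

    def heartbeat(self, server, now):
        self.last_heartbeat[server] = now

    def expire_sessions(self, now):
        """Drop servers whose last heartbeat is too old and return their names."""
        dead = sorted(
            server
            for server, seen in self.last_heartbeat.items()
            if now - seen > self.session_timeout
        )
        for server in dead:
            del self.last_heartbeat[server]
        return dead

    def live_servers(self):
        return sorted(self.last_heartbeat)


coordinator = ToyCoordinator(session_timeout=3)
coordinator.heartbeat("rs1", now=0)
coordinator.heartbeat("rs2", now=0)
coordinator.heartbeat("rs1", now=4)        # rs2 stops sending heartbeats

print(coordinator.expire_sessions(now=5))  # servers a master would need to recover
print(coordinator.live_servers())
```

In this sketch, the list returned by `expire_sessions` plays the role of the notification that lets a master reassign the regions of a failed region server.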

Ajitesh Kumar