This blog presents high-level concepts of the HBase architecture components.
HBase Architecture Components – Key Building Blocks
The following diagram shows the key HBase architecture components:
Note the following about the components in the above diagram:
- HMaster: Responsible for coordinating the region servers, including assigning regions on startup and during recovery, and monitoring region servers using Zookeeper.
- Region Servers: Each region server manages one or more regions.
- Zookeeper: Zookeeper is used as a distributed coordination service for maintaining the server state of the cluster.
- Regions: Records in HBase tables are split horizontally based on the key range. Each of these splits is called a region. A region contains all rows in the table between the region's start key and end key. A region server can serve as many as 1,000 regions.
- HFile: An HFile is the file that holds data flushed from the MemStore. HFiles are written to HDFS, which replicates its data, so multiple copies of each HFile are maintained.
- MemStore: The MemStore is a write cache maintained in memory. After a write is recorded in the WAL, it is written to the MemStore next. The MemStore holds sorted key-value pairs. Once the MemStore is full, its contents are flushed to an HFile.
- WAL (write-ahead log): Any write to HBase is first recorded in the WAL.
- BlockCache: The read cache of HBase. It stores frequently read data in memory. The least recently used data is evicted when the cache is full.
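To make the write and read paths described above more concrete, here is a minimal Python sketch. It is a toy model, not HBase code: the class name `ToyRegionStore`, the entry-count flush threshold `memstore_limit`, and the per-key cache are all assumptions made for illustration (real HBase flushes based on MemStore size in bytes and caches blocks of HFiles rather than single keys).

```python
from collections import OrderedDict


class ToyRegionStore:
    """Toy model of one region's write and read path (hypothetical, not HBase code)."""

    def __init__(self, memstore_limit=3, cache_capacity=2):
        self.wal = []                      # append-only log; in HBase this lives in HDFS
        self.memstore = {}                 # in-memory write cache
        self.hfiles = []                   # each flush produces one sorted, immutable file
        self.memstore_limit = memstore_limit
        self.block_cache = OrderedDict()   # read cache with LRU eviction
        self.cache_capacity = cache_capacity

    def put(self, key, value):
        self.wal.append((key, value))      # 1. record the write in the WAL first
        self.block_cache.pop(key, None)    # toy-only: drop any stale cached value
        self.memstore[key] = value         # 2. then write it to the MemStore
        if len(self.memstore) >= self.memstore_limit:
            self.flush()                   # 3. flush once the MemStore is "full"

    def flush(self):
        # The MemStore content is written out as sorted key-value pairs.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, key):
        if key in self.memstore:           # most recent writes live in the MemStore
            return self.memstore[key]
        if key in self.block_cache:        # cache hit: mark as most recently used
            self.block_cache.move_to_end(key)
            return self.block_cache[key]
        for hfile in reversed(self.hfiles):    # search newest HFile first
            for k, v in hfile:
                if k == key:
                    self._cache(key, v)
                    return v
        return None

    def _cache(self, key, value):
        self.block_cache[key] = value
        if len(self.block_cache) > self.cache_capacity:
            self.block_cache.popitem(last=False)   # evict the least recently used entry
```

Calling `put` repeatedly on a `ToyRegionStore` shows the order of operations: every write lands in `wal` before `memstore`, and `hfiles` only grows when the threshold is reached.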
HBase – Different Kinds of Servers
The following three kinds of servers form part of an HBase setup:
- HMaster server
- A set of region servers
- Distributed Zookeeper servers (an ensemble of 3 or 5 servers with one leader)
How is HBase related to HDFS?
HFiles and the WAL (write-ahead log) are written to HDFS data nodes. The following scenarios show the benefits of storing HFiles and the WAL in HDFS:
- What if a region server goes down before its MemStore data is flushed? Because copies of the WAL are maintained on at least two other nodes, the data in the WAL is loaded into the MemStore on another region server and then flushed as an HFile, which in turn is replicated across HDFS data nodes.
- What if a region server goes down with data written only to the WAL? As above, a replica of the WAL is loaded into the MemStore on another region server, then flushed as an HFile and replicated appropriately.
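The recovery scenarios above can be sketched as a small WAL replay routine. This is an illustrative sketch under stated assumptions, not HBase's recovery code: the function names `replay_wal` and `flush_to_hfile`, the `(sequence, key, value)` entry layout, and the `last_flushed_seq` marker are hypothetical and stand in for whatever bookkeeping tells the recovering server which edits had already reached an HFile.

```python
def replay_wal(wal_entries, last_flushed_seq):
    """Rebuild a MemStore from WAL entries that were never flushed.

    wal_entries: iterable of (sequence_number, key, value) in write order.
    last_flushed_seq: highest sequence number already persisted in an HFile
                      (hypothetical marker used only in this sketch).
    """
    memstore = {}
    for seq, key, value in wal_entries:
        if seq > last_flushed_seq:     # skip edits that already reached an HFile
            memstore[key] = value      # later edits overwrite earlier ones
    return memstore


def flush_to_hfile(memstore):
    """Turn the rebuilt MemStore into sorted key-value pairs, as a flush would."""
    return sorted(memstore.items())
```

In this model, another region server would read the surviving WAL copy from HDFS, call `replay_wal` to rebuild the lost MemStore, and then call `flush_to_hfile` so the recovered data is stored as an HFile again.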
How is HBase related to Zookeeper?
HBase needs Zookeeper primarily for coordination.
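As a rough illustration of what coordination means here, the sketch below models the coordination service tracking which region servers are alive and the master reassigning regions away from a failed server. Everything in it is an assumption made for illustration: `ToyCoordinator`, `reassign_regions`, the heartbeat timestamps, and the round-robin reassignment are hypothetical and do not reflect the actual Zookeeper or HBase APIs.

```python
class ToyCoordinator:
    """Stand-in for the coordination service: tracks server liveness (hypothetical)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}           # server name -> time of last heartbeat

    def heartbeat(self, server, now):
        self.last_heartbeat[server] = now

    def live_servers(self, now):
        # A server counts as alive if it reported within the session timeout.
        return sorted(
            server
            for server, seen in self.last_heartbeat.items()
            if now - seen <= self.session_timeout
        )


def reassign_regions(assignments, live):
    """Stand-in for the HMaster: move regions off servers that are no longer live."""
    if not live:
        raise RuntimeError("no live region servers")
    new_assignments = {}
    moved = 0
    for region, server in sorted(assignments.items()):
        if server in live:
            new_assignments[region] = server                   # keep healthy assignments
        else:
            new_assignments[region] = live[moved % len(live)]  # round-robin over live servers
            moved += 1
    return new_assignments
```

The split mirrors the roles described earlier: the coordinator only answers the question of which servers are alive, while the decision about where regions go stays with the master.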