What is Apache HBase Architecture?
Apache HBase is a "NoSQL" database. "NoSQL" is a general term indicating that the database is not an RDBMS that supports SQL as its primary access language. Apache HBase is better described as a data store than as a database. HBase scales both linearly and modularly as commodity nodes are added to the cluster. If the cluster grows from 20 to 40 nodes, for example, both its storage and its processing capacity increase accordingly.
Let us discuss various components of HBase.
Apache ZooKeeper is a high-performance, centralized coordination service for distributed applications, which provides distributed synchronization and group services to HBase. It lets users focus on application logic rather than on cluster coordination. It also provides an API through which a user can coordinate with the Master server.
Apache ZooKeeper APIs provide consistency, ordering, and durability; they also provide synchronization and concurrency for a distributed clustered system.
Apache HBase HMaster is an important component of the HBase cluster. It is responsible for monitoring RegionServers, handling failover, and managing region splits.
HMaster functionalities are as below.
- It monitors the RegionServers.
- It handles RegionServer failover.
- It handles metadata changes and provides the interface for all of them.
- It assigns and unassigns regions.
- It performs load balancing during idle time.
- HMaster provides a web user interface that shows information about the HBase cluster.
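Two of the duties listed above, failover handling and load balancing, can be illustrated with a toy model. This is only a sketch and not how HMaster is implemented: the function names `reassign_after_failure` and `balance`, the dictionary layout, and the policy of evening out region counts are assumptions chosen for illustration.

```python
def reassign_after_failure(assignments, dead_server):
    """Hand the regions of a failed server to the least loaded survivors.

    `assignments` maps a server name to the list of regions it serves
    (a hypothetical layout, not an HBase data structure).
    """
    plan = {s: list(r) for s, r in assignments.items() if s != dead_server}
    if not plan:
        raise RuntimeError("no live RegionServer left to take over regions")
    for region in assignments.get(dead_server, []):
        target = min(plan, key=lambda s: len(plan[s]))
        plan[target].append(region)
    return plan


def balance(assignments):
    """Move regions from the busiest to the idlest server until the
    region counts differ by at most one. Returns the new plan and the
    list of (region, source, destination) moves."""
    plan = {s: list(r) for s, r in assignments.items()}
    moves = []
    if not plan:
        return plan, moves
    while True:
        busiest = max(plan, key=lambda s: len(plan[s]))
        idlest = min(plan, key=lambda s: len(plan[s]))
        if len(plan[busiest]) - len(plan[idlest]) <= 1:
            return plan, moves
        region = plan[busiest].pop()
        plan[idlest].append(region)
        moves.append((region, busiest, idlest))
```

Both functions return a new plan instead of mutating their input, which keeps the example easy to reason about; a real balancer would also weigh factors such as data locality and request load.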
RegionServers are responsible for storing the actual data. Just as in a Hadoop cluster the NameNode stores metadata and the DataNodes store the actual data, in HBase the Master holds the metadata and the RegionServers store the actual data. A RegionServer runs on a DataNode in a distributed cluster environment.
RegionServer performs the following tasks.
- It serves the regions (portions of tables) assigned to it.
- It handles read and write requests from clients.
- It flushes the in-memory cache to HDFS.
- It is responsible for handling region splits.
- It maintains HLogs.
Components of a RegionServer
Let us see the components of RegionServer.
3.1 WAL (Write-Ahead Log)
Apache HBase WAL is an intermediate file, also called an edit log file. When data is written or modified in HBase, it is not written directly to disk; it is kept in memory for some time. Keeping data only in memory is dangerous, because if the system goes down, that data would be lost. To overcome this issue, Apache HBase has a write-ahead log file to which data is written first, and only then to memory.
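The ordering described above (log first, memory second) can be sketched in a few lines. This is a minimal illustration and not HBase's implementation: the class name `WalBackedStore`, the JSON-lines log format, and the file name are assumptions made for this example.

```python
import json
import os
import tempfile


class WalBackedStore:
    """Toy store that appends every edit to a log file before applying it in memory."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memstore = {}  # (row, column) -> value, lost on a crash

    def put(self, row, column, value):
        # 1. Append the edit to the write-ahead log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"row": row, "column": column, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only then apply the edit to the in-memory store.
        self.memstore[(row, column)] = value

    def recover(self):
        """Rebuild the in-memory state by replaying the log in order."""
        self.memstore.clear()
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                edit = json.loads(line)
                self.memstore[(edit["row"], edit["column"])] = edit["value"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "edits.wal")
        store = WalBackedStore(path)
        store.put("row1", "cf:name", "alice")
        # Simulate a crash: a fresh instance starts with empty memory.
        restarted = WalBackedStore(path)
        restarted.recover()
        print(restarted.memstore)
```

Because each edit reaches the log before it reaches memory, anything a client was told had been written can be replayed after a restart.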
The HFile is the actual file where row data is physically stored.
A Store corresponds to a column family of a table in HBase. This is where the HFiles are kept.
The MemStore resides in main memory and records the current data operation: once an edit has been written to the WAL, the RegionServer stores the key-value in the MemStore.
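How the in-memory store and the on-disk files fit together can be shown with a small model. The sketch below is an assumption-laden illustration, not HBase code: the class name `ColumnFamilyStore` is hypothetical, the flush threshold counts entries (HBase itself decides by memory size), and an "HFile" here is just a pair of sorted Python lists rather than a file on HDFS.

```python
import bisect


class ColumnFamilyStore:
    """Toy model of one store: a mutable in-memory buffer plus immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumed: number of entries, not bytes
        self.memstore = {}
        self.hfiles = []  # oldest first; each is (sorted_keys, values)

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffer out as a new sorted, immutable file and clear it."""
        if not self.memstore:
            return
        items = sorted(self.memstore.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.hfiles.append((keys, values))
        self.memstore = {}

    def get(self, key):
        # The newest data is in memory, so check there first.
        if key in self.memstore:
            return self.memstore[key]
        # Then search the files from newest to oldest.
        for keys, values in reversed(self.hfiles):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None
```

Keeping each flushed file sorted is what allows the binary search in `get`; reading newest to oldest ensures that a later write to the same key wins over an earlier one.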
Regions are the splits of a table: the table is divided into ranges of row keys, and each range is hosted by a RegionServer.
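A key-based split can be pictured as a sorted list of region start keys, where a row belongs to the region with the greatest start key not exceeding the row key. The following sketch assumes that model; the class `RegionMap`, its method names, and the server names are hypothetical and exist only for this example.

```python
import bisect


class RegionMap:
    """Toy table layout: sorted region start keys, each mapped to a hosting server."""

    def __init__(self, first_server="rs1"):
        # A new table starts as a single region covering the whole key space.
        self.start_keys = [""]
        self.servers = [first_server]

    def split(self, split_key, new_server):
        """Split the region containing `split_key`; the upper half goes to `new_server`."""
        i = bisect.bisect_left(self.start_keys, split_key)
        if i < len(self.start_keys) and self.start_keys[i] == split_key:
            raise ValueError(f"a region already starts at {split_key!r}")
        self.start_keys.insert(i, split_key)
        self.servers.insert(i, new_server)

    def locate(self, row_key):
        """Return (region start key, server) for the region holding `row_key`."""
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.start_keys[i], self.servers[i]
```

For example, after `split("m", "rs2")` and `split("t", "rs3")`, the row key `"apple"` falls in the first region, `"m"` is the first key of the second, and `"zebra"` lands in the third.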
The client can be written in Java or any other language, and it uses external APIs to connect to the RegionServer that manages the actual row data. The client queries the catalog tables to find the region; once the region is found, the client contacts the RegionServer directly, performs the data operation, and caches the region information for fast retrieval.
5. Catalog Tables
Catalog Tables are used to maintain metadata for all RegionServers and regions.
There are two types of Catalog tables that exist in HBase.
- -ROOT-: This table holds information about the location of the .META. table.
- .META.: This table contains information about all regions and their locations.
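The two-step lookup through the catalog tables, together with client-side caching of the result, can be sketched as follows. Everything here is a simplified assumption: the class `CatalogClient`, the tuple layout `(start_key, end_key, server)`, and the server names are invented for illustration and do not mirror the real catalog schema.

```python
class CatalogClient:
    """Toy client: asks -ROOT- where .META. is, asks .META. for the region, caches it."""

    def __init__(self, root, meta_tables):
        self.root = root                # -ROOT-: {".META.": server hosting .META.}
        self.meta_tables = meta_tables  # server -> list of (start_key, end_key, server)
        self.region_cache = []          # region entries already looked up
        self.catalog_lookups = 0        # counts trips to the catalog tables

    @staticmethod
    def _covers(start_key, end_key, row_key):
        # An end key of None means the region extends to the end of the table.
        return start_key <= row_key and (end_key is None or row_key < end_key)

    def locate(self, row_key):
        # 1. Try the local cache first.
        for start_key, end_key, server in self.region_cache:
            if self._covers(start_key, end_key, row_key):
                return server
        # 2. Cache miss: find .META. via -ROOT-, then find the region via .META.
        self.catalog_lookups += 1
        meta_server = self.root[".META."]
        for entry in self.meta_tables[meta_server]:
            if self._covers(entry[0], entry[1], row_key):
                self.region_cache.append(entry)
                return entry[2]
        raise KeyError(f"no region found for row key {row_key!r}")


# Hypothetical catalog contents for a table split at "m" and "t".
ROOT = {".META.": "rs-meta"}
META_TABLES = {"rs-meta": [("", "m", "rs1"), ("m", "t", "rs2"), ("t", None, "rs3")]}
```

With this data, a client that looks up `"apple"` and then `"avocado"` consults the catalog only once, because both keys fall in the region cached by the first lookup.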