Apache HBase Performance Tuning

Apache HBase is increasingly used for real-time workloads, so it must deliver consistent read and write performance. To achieve that, every distributed component has to perform well on the underlying Linux-based OS.

The distributed components are listed below.

  • Hardware
  • OS (Linux, RHEL 6.5)
  • JVM/Java 1.7.0_10 and above
  • HDFS
  • Read/write on the OS
  • Region servers
  • Master servers/HMaster
  • Client servers/HClient
  • ZooKeeper
  • Table design
  • Large-scale data compression

Let us look at the components that can be reconfigured to optimize throughput from both the hardware and the software perspective.

  1. Infrastructure/Operating systems
  2. Java virtual machine

1. Infrastructure/Operating systems

Designing the HBase infrastructure is an important part of optimization, because operational challenges appear as the data grows exponentially. The infrastructure should also keep meeting its predefined performance criteria as the cluster grows from small to medium and from medium to large.

Apache HBase operates in a master-slave fashion: the master nodes host the HDFS NameNode, the MapReduce JobTracker, and the HBase Master, while the slave nodes host the HDFS DataNodes, the MapReduce TaskTrackers, and the HBase region servers. Because slaves are frequently decommissioned and taken down for maintenance, it is recommended to keep the slave nodes separate from the master nodes.

Balanced Workload

In a production environment, resources should be distributed evenly across the different job types, which may be CPU-intensive, disk I/O-intensive, or network I/O-intensive.

Compute Intensive

A compute-intensive job requires more processing power to perform its task. Slave nodes need more CPU, and more RAM to hold the heap data while processing.

I/O Intensive

MapReduce jobs require more I/O when there is cold data to process. Loading this cold data and moving it into a hot zone consumes a lot of I/O on a Hadoop cluster.

Server Selection

Entry-level servers are a good choice for medium and large clusters because they provide good parallelization capabilities.

Memory Size

To avoid memory swapping, provision enough RAM that the processors are never left waiting on the disk. Swapping starts when RAM is full: the system pushes inactive pages to the swap space allocated on the physical disk, and this requires heavy I/O.
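
One rough way to check whether a node has started swapping is to compare the SwapTotal and SwapFree fields that Linux reports in /proc/meminfo. The sketch below parses text in that format; the function names and the sample values are illustrative assumptions, not part of HBase.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap in use is SwapTotal minus SwapFree (both in kB)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


# Hypothetical sample; on a real node, read the text from /proc/meminfo instead.
SAMPLE = """MemTotal:       65843212 kB
MemFree:         1203344 kB
SwapTotal:       8388604 kB
SwapFree:        8126460 kB
"""

if __name__ == "__main__":
    used = swap_used_kb(parse_meminfo(SAMPLE))
    print(f"swap in use: {used} kB")
```

A value that keeps growing while the region server is under load suggests the node needs more RAM or a smaller heap.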


Processors

It is recommended to use four-socket processors with a medium clock speed. For large clusters, two octo-core processors are a better choice, with four quad-core processors per machine for the slaves and for ZooKeeper.


Network

The network, and in particular the switching hardware that serves the traffic, is where HBase takes its biggest performance hit. As the capacity of the cluster increases, the following situations can arise.

  • HBase hotspot regions.
  • Slowness due to high processor usage.
  • Slowness due to network starvation.

Operating System

Apache HBase is designed to run on Unix/Linux-based systems. The Apache Software Foundation does not recommend running HBase on Windows-based systems, as it is not production-tested there.

Below is the list of Linux/Unix systems that are recommended.

  • CentOS
  • Fedora
  • Debian
  • Ubuntu
  • Red Hat Enterprise Linux
  • Some versions of Solaris

2. Java Virtual Machine

JVM performance tuning is a key area; it is tricky and depends on multiple factors. The factor that most directly affects HBase performance is garbage collection: the JVM should be tuned so that it avoids long stop-the-world collections and performs GC efficiently even under varying load.

Understanding the garbage collection process means understanding several design choices: serial versus parallel, concurrent versus stop-the-world, and compacting versus non-compacting versus copying. There are also multiple performance metrics that should be measured to reach optimum performance, such as GC overhead, pause time, frequency of collections, footprint, and promptness.
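
Three of these metrics can be estimated from a list of observed pause times. The sketch below is a simplification: it counts only stop-the-world pauses toward overhead and ignores the CPU time of concurrent GC work. The function name and the sample pause values are hypothetical.

```python
def gc_metrics(pauses_ms, window_s):
    """Summarize stop-the-world pauses observed over a window of window_s seconds.

    pauses_ms: pause durations in milliseconds.
    Returns the overhead (fraction of wall-clock time spent paused), the
    longest pause, and the collection frequency in collections per second.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    total_pause_s = sum(pauses_ms) / 1000.0
    return {
        "overhead": total_pause_s / window_s,
        "max_pause_ms": max(pauses_ms, default=0),
        "frequency_per_s": len(pauses_ms) / window_s,
    }


# Hypothetical pauses collected from a GC log over one minute.
pauses = [40, 95, 60, 120, 85]
m = gc_metrics(pauses, window_s=60)
print(m)
```

In this sample the longest pause exceeds a 100 ms goal even though the overhead is well below 1%, which is why pause time and overhead need to be tracked separately.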

From the HBase performance perspective, the two main memory consumers are the BlockCache and the MemStore. The default BlockCache implementation is LruBlockCache, which keeps cached blocks as large byte arrays on the heap. Garbage collection removes objects that are no longer referenced and reclaims their memory.

For optimal results, we can fine-tune the parameters below.
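
To get a feel for how much of the heap these two structures occupy, the sketch below splits a heap by two fractions. It assumes the commonly documented defaults of 0.4 for hfile.block.cache.size and 0.4 for hbase.regionserver.global.memstore.size, and a combined cap of 0.8; these values are assumptions that should be checked against the HBase version in use, and the function name is illustrative.

```python
GIB = 1024 ** 3

# Assumed defaults; verify them for your HBase version.
DEFAULT_BLOCK_CACHE_FRACTION = 0.4
DEFAULT_MEMSTORE_FRACTION = 0.4
MAX_COMBINED_FRACTION = 0.8  # assumed cap, leaving 20% of the heap for everything else


def heap_split(heap_bytes,
               block_cache=DEFAULT_BLOCK_CACHE_FRACTION,
               memstore=DEFAULT_MEMSTORE_FRACTION):
    """Return the bytes given to the BlockCache, the MemStore, and the rest of the heap."""
    if block_cache + memstore > MAX_COMBINED_FRACTION + 1e-9:
        raise ValueError("BlockCache and MemStore fractions exceed the combined cap")
    cache_bytes = int(heap_bytes * block_cache)
    memstore_bytes = int(heap_bytes * memstore)
    return {
        "block_cache_bytes": cache_bytes,
        "memstore_bytes": memstore_bytes,
        "other_bytes": heap_bytes - cache_bytes - memstore_bytes,
    }


split = heap_split(32 * GIB)
for name, size in split.items():
    print(f"{name}: {size / GIB:.1f} GiB")
```

With a 32 GB heap and these assumed fractions, roughly 80% of the heap holds long-lived data, which is what makes the GC behavior of a region server different from that of a typical application.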

Parameter for tuning | Reason for tuning | Recommendation
-XX:+UseG1GC | Enables the G1 garbage collector. | Use the G1 garbage collector.
-XX:G1HeapRegionSize=n | Sets the size of the individual G1 regions. The default value is chosen ergonomically based on the heap size; the minimum is 1 MB and the maximum is 32 MB. | In this case, with -Xms32g and -Xmx32g, we can use 32 MB.
-XX:InitiatingHeapOccupancyPercent=n | The percentage of heap occupancy at which a concurrent GC cycle is started. A value of 0 means GC cycles run constantly; the default value is 45. | Keep the default of 45.
-XX:MaxGCPauseMillis=n | Sets a soft goal for the maximum GC pause time; the JVM makes its best effort to honor this goal. | Run a series of tests and settle on a value based on the results; generally, 100 ms is good enough.
-XX:+ParallelRefProcEnabled | When turned on, GC uses multiple threads to process references during young and mixed GC. | For HBase, the GC remarking time is reduced by 75%, and the overall GC pause time is reduced by 30%.
-XX:-ResizePLAB and -XX:ParallelGCThreads=n | The promotion-local allocation buffer (PLAB) is used during young collections when multiple GC threads are at work. PLABs make sure the threads do not compete for data structures that share the same memory space. | Say you have 16 logical processors; the ParallelGCThreads value should then be 8 + (16 - 8) * (5/8) = 13.
-XX:G1NewSizePercent=n | The percentage of the heap to be used as the minimum young generation size; the default value is 5. | Keeping it below 3 may give better performance. In various test scenarios, the lower values of 1 and 2 gave better performance.
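
The ParallelGCThreads arithmetic and the recommendations from the table can be put together in a short sketch. The helper names are hypothetical; the thread-count rule is the one quoted in the table (all logical processors up to 8, then 5/8 of the remainder), and the flag values are the table's recommendations rather than universal settings.

```python
def parallel_gc_threads(logical_cpus):
    """Thread count per the rule in the table: n if n <= 8, else 8 + (n - 8) * 5/8."""
    if logical_cpus <= 8:
        return logical_cpus
    return 8 + (logical_cpus - 8) * 5 // 8


def g1_options(heap_gb, logical_cpus, max_pause_ms=100, region_mb=32):
    """Build the list of JVM flags recommended in the table above."""
    # -XX:G1NewSizePercent is left out: it is an experimental option and
    # additionally needs -XX:+UnlockExperimentalVMOptions.
    return [
        f"-Xms{heap_gb}g",
        f"-Xmx{heap_gb}g",
        "-XX:+UseG1GC",
        f"-XX:G1HeapRegionSize={region_mb}m",
        "-XX:InitiatingHeapOccupancyPercent=45",
        f"-XX:MaxGCPauseMillis={max_pause_ms}",
        "-XX:+ParallelRefProcEnabled",
        "-XX:-ResizePLAB",
        f"-XX:ParallelGCThreads={parallel_gc_threads(logical_cpus)}",
    ]


if __name__ == "__main__":
    # Example for the 32 GB heap and 16 logical processors used in the table.
    print(" ".join(g1_options(heap_gb=32, logical_cpus=16)))
```

The resulting string would typically be appended to HBASE_REGIONSERVER_OPTS in hbase-env.sh, and the pause goal should then be validated with load tests as the table recommends.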