Apache HBase Interview Questions

Top 30 Apache HBase Questions and Answers

1. What is Apache HBase?

Apache HBase is a NoSQL, non-relational, distributed database that stores billions of rows and provides real-time read and write access to that data. Its design is modeled on Google's Bigtable. Strictly speaking, HBase is a data store rather than a database, because it lacks many features of a relational database management system, such as secondary indexes, typed columns, and triggers.

2. When to use Apache HBase?

  • HBase can be used in cases where we need random, real-time read/write access to our Big Data.
  • HBase is used to host very large tables with billions of rows and millions of columns on clusters of commodity hardware.

3. What are the key components of Apache HBase?

The key components of HBase are the HBase Master, RegionServer, ZooKeeper, Region, and Catalog Tables.

  • HMaster: The master process that manages the RegionServers. Its coordinating role is similar to that of the NameNode in Hadoop.
  • RegionServer: A table is divided into regions, and each RegionServer serves a set of those regions to clients.
  • ZooKeeper: ZooKeeper acts as the coordinator of the HBase distributed system. It tracks the health of the servers by communicating with them through sessions.
  • Region: It holds an in-memory data store (MemStore) and HFiles on disk.
  • Catalog Tables: They hold the cluster's metadata. Older releases use two catalog tables, -ROOT- and .META.; since HBase 0.96 there is a single catalog table, hbase:meta.

4. What is the data model of Apache HBase?

The HBase data model consists of the following:

  • A list of tables.
  • Every table has column families and rows.
  • The row key acts as the primary key of the table.
  • All access to an HBase table goes through this primary key.
  • Each column qualifier names an attribute of the object stored in the row; the attribute's value is held in the cell.
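The model above can be pictured as a nested map: row key, then column (family plus qualifier), then timestamped versions of the value. The following is a minimal Python sketch of that picture, not HBase client code. The names ToyTable, put, and get are hypothetical, and the timestamps are plain integers supplied by the caller.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of one HBase-style table (illustrative only, not the HBase API)."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families are declared up front
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value, ts):
        """Store one version of one cell."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][f"{family}:{qualifier}"][ts] = value

    def get(self, row_key, family, qualifier):
        """Return the newest version of one cell, or None if it is absent."""
        versions = self.rows.get(row_key, {}).get(f"{family}:{qualifier}", {})
        if not versions:
            return None
        return versions[max(versions)]


users = ToyTable("users", families=["info"])
users.put("row1", "info", "name", "Ada", ts=1)
users.put("row1", "info", "name", "Ada L.", ts=2)  # newer version of the same cell
latest = users.get("row1", "info", "name")
missing = users.get("row2", "info", "name")
```

Every lookup starts from the row key, which is why the row key plays the role of the primary key.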

5. What are WAL and HLog in HBase?

WAL stands for “Write-Ahead Log”, which records every change made to the data. HLog is the name of the WAL implementation class in older HBase releases. The WAL is similar to the MySQL binary log and stores HLogKeys. Each key carries a sequence number and is stored together with the actual data, so that lost data can be restored after a server failure.
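The idea of logging first and replaying after a failure can be sketched in a few lines. This is a toy Python model under simplifying assumptions: the log is an in-memory list standing in for durable storage, and ToyRegionServer and its methods are hypothetical names, not HBase classes.

```python
class ToyRegionServer:
    """Toy write path: append to a log first, then update an in-memory store."""

    def __init__(self, wal=None):
        # The list stands in for a log kept on durable storage.
        self.wal = wal if wal is not None else []
        self.memstore = {}
        self.next_seq = len(self.wal) + 1

    def put(self, row_key, column, value):
        # 1. Record the change in the log, tagged with a sequence number.
        self.wal.append((self.next_seq, row_key, column, value))
        self.next_seq += 1
        # 2. Only then apply it in memory.
        self.memstore[(row_key, column)] = value

    @classmethod
    def recover(cls, wal):
        """Rebuild the in-memory state by replaying the log in sequence order."""
        server = cls(wal=list(wal))
        for _seq, row_key, column, value in sorted(server.wal):
            server.memstore[(row_key, column)] = value
        return server


server = ToyRegionServer()
server.put("row1", "info:name", "Ada")
server.put("row1", "info:name", "Ada L.")
surviving_log = server.wal  # a crash loses the memstore, but not the log
restored = ToyRegionServer.recover(surviving_log)
```

Replaying in sequence-number order is what guarantees that the last write to a cell wins after recovery.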

6. What are the modes of Apache HBase in which it can run?

Apache HBase can run in two modes:

  • Standalone Mode
  • Distributed Mode

7. What are column families in Apache HBase?

A column family is a collection of columns, and a row is a collection of column families.

8. What is RegionServer in Apache HBase?

A table is divided into regions, each covering a contiguous range of row keys, and a RegionServer serves a group of those regions to clients.
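Finding the region that holds a row comes down to a search over sorted region start keys. The sketch below is a simplified Python illustration, assuming each region is identified only by its start key. ToyRegionMap, the split keys, and the server names rs1 and rs2 are all hypothetical.

```python
import bisect


class ToyRegionMap:
    """Maps a row key to the region, and the server, that covers it."""

    def __init__(self, split_keys, servers):
        # The first region starts at the empty key; each split key starts a new region.
        self.start_keys = [""] + sorted(split_keys)
        if len(servers) != len(self.start_keys):
            raise ValueError("need exactly one server name per region")
        self.servers = list(servers)

    def locate(self, row_key):
        """Return (region index, server name) for the region covering row_key."""
        index = bisect.bisect_right(self.start_keys, row_key) - 1
        return index, self.servers[index]


# Three regions: ["", "g"), ["g", "p"), ["p", end); rs1 serves two of them.
region_map = ToyRegionMap(split_keys=["g", "p"], servers=["rs1", "rs2", "rs1"])
first = region_map.locate("apple")
middle = region_map.locate("g")
last = region_map.locate("zebra")
```

One server appears twice in the example on purpose: a RegionServer typically serves many regions, not one.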

9. What are decorating filters in Apache HBase?

Decorating filters modify or extend the behavior of another filter to gain additional control over the values that are returned.
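HBase ships decorating filters such as SkipFilter and WhileMatchFilter, which wrap another filter. The Python sketch below only imitates the wrapping idea. MinValueFilter, StopAtFirstMiss, and scan are hypothetical names, and the "stop at the first rejected row" behavior is a loose approximation of what a while-match style wrapper does.

```python
class MinValueFilter:
    """Base filter: accept rows whose value is at least `minimum`."""

    def __init__(self, minimum):
        self.minimum = minimum

    def accept(self, row_key, value):
        return value >= self.minimum


class StopAtFirstMiss:
    """Decorator: delegates to the wrapped filter, but rejects everything
    after the first row that the wrapped filter rejects."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.finished = False

    def accept(self, row_key, value):
        if self.finished or not self.wrapped.accept(row_key, value):
            self.finished = True
            return False
        return True


def scan(rows, row_filter):
    """Return the keys of accepted rows, visiting rows in row-key order."""
    return [key for key, value in sorted(rows) if row_filter.accept(key, value)]


rows = [("r1", 5), ("r2", 7), ("r3", 2), ("r4", 9)]
plain = scan(rows, MinValueFilter(5))
decorated = scan(rows, StopAtFirstMiss(MinValueFilter(5)))
```

The wrapped filter is unchanged; the decorator alone decides that r4 is no longer returned once r3 has failed the check.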

10. What are the data manipulation commands present in HBase?

The following is the list of data manipulation commands.

  • put: This command puts a cell value at a specified row and column in a table.
  • get: This command fetches the contents of a row or a cell.
  • delete: This command is used to delete cell data.
  • deleteall: This command is used to delete all the cells in a given row.
  • scan: It is used to scan and return the table data.
  • count: It is used to count and return the number of rows in a table.
  • truncate: It is used to disable, drop, and recreate a specified table.
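To make the difference between these commands concrete, here is a toy Python stand-in for a few of them, operating on a plain dictionary. The function names mirror the shell commands, but the signatures are hypothetical and are not the HBase shell syntax.

```python
def put(table, row, column, value):
    """Write one cell."""
    table.setdefault(row, {})[column] = value


def get(table, row):
    """Fetch all cells of one row (empty if the row does not exist)."""
    return dict(table.get(row, {}))


def delete(table, row, column):
    """Delete a single cell; the row disappears once it has no cells left."""
    cells = table.get(row, {})
    cells.pop(column, None)
    if not cells:
        table.pop(row, None)


def deleteall(table, row):
    """Delete every cell in the row."""
    table.pop(row, None)


def scan(table):
    """Return all rows in row-key order."""
    return [(row, dict(table[row])) for row in sorted(table)]


def count(table):
    """Return the number of rows."""
    return len(table)


users = {}  # row key -> {"family:qualifier": value}
put(users, "row1", "info:name", "Ada")
put(users, "row1", "info:city", "London")
put(users, "row2", "info:name", "Grace")
delete(users, "row1", "info:city")   # removes one cell, row1 survives
rows_after_delete = count(users)
deleteall(users, "row2")             # removes the whole row
remaining = scan(users)
```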

11. What is the use of ZooKeeper in HBase?

ZooKeeper is used to provide distributed synchronization, maintain configuration information, and coordinate communication between RegionServers and clients.

12. What is the use of the catalog table in Apache HBase?

Catalog tables are used to maintain the metadata information of HBase.

13. What is the difference between RDBMS and HBase?

  • Storage unit: An RDBMS organizes data in tables; HBase organizes it in regions.
  • File system: An RDBMS runs on file systems such as FAT, NTFS, and EXT; HBase runs on HDFS.
  • Logging: An RDBMS uses commit logs; HBase uses the WAL (Write-Ahead Log).
  • Reference system: An RDBMS uses a coordinate system; HBase uses ZooKeeper.
  • Key: An RDBMS uses the primary key; HBase uses the row key.
  • Data distribution: An RDBMS supports partitioning; HBase supports sharding.
  • Data model: An RDBMS has rows and columns; HBase has rows, columns, column families, and cells.

14. What is the block size configured in Apache HBase?

The block size is configured per column family. The default value is 64 KB, and it can be changed.

15. Which command is used to start HBase Shell?

From the HBase installation directory, we can run the ./bin/hbase shell command to start the HBase shell.

16. What is LZO in Apache HBase?

LZO stands for Lempel-Ziv-Oberhumer, a lossless data compression algorithm that focuses on decompression speed. In HBase it can be used to compress the data of a column family.

17. What is HBaseFsck in Apache HBase?

HBaseFsck (hbck) is a maintenance tool that checks tables and regions for inconsistencies and helps to repair them. It can be run in two ways. The first is a read-only mode that only checks for and reports inconsistencies; the second is a repair mode that checks for inconsistencies and also attempts to fix them.

18. What are the features of Apache HBase?

Apache HBase provides the following sets of features that make it a solid non-relational database.

  • Apache HBase provides linear and modular scalability.
  • It provides strictly consistent reads and writes.
  • It provides automatic and configurable sharding of tables.
  • It supports automatic failover between RegionServers.
  • A client can interact with Apache HBase using the Java API.
  • It offers an extensible JRuby-based (JIRB) shell as well.

19. What is the use of Apache HBase MasterServer?

The master server assigns regions to the RegionServers. It also balances the load across them.

20. What is the fundamental structure of Apache HBase?

The row key and the column key form the fundamental structure of HBase.

21. What are the DDL commands supported by Apache HBase?

Below is the list of DDL commands which are supported by HBase.

  • create
  • list
  • describe
  • disable
  • disable_all
  • show_filters
  • drop
  • drop_all
  • is_enabled
  • alter

22. What are the applications of HBase?

The applications of HBase are as below.

  • Medical: HBase is used for storing genome sequences and running MapReduce on it, storing the disease history of people or an area, and many others.
  • Sports: HBase is used for storing match histories for better analytics and prediction.
  • Web: Many organizations use Apache HBase to store and analyze user activity logs in order to target the right customers.
  • Oil and Petroleum: The oil and petroleum industry uses Apache HBase to analyze exploration data and find places where oil may be located.
  • E-commerce: E-commerce companies use Apache HBase to store users' activity logs and, based on analysis of those logs, create advertising that supports their business strategies.

23. What are some major advantages of Apache HBase?

The major advantages of Apache HBase are as below.

  • Apache HBase is great for analytics in association with Hadoop MapReduce.
  • Apache HBase can store billions of rows and process them as well.
  • It supports scaling out in coordination with the Hadoop file system, even on commodity hardware.
  • HBase is fault tolerant.
  • It is open source and free to use, provided by the Apache Software Foundation.
  • It is very flexible in schema design, with no fixed schema for columns.
  • HBase can be integrated with Hive for SQL-like queries, which is better for DBAs who are more familiar with SQL queries.
  • It provides Auto-sharding.
  • It provides the feature of auto-failover.
  • It provides a simple client interface.
  • HBase provides row-level atomicity; that is, a PUT operation on a row either succeeds completely or fails completely.

24. What is compaction in Apache HBase?

Compaction is a process in which HBase merges HFiles to reduce the number of disk seeks needed for a read, which improves read performance.
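The merge step can be sketched as a k-way merge of sorted files. This Python example is a toy under strong assumptions: each "HFile" is a list of (row key, timestamp, value) tuples already sorted by row key and then by newest timestamp first, and only the newest version of each row key is kept. Real compaction follows the configured number of versions and other rules not modeled here; compact is a hypothetical name.

```python
import heapq


def compact(hfiles):
    """Merge sorted lists of (row_key, timestamp, value) into one sorted list,
    keeping only the newest version of each row key."""
    # Order by row key, then by descending timestamp, across all inputs.
    merged = heapq.merge(*hfiles, key=lambda cell: (cell[0], -cell[1]))
    compacted = []
    for row_key, ts, value in merged:
        if compacted and compacted[-1][0] == row_key:
            continue  # an older version of a key that was already emitted
        compacted.append((row_key, ts, value))
    return compacted


hfile_a = [("row1", 1, "a"), ("row3", 1, "c")]
hfile_b = [("row1", 2, "a2"), ("row2", 2, "b")]
result = compact([hfile_a, hfile_b])
```

After the merge, a read for any row key consults one file instead of two, which is the saving in disk seeks that the answer above describes.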

25. What is HColumnDescriptor class?

The HColumnDescriptor class stores details about a column family, such as its compression settings and number of versions.

26. What is a Bloom filter in HBase?

A Bloom filter in HBase is used to improve the read performance of the cluster. It is a space-efficient mechanism for testing whether an HFile contains a certain row or row-column cell.
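The membership test works roughly as in the following Python sketch of a generic Bloom filter. It is not HBase's implementation: ToyBloomFilter, the bit-array size, the number of hash functions, and the use of SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class ToyBloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive several bit positions from the key by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False: the key was definitely never added.
        # True: the key was probably added (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(key))


hfile_rows = ToyBloomFilter()
for row_key in ["row1", "row2", "row3"]:
    hfile_rows.add(row_key)
```

A negative answer lets a read skip the file entirely; a positive answer only means the file has to be checked.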

27. What do you understand by Apache Thrift?

Apache Thrift provides a schema compiler that generates code for multiple programming languages, such as Java, C++, Perl, PHP, Python, Ruby, and more. The compiler itself is written in C++.

28. What is the use of Nagios?

Nagios is a tool used to show qualitative data about cluster status. It polls the current statistics regularly and compares them with given thresholds.

29. What is HBase Shell?

The HBase shell is a JRuby-based command-line tool, built on the HBase Java API, that provides an interface for communicating with HBase.

30. What is the use of exists command?

The exists command is used to check whether a table exists.