Apache Drill Introduction

What is Apache Drill?

Apache Drill is a schema-free SQL query engine used to process and analyze large-scale datasets. It is inspired by Google's Dremel, the query engine Google uses to process large datasets. Drill can handle petabytes of data spread across a cluster of nodes, and it answers ad hoc queries with low latency. Drill supports a variety of data sources, including file systems and object stores such as Hadoop HDFS, Amazon S3, Google Cloud Storage, Alluxio, and Azure Blob Storage, and NoSQL databases such as HBase, MapR-DB, and MongoDB.


Why Apache Drill?

The following are some of the strong reasons to use Apache Drill.

1. Set Up in a Few Minutes and Start Working

Apache Drill can be set up in a few minutes: untar the Drill distribution on a Linux, Windows, or Mac system and start Drill. No additional infrastructure setup is required, and there is no need to define a schema.

2. Schema-Free JSON Model

Drill's SQL engine does not require a schema; it discovers the structure of the data automatically. It follows a schema-free JSON model similar to the one used by Elasticsearch and MongoDB.
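As a rough illustration of querying without a schema, the following Python sketch builds a request for Drill's REST query endpoint that selects from a raw JSON file. The file path /tmp/customers.json, its field names, and the helper names build_query_request and run_query are hypothetical. The sketch assumes a Drillbit running locally with the web port at its default of 8047 and the dfs storage plugin enabled.

```python
import json
import urllib.request

# Assumption: Drill runs locally and serves its REST API on the default port 8047.
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def build_query_request(sql, url=DRILL_QUERY_URL):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_QUERY_URL):
    """Send the query and return the decoded JSON response (needs a running Drill)."""
    with urllib.request.urlopen(build_query_request(sql, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical file and fields. No CREATE TABLE or schema definition comes first;
# the table alias t is used to reach the nested field address.city.
sql = "SELECT t.name, t.address.city AS city FROM dfs.`/tmp/customers.json` t LIMIT 5"
request = build_query_request(sql)
```

With a Drill instance running, calling run_query(sql) should return a JSON object describing the result set. The point of the sketch is that the query names the file directly, and Drill works out the fields at read time.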

3. Supports Real SQL

Apache Drill supports the standard SQL:2003 syntax, so a user who already knows SQL can easily query data with Drill. It also supports data types such as VARCHAR, TIMESTAMP, INTERVAL, DATE, and DECIMAL, as well as joins expressed in the WHERE clause.

4. Supports Standard BI Tools

Apache Drill provides JDBC and ODBC drivers so that standard BI tools such as MicroStrategy, Tableau, Spotfire, Qlik, SAS, and Excel can fetch data from non-relational datastores.
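A BI tool that connects over JDBC needs a connection URL. The sketch below assembles the URL forms commonly used with the Drill JDBC driver: through ZooKeeper, directly to one Drillbit, or to an embedded instance. The helper name drill_jdbc_url and the host names are hypothetical, and the default ZooKeeper directory (drill) and cluster ID (drillbits1) are assumptions that should be checked against the cluster's own configuration.

```python
def drill_jdbc_url(zookeeper_hosts=None, drillbit=None,
                   zk_directory="drill", cluster_id="drillbits1"):
    """Return a Drill JDBC connection URL.

    Pass zookeeper_hosts (a list of "host:port" strings) to connect through
    ZooKeeper, or drillbit ("host" or "host:port") to connect to one Drillbit.
    zk_directory and cluster_id are assumed defaults, not verified values.
    """
    if zookeeper_hosts and drillbit:
        raise ValueError("choose either zookeeper_hosts or drillbit, not both")
    if zookeeper_hosts:
        quorum = ",".join(zookeeper_hosts)
        return f"jdbc:drill:zk={quorum}/{zk_directory}/{cluster_id}"
    if drillbit:
        return f"jdbc:drill:drillbit={drillbit}"
    # With neither argument, fall back to the embedded-mode URL.
    return "jdbc:drill:zk=local"


# Hypothetical hosts, for illustration only.
cluster_url = drill_jdbc_url(zookeeper_hosts=["zk1:2181", "zk2:2181"])
direct_url = drill_jdbc_url(drillbit="localhost:31010")
embedded_url = drill_jdbc_url()
```

The resulting string is what would be pasted into the connection dialog of a JDBC-capable tool, along with the Drill JDBC driver itself.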

5. Query Complex, Semi-Structured Data In Situ

Because Apache Drill uses a schema-free JSON model, we can query complex and semi-structured data in situ. The data does not need to be transformed before it is processed.

6. Access Various Types of Data Sources

Drill is designed to connect to many kinds of data sources through storage plugins, such as Hadoop HDFS, Amazon S3, MapR-FS, Hive, and HBase. A single SQL query can combine data from different data stores on the fly.
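To make the idea of combining data stores concrete, the sketch below composes one SQL statement that joins a Hive table with a JSON file by prefixing each table with the name of its storage plugin. The plugin names hive and dfs are assumed to be configured and enabled, and the table, file path, and column names are hypothetical.

```python
def qualified(plugin, table):
    """Qualify a table with its storage plugin name, quoting the table in backticks."""
    return f"{plugin}.`{table}`"


orders = qualified("hive", "orders")            # hypothetical Hive table
clicks = qualified("dfs", "/logs/clicks.json")  # hypothetical JSON file

# One query, two data stores: the plugin prefix tells Drill where each table lives.
sql = (
    "SELECT o.order_id, o.customer_id, c.page "
    f"FROM {orders} o "
    f"JOIN {clicks} c ON o.customer_id = c.customer_id"
)
print(sql)
```

The statement can then be submitted through any Drill client. In practice the join keys may need an explicit CAST when the two sources report different types.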

7. Scale from a Single Node to a 1000-Node Cluster

Apache Drill can be installed on a single computer and run in embedded mode, and it can also scale out to a cluster of commodity hardware. Drill uses an optimistic pipelined execution model that draws on the aggregate memory of the cluster nodes to execute SQL queries, and if the working data does not fit in memory, it automatically spills to disk.

8. Support for User-Defined Functions (UDFs)

Apache Drill supports custom user-defined functions (UDFs) through a Java API. We can write our own UDFs and use them in Drill queries. UDFs already created for Hive can also be used in Apache Drill.

9. Queries on Hive Tables

Apache Drill can access data stored in Hive and run queries against it. We can join a Hive table with HBase tables or log files in a single query.

10. High Performance

Apache Drill is designed around the schema-free JSON model for both flexibility and performance. It does not rely on an external execution engine such as Spark, MapReduce, or Tez. It uses cost-based and rule-based optimization techniques to plan queries, and its vectorized, columnar execution engine makes efficient use of memory and CPU.


Apache Drill Key Features

The following are some of the important features of Apache Drill.

  • The Apache Drill data model is based on a schema-free JSON format, similar to that of Elasticsearch and MongoDB.
  • It supports industry-standard APIs such as ODBC/JDBC, ANSI SQL, and RESTful APIs.
  • The pluggable architecture of Apache Drill allows other datastore systems to connect with Drill.
  • Apache Drill can scale from one system to 1000 systems and processes requests in a distributed, optimized way.
  • Apache Drill has a columnar execution engine: it uses a columnar data representation to process complex and schema-free data.
  • Apache Drill uses multiple compilers and ASM-based bytecode rewriting to validate each query and optimize it for the best performance.
  • Drill follows a pipelined execution model: it processes data in memory and avoids using the disk unless required.

The figure below shows the features of Apache Drill.

[Figure: Apache Drill features]


Apache Drill Version Releases

The following table lists the Apache Drill releases in date order.

| Sr No | Apache Drill Release | Release Month and Year |
|-------|----------------------|------------------------|
| 1     | Drill 1.0            | May 2015               |
| 2     | Drill 1.1            | July 2015              |
| 3     | Drill 1.2            | October 2015           |
| 4     | Drill 1.3            | November 2015          |
| 5     | Drill 1.4            | December 2015          |
| 6     | Drill 1.5            | February 2016          |
| 7     | Drill 1.6            | March 2016             |
| 8     | Drill 1.7            | June 2016              |
| 9     | Drill 1.8            | August 2016            |
| 10    | Drill 1.9            | November 2016          |
| 11    | Drill 1.10           | March 2017             |
| 12    | Drill 1.11           | July 2017              |
| 13    | Drill 1.12           | December 2017          |
| 14    | Drill 1.13           | March 2018             |
| 15    | Drill 1.14           | August 2018            |
| 16    | Drill 1.15           | December 2018          |
| 17    | Drill 1.16           | May 2019               |
| 18    | Drill 1.17           | December 2019          |
| 19    | Drill 1.18           | September 2020         |

Comparison Between Drill, Hive, and Impala

The following table compares Apache Drill, Hive, and Impala.

| Parameter             | Apache Drill                                     | Apache Hive                  | Apache Impala         |
|-----------------------|--------------------------------------------------|------------------------------|-----------------------|
| Latency               | Low                                              | Medium                       | Low                   |
| File Support          | All Hive file formats, plus JSON, text files, etc. | All Hive file formats      | Parquet, Sequence     |
| HBase/M7 Support      | Yes                                              | Yes, but with a performance issue | Yes, but with an issue |
| Schema                | Hive or schema-less                              | Hive                         | Hive                  |
| SQL Support           | ANSI SQL                                         | HiveQL                       | HiveQL                |
| Client Support        | ODBC/JDBC                                        | ODBC/JDBC                    | ODBC/JDBC             |
| Hive Compatibility    | High                                             | High                         | Low                   |
| Large Dataset Support | Yes                                              | Yes                          | Limited               |
| Nested Data Support   | Yes                                              | Limited                      | No                    |
| Concurrency           | High                                             | Limited                      | Medium                |

Why Is Apache Drill Successful?

There are multiple strong reasons behind the success of Apache Drill, and the figure below shows some of them.

[Figure: Apache Drill main features]