Apache Drill queries can read JSON data format easily. Drill considers every object of JSON data as a row in the Drill table. There is a sample JSON file that comes with the Drill installation package that we can access using cp and see how it is behaving.

Query JSON File in Apache Drill

We can use the sample file "employee.json" that comes with the default package of JAR file foodmart-data-json-0.4.jar. The location of this JAR file is "/home/cloudduggu/drill/jars/3rdparty/foodmart-data-json-0.4.jar". To access "employee.json" file we will use cp.

Let's start the Drill shell in embedded mode and run the below query to fetch 5 records from "employee.json".

Command:

cloudduggu@ubuntu:~/drill$ ./bin/drill-embedded
apache drill> select * from cp.`employee.json` limit 5;

Output:

drill select query cloudduggu

We can run the same query in Drill Web UI and get the same result as mentioned below.

drill web UI select query cloudduggu

drill select output cloudduggu


Create JSON File in Apache Drill

Let's create a new JSON file with the name "emp_detail.json". This file will contain the following JSON records and stored at the location "/home/cloudduggu/drill/sample-data/emp_detail.json".

{ "EMP_ID" : "677509", "emp_name" : "Lois", "emp_age" : 36, "gender" : "female", "Date of Joining" : "11/24/2003", "county" : "Denver", "pincode" : 80224 } { "EMP_ID" : "940761", "emp_name" : "Brenda", "emp_age" : 40, "gender" : "female", "Date of Joining" : "7/27/2008", "county" : "De Soto", "pincode" : 71078 } { "EMP_ID" : "428945", "emp_name" : "Joe", "emp_age" : 34, "gender" : "male", "Date of Joining" : "08/03/2016", "county" : "Clinton", "pincode" : 46057 } { "EMP_ID" : "193819", "emp_name" : "Benjamin", "emp_age" : 45, "gender" : "male", "Date of Joining" : "7/25/2013", "county" : "Waupaca", "pincode" : 54940 } { "EMP_ID" : "408351", "emp_name" : "Diane", "emp_age" : 45, "gender" : "female", "Date of Joining" : "4/16/1999", "county" : "Crawford", "pincode" : 16328 } { "EMP_ID" : "499687", "emp_name" : "Patrick", "emp_age" : 50, "gender" : "male", "Date of Joining" : "7/22/2005", "county" : "Madison", "pincode" : 50155 } { "EMP_ID" : "539712", "emp_name" : "Nancy", "emp_age" : 45, "gender" : "female", "Date of Joining" : "9/14/2016", "county" : "Fulton", "pincode" : 30334 } { "EMP_ID" : "380086", "emp_name" : "Carol", "emp_age" : 40, "gender" : "female", "Date of Joining" : "1/28/1983", "county" : "Clinton", "pincode" : 45107 } { "EMP_ID" : "477616", "emp_name" : "Frances", "emp_age" : 33, "gender" : "female", "Date of Joining" : "4/27/1994", "county" : "Starr", "pincode" : 78536 } { "EMP_ID" : "162402", "emp_name" : "Diana", "emp_age" : 43, "gender" : "female", "Date of Joining" : "2/17/2014", "county" : "Carroll", "pincode" : 72632 }

Command:

We have used the nano editor to create the JSON file and put the above records in the file.

cloudduggu@ubuntu:~/drill/sample-data$ nano emp_detail.json

Output:

drill create json file cloudduggu


Read JSON File in Apache Drill

Now we will query the "emp_detail.json" file using the default storage plugin dfs.

Command:

apache drill>SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/emp_detail.json`

Output:

drill query empdataset cloudduggu

We can run the same query in Drill Web UI and as mentioned below.

drill run query in drill web ui cloudduggu

drill query output of web ui cloudduggu