Apache Drill supports and accesses the Plain Text Files and the structure file types such as text files, flat files. In this tutorial, we will create the following file types and will access their data in drill also based on those files we will create the table in Drill.

The Drill-supported text file format is as mentioned below.

Let us see each File format in detail in the below section.


1. CSV Files Format

We will create a CSV file with the name "employee_information.csv" under "/home/cloudduggu/drill/sample-data" directory and put the below content in that.

153989,Jack,M,5/19/1995,01-05-2017,Clark,89170 386158,Melissa,F,2/24/1972,11/24/2015,Washington,45767 301576,Wayne,M,6/26/1996,7/19/2017,Cavalier,58255 441771,Cheryl,F,2/23/1958,11/26/1990,Somerset,15555 528509,Paula,F,8/22/1966,8/22/1994,Duplin,28518 912990,Joshua,M,5/18/1970,09-02-2002,Madison,62074 214352,Theresa,F,12-05-1992,1/19/2015,Mitchell,50481 890290,Julia,F,7/15/1959,2/23/2005,Hickman,38476 622406,Thomas,M,10-04-1967,06-07-1998,Cape Girardeau,63745 979607,Carol,F,12/14/1994,4/25/2016,Caddo,71162 969580,Matthew,M,10/26/1993,02-07-2016,Northumberland,22473 426038, Joan,F,11/20/1972,12/30/2009,Meigs,45760 388642,Ruby,F,05-01-1980,8/30/2013,Cannon,37190 560455,Carolyn,F,03-10-1958,07-03-2001,Osceola,34771

Command:

We have used a nano editor to create the "employee_information.csv" file and put the above records. To save the file press CTRL+O and to exit from the file press CTRL+x.

cloudduggu@ubuntu:~/drill/sample-data$ nano employee_information.csv

Output:

drill csv file creation cloudduggu

drill csv file content cloudduggu


1.1 Query CSV Files Format

Now let's see the records of "employee_information.csv" by querying them from Drill command-line interface.

Command:

apache drill> SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information.csv`;

Output:

drill query csv file cloudduggu


1.2 Query CSV Files with Column[n] Syntax

We can query the CSV file in Drill using the Column[n] Syntax in which the value starts from 0 and goes till n-1. 0 represents the first column. For example, we are fetching all column values of the "employee_information.csv" file and the query would be as below.

Command:

apache drill> SELECT columns[0],columns[1],columns[2],columns[3],columns[4],columns[5],columns[6] FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information.csv`

Output:

drill query all column cloudduggu


1.3 Query CSV Files with Column Name

We can convert Column[n] into a meaningful name also called alias as well using the "AS" clause.

Command:

apache drill> SELECT columns[0] as EMP_ID,columns[1] as EMP_NAME,columns[2] as Gender,columns[3] as Date_of_Birth,columns[4] as Date_of_Joining,columns[5] as County,columns[6] as Zip FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information.csv`;

Output:

drill query using column name cloudduggu


1.4 Create table using CSV Data file

We can create a table name "EMP_CSV_DETAIL" using the CSV data file "employee_information.csv". The below command will be used to create the table. Before creating a table we will use the "dfs.tmp" schema as mentioned below.

Command:

apache drill> show databases;
apache drill> use dfs.tmp;
apache drill (dfs.tmp)> CREATE TABLE EMP_CSV_DETAIL as SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information.csv`;
apache drill (dfs.tmp)> select * from EMP_CSV_DETAIL;

Output:

drill create table as command cloudduggu

drill show table command cloudduggu


2. TSV Files Format

We will create a TSV file with the name "employee_information_tsv.tsv" under "/home/cloudduggu/drill/sample-data" directory and fetch the records using Drill query.

The content of the TSV file is as mentioned below.

969580 Matthew M 426038 Joan F 388642 Ruby F 560455 Carolyn F 477253 Anne F 263480 Daniel M 683826 Roger M 474599 Maria F 335732 Brenda F 329752 Lillian F 893212 Amy F 621833 Gregory M 456747 Roy M 278556 Richard M

Command:

We have used a nano editor to create the "employee_information_tsv.tsv" file and put the above records. To save the file press CTRL+O and to exit from the file press CTRL+x.

cloudduggu@ubuntu:~/drill/sample-data$ nano employee_information_tsv.tsv


Output:

drill create tsv file cloudduggu

drill tsv cloudduggu


2.1 Query TSV Files Format

We will run the below select query to fetch the data that is present in the "employee_information_tsv.tsv" file.

Command:

apache drill> SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information_tsv.tsv`;


Output:

drill query tsv cloudduggu


3. PSV Files Format

We will create a PSV file with the name "employee_information_psv.tbl" and fetch the data using the Drill select query.

The content of the PSV file is as mentioned below.

333476|Mary|F 218791|Aaron|M 134841|Donna|F 726264|Carl|M 766610|Joyce|F 923947|Mary|F 227922|Amanda|F 265813|Jack|M 247137|Alan|M 407061|Elizabeth|F 226714|Nancy|F

Command:

We have used a nano editor to create the "employee_information_psv.tbl" file and put the above records. To save the file press CTRL+O and to exit from the file press CTRL+x.

cloudduggu@ubuntu:~/drill/sample-data$ nano employee_information_psv.tbl


Output:

drill create PSV file cloudduggu


3.1 Query PSV Files Format

Let's run the below select query to fetch the records of "employee_information_psv.tbl".

Command:

apache drill> SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information_psv.tbl`;


Output:

drill query psv file cloudduggu