Apache Drill Querying Text Files

Apache Drill supports and accesses the Plain Text Files and the structure file types such as text files, flat files. In this tutorial, we will create the following file types and will access their data in drill also based on those files we will create the table in Drill.

The Drill-supported text file format is as mentioned below.

CSV(comma-separated values) file format
TSV(tab-separated values) file format
PSV(pipe-separated values) file format

Let us see each File format in detail in the below section.

1. CSV Files Format

We will create a CSV file with the name "employee_information.csv" under "/home/cloudduggu/drill/sample-data" directory and put the below content in that.


    
        153989,Jack,M,5/19/1995,01-05-2017,Clark,89170
        386158,Melissa,F,2/24/1972,11/24/2015,Washington,45767
        301576,Wayne,M,6/26/1996,7/19/2017,Cavalier,58255
        441771,Cheryl,F,2/23/1958,11/26/1990,Somerset,15555
        528509,Paula,F,8/22/1966,8/22/1994,Duplin,28518
        912990,Joshua,M,5/18/1970,09-02-2002,Madison,62074
        214352,Theresa,F,12-05-1992,1/19/2015,Mitchell,50481
        890290,Julia,F,7/15/1959,2/23/2005,Hickman,38476
        622406,Thomas,M,10-04-1967,06-07-1998,Cape Girardeau,63745
        979607,Carol,F,12/14/1994,4/25/2016,Caddo,71162
        969580,Matthew,M,10/26/1993,02-07-2016,Northumberland,22473
        426038, Joan,F,11/20/1972,12/30/2009,Meigs,45760
        388642,Ruby,F,05-01-1980,8/30/2013,Cannon,37190
        560455,Carolyn,F,03-10-1958,07-03-2001,Osceola,34771

Command:

We have used a nano editor to create the "employee_information.csv" file and put the above records. To save the file press CTRL+O and to exit from the file press CTRL+x.

cloudduggu@ubuntu:~/drill/sample-data$ nano employee_information.csv

Output:

1.1 Query CSV Files Format

Now let's see the records of "employee_information.csv" by querying them from Drill command-line interface.

Command:

apache drill> SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information.csv`;

Output:

1.2 Query CSV Files with Column[n] Syntax

We can query the CSV file in Drill using the Column[n] Syntax in which the value starts from 0 and goes till n-1. 0 represents the first column. For example, we are fetching all column values of the "employee_information.csv" file and the query would be as below.

Command:

apache drill> SELECT columns[0],columns[1],columns[2],columns[3],columns[4],columns[5],columns[6] FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information.csv`

Output:

1.3 Query CSV Files with Column Name

We can convert Column[n] into a meaningful name also called alias as well using the "AS" clause.

Command:

apache drill> SELECT columns[0] as EMP_ID,columns[1] as EMP_NAME,columns[2] as Gender,columns[3] as Date_of_Birth,columns[4] as Date_of_Joining,columns[5] as County,columns[6] as Zip FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information.csv`;

Output:

drill query using column name cloudduggu

1.4 Create table using CSV Data file

We can create a table name "EMP_CSV_DETAIL" using the CSV data file "employee_information.csv". The below command will be used to create the table. Before creating a table we will use the "dfs.tmp" schema as mentioned below.

Command:

apache drill> show databases;

apache drill> use dfs.tmp;

apache drill (dfs.tmp)> CREATE TABLE EMP_CSV_DETAIL as SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information.csv`;

apache drill (dfs.tmp)> select * from EMP_CSV_DETAIL;

Output:

drill create table as command cloudduggu

2. TSV Files Format

We will create a TSV file with the name "employee_information_tsv.tsv" under "/home/cloudduggu/drill/sample-data" directory and fetch the records using Drill query.

The content of the TSV file is as mentioned below.


    
        969580 Matthew M
        426038 Joan F
        388642 Ruby F
        560455 Carolyn F
        477253 Anne F
        263480 Daniel M
        683826 Roger M
        474599 Maria F
        335732 Brenda F
        329752 Lillian F
        893212 Amy F
        621833 Gregory M
        456747 Roy M
        278556 Richard M

Command:

We have used a nano editor to create the "employee_information_tsv.tsv" file and put the above records. To save the file press CTRL+O and to exit from the file press CTRL+x.

cloudduggu@ubuntu:~/drill/sample-data$ nano employee_information_tsv.tsv

Output:

2.1 Query TSV Files Format

We will run the below select query to fetch the data that is present in the "employee_information_tsv.tsv" file.

Command:

apache drill> SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information_tsv.tsv`;

Output:

3. PSV Files Format

We will create a PSV file with the name "employee_information_psv.tbl" and fetch the data using the Drill select query.

The content of the PSV file is as mentioned below.


    
        333476|Mary|F
        218791|Aaron|M
        134841|Donna|F
        726264|Carl|M
        766610|Joyce|F
        923947|Mary|F
        227922|Amanda|F
        265813|Jack|M
        247137|Alan|M
        407061|Elizabeth|F
        226714|Nancy|F

Command:

We have used a nano editor to create the "employee_information_psv.tbl" file and put the above records. To save the file press CTRL+O and to exit from the file press CTRL+x.

cloudduggu@ubuntu:~/drill/sample-data$ nano employee_information_psv.tbl

Output:

3.1 Query PSV Files Format

Let's run the below select query to fetch the records of "employee_information_psv.tbl".

Command:

apache drill> SELECT * FROM dfs.`/home/cloudduggu/drill/sample-data/employee_information_psv.tbl`;

Output:

Apache Drill Querying Text Files

Drill - Querying JSON Data

Drill - Querying Parquet File

1. CSV Files Format

Command:

Output:

1.1 Query CSV Files Format

Command:

Output:

1.2 Query CSV Files with Column[n] Syntax

Command:

Output:

1.3 Query CSV Files with Column Name

Command:

Output:

1.4 Create table using CSV Data file

Command:

Output:

2. TSV Files Format

Command:

Output:

2.1 Query TSV Files Format

Command:

Output:

3. PSV Files Format

Command:

Output:

3.1 Query PSV Files Format

Command:

Output: