Apache Hive Project - 01

The objective of this tutorial is to build an Apache Hive project around the Sales-Count data set. We will fetch the Sales-Count data from the client system, process it in the Hive system, push the result data back to the client system, and finally show the results in a graphical format.

So readers ... are you ready to create an Apache Hive project? (-SalesCount-)


1. Idea Of Project

a. Fetch raw data from the Client System.
b. Process raw data through the Hive System.
c. Push result data from the Hive System to the Client System.
d. Show result data in the Client System through graph trends.

2. Building Of Project

To run this project, you can install a VM (Virtual Machine) on your local system and configure Hive, MySQL, and Hadoop on it. With this configuration, your local system works as the client system and the VM works as the Hive system. Alternatively, you can use two systems that can communicate with each other, with Hive configured on one of them.

Let us look at this project in detail and run it using the steps below.


a. The Client System
The client is a Spring Boot (Java) application. Building the project creates an executable "client.jar" file.
It contains Java code, data files, and static HTML pages.
The Java code has 2 files, SpringBootWebApplication.java and UploadDownloadController.java.
SpringBootWebApplication.java is the main project file, which starts the application on an embedded server.
UploadDownloadController.java provides the download & upload HTTP services. Data files are downloaded through the download URL and result files are uploaded through the upload URL.
The data folder holds the unprocessed sales CSV log files. (data_1.csv, data_2.csv, data_3.csv ...)
The static folder holds the chart HTML page (index.html) and the js files it depends on (chart.min.js, utils.js). This is the main client view page, which shows the result data of the Hive processing.
pom.xml is the Maven build file. It lists the Java dependencies and the build configuration details.
To create the "client.jar" file, run the command "mvn clean package".
Click Here To Download the "ClientSaleCode" project zip file.

b. Apache Hive System
Download HiveCreateTable.sh & SalesAnalysis.sh shell script files.
HiveCreateTable.sh contains the Hive queries that create the sales table.
SalesAnalysis.sh contains a collection of Hive queries that load data into the table and select the result data from it.


3. Run The Project

Each step lists what to do on the Client System and on the Hive System.

1. Client System: Download client.jar to the client system.
Click Here To Download the "client.jar" executable jar file.
Hive System: Check the Hadoop and MySQL configuration for Hive on the Hive system.
2. Client System: Run client.jar on the client system, passing server port 9999 at startup. You can use a different port if 9999 is already in use on the client system.
java -jar client.jar --server.port=9999
3. Client System: Check the client page in a browser using the url: http://localhost:9999
Hive System: Download the HiveCreateTable.sh & SalesAnalysis.sh script files to the Hive system.
4. Client System: Find the client system IP address that is accessible from the Hive system.
Hive System: Execute the HiveCreateTable.sh shell script to create the sales table in Hive.
sh HiveCreateTable.sh
5. Hive System: After the create table script succeeds, execute the next script file, SalesAnalysis.sh. When running the script, pass the client IP address and the client port number.
sh SalesAnalysis.sh 192.168.225.49 9999
6. Client System: The client page automatically shows each result in the bar chart after the query is processed in the Hive system.
Hive System: The Hive system gets a data file from the client system and loads it into the Hive table. After the data is loaded successfully, it executes the query on the loaded data and sends the result values to the client system. After sending the result, it starts on the next data file.
7. Client System: The client system automatically shows all the results in the bar chart once the Hive system completes its processing.
Hive System: The SalesAnalysis.sh script runs in a loop, so execution ends automatically once the loop is finished.
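
Before running the scripts, it can help to confirm from the Hive system that the client service answers on the IP address and port chosen in the steps above. The following is only a minimal sketch: the function names client_url and check_client are made up for this example, and it assumes that curl is installed on the Hive system.

```shell
#!/bin/sh
# Hypothetical helper, not part of the project files.

# Build the base URL of the client service from an IP address and a port.
client_url() {
  printf 'http://%s:%s/\n' "$1" "$2"
}

# Return 0 when the client page answers within 5 seconds, non-zero otherwise.
check_client() {
  curl -fsS --max-time 5 -o /dev/null "$(client_url "$1" "$2")"
}

# Run the check only when both arguments (client IP and client port) are given.
if [ "$#" -eq 2 ]; then
  if check_client "$1" "$2"; then
    echo "client reachable at $(client_url "$1" "$2")"
  else
    echo "client NOT reachable at $(client_url "$1" "$2")" >&2
  fi
fi
```

Saved under any name (for example check_client.sh), it would be called with the same two values that SalesAnalysis.sh takes: sh check_client.sh 192.168.225.49 9999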


4. Project Files Description In Detail


(i). HiveCreateTable.sh



Using a shell script (HiveCreateTable.sh) we can easily create the Hive sales table in the Hive system.

The HiveCreateTable.sh file contains the Hive SQL statements.

The first (line 2) deletes all existing files from the Hive System.

The second (line 5) creates the sales table, whose structure is defined by the data files.

The third (line 8) checks that the table was created without any error.
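
The three steps above can be sketched roughly as follows. This is not the original HiveCreateTable.sh, and its line numbers do not match the ones quoted above. The file patterns, the table name "sales", the column names (product, quantity, sale_date), and the comma delimiter are all assumptions made for illustration. When the hive command is not installed, the sketch only prints the statements instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch of a create-table script; table and column names are assumptions.
TABLE="sales"

# Step 1: remove data and result files left over from an earlier run (assumed names).
cleanup_files() {
  rm -f data_*.csv result_*.txt
}

# Steps 2 and 3: emit HiveQL that creates the table and then describes it,
# so a successful DESCRIBE confirms that the table exists.
create_table_sql() {
  cat <<EOF
CREATE TABLE IF NOT EXISTS ${TABLE} (
  product   STRING,
  quantity  INT,
  sale_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
DESCRIBE ${TABLE};
EOF
}

cleanup_files
if command -v hive >/dev/null 2>&1; then
  hive -e "$(create_table_sql)"
else
  # No Hive on this machine: just show what would be executed.
  create_table_sql
fi
```

The column list has to match the layout of the CSV data files, so adjust it to the real data before using anything like this.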


(ii). SalesAnalysis.sh



Using a shell script (SalesAnalysis.sh) we can easily load the data files and execute the query on the loaded content.

The SalesAnalysis.sh file executes the Hive queries in a loop over all 5 data files. To run it, pass two values (Client-System-IP & Client-Service-Port).

Lines 2 & 3 delete any existing data and result files in the Hive System.

Line 7 starts the loop that executes the same queries on a different data set each time.

Line 12 downloads a data file from the Client System according to the sequence number.

Line 15 loads the downloaded data file into the Hive table.

Line 18 runs the select query that collects the product count values by product name. After the query executes, the result values are saved into a file.

Line 21 uploads the result file content to the Client System.

In the command (sh SalesAnalysis.sh 192.168.225.49 9999), "sh" is the Linux shell command, "SalesAnalysis.sh" is the shell script file name, "192.168.225.49" is the first argument (Client-System-IP), and "9999" is the second argument (Client-Service-Port).
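
The flow described above can be sketched as the loop below. This is not the original SalesAnalysis.sh, and its line numbers do not match the ones quoted above. The URL paths (/download/ and /upload), the result file names, the table name "sales", and the "product" column are assumptions. So is the upload format: the real UploadDownloadController may expect something other than a raw POST body. The sketch also appends each file to the table (INTO TABLE); whether the original appends or overwrites is not stated.

```shell
#!/bin/sh
# Hypothetical sketch of the analysis loop; URL paths, file names and columns are assumptions.
TABLE="sales"
FILE_COUNT=5

# URL of the n-th data file on the client (the /download/ path is assumed).
download_url() {
  printf 'http://%s:%s/download/data_%s.csv\n' "$1" "$2" "$3"
}

# URL that receives result files on the client (the /upload path is assumed).
upload_url() {
  printf 'http://%s:%s/upload\n' "$1" "$2"
}

# HiveQL that counts rows per product name.
count_sql() {
  printf 'SELECT product, COUNT(*) FROM %s GROUP BY product;\n' "$TABLE"
}

run_analysis() {
  ip="$1"; port="$2"
  rm -f data_*.csv result_*.txt                 # clear files from earlier runs
  i=1
  while [ "$i" -le "$FILE_COUNT" ]; do
    data="$PWD/data_$i.csv"; result="$PWD/result_$i.txt"
    # Download the data file, load it, query it, then upload the result.
    curl -fsS -o "$data" "$(download_url "$ip" "$port" "$i")" || return 1
    hive -e "LOAD DATA LOCAL INPATH '$data' INTO TABLE $TABLE;" || return 1
    hive -S -e "$(count_sql)" > "$result" || return 1
    curl -fsS --data-binary "@$result" "$(upload_url "$ip" "$port")" || return 1
    i=$((i + 1))
  done
}

# Run only when both arguments (client IP and client port) are given.
if [ "$#" -eq 2 ]; then
  run_analysis "$1" "$2"
else
  echo "usage: sh SalesAnalysis.sh <client-ip> <client-port>" >&2
fi
```

Each step stops the loop on failure, so a missing data file or a failed query does not send an empty result to the client.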


:) ...enjoy the Hive project.