Apache Pig is an execution engine that is used to process large data flow in parallel on Hadoop. Pig has two components the first component is the Pig Latin which is the language, and the second is the runtime environment where Pig Latin programs are executed. Pig Latin provides various operators such as sort, join, filter to perform various operations also provides the facility for the user to develop their custom functions for reading, processing, and writing data.

In Pig Latin a user can write a program to perform a particular task and execute those programs using Grunt Shell, UDFs, or Embedded execution mechanisms. Internally Pig framework converts the Pig Latin language into a MapReduce task and performs data processing.

The following is the architecture of Apache Pig.


pig architecture


Apache Pig Components

Apache Pig process Latin programs through multiple layers.

The following are the component of Apache Pig.

Parser

When a user submits a program, it is initially received by the parser, now the parser performs a syntax check and type checks. Once it performs this activity the output is created a DAG which contains Pig Latin statements and logical operators.

Optimizer

In the optimization phase, the DAG is pushed to a logical optimizer that carries out logical optimizations such as projection and pushdown.

Compiler

In the compiler phase, the complication of the optimized logical plan is done into small Mapreduce jobs and passed to the execution engine.

Execution Engine

This is the final step in which Mapreduce jobs are submitted to Hadoop for execution. After completion of execution, it sends the desired data to the user.


Apache Pig Mode

The following are three interaction modes allowed by Apache Pig.

1. Interactive Mode

In this mode of Apache Pig Interaction, the user uses Grunt shell to run Pig commands. The processing and complication are started when the user submits the STORE command.

2. Batch Mode

In this mode of Apache Pig interaction, a user runs a script file that contains some set of Pig command to perform some action.

3. Embedded Mode

This mode uses the Java library and the method invocation to run Apache Pig commands.