Diagnostic operators are used to verify the execution flow, display data on the terminal, inspect the logical, physical, and MapReduce execution plans, and so on.

Pig provides the following diagnostic operators.

  1. DUMP Operator
  2. DESCRIBE Operator
  3. EXPLAIN Operator
  4. ILLUSTRATE Operator

Let us see each Diagnostic operator in detail.


1. DUMP Operator

In Pig, the DUMP operator executes the Pig Latin statements that define a relation and prints the result on the terminal.

Syntax:
grunt> DUMP alias;

To perform the DUMP operation, we first load a file named “employee.txt” from the HDFS location “/pigexample/”. The content of the file is shown below.

1001,James,Butt,New Orleans,Orleans
1002,Josephine,Darakjy,Brighton,Livingston
1003,Art,Venere,Bridgeport,Gloucester
1004,Lenna,Paprocki,Anchorage,Anchorage
1005,Donette,Foller,Hamilton,Butler
1006,Simona,Morasca,Ashland,Ashland
1007,Mitsue,Tollner,Chicago,Cook
1008,Leota,Dilliard,San Jose,Santa
1009,Sage,Wieser,Sioux Falls,Minnehaha

Command:
grunt> employees = LOAD '/pigexample/employee.txt' USING PigStorage(',') as (empid:int,firstname:chararray,lastname:chararray,city:chararray,county:chararray );

Once the relation is defined, we use the DUMP operator to print the content of the “employee.txt” file.

Command:
grunt> DUMP employees;

A MapReduce job starts when we submit the DUMP command. It reads the data from HDFS and shows the output below.

Output:
[Screenshot: output of DUMP employees]
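
Because DUMP prints every tuple of the relation, it can flood the terminal when the input is large. A minimal sketch of one common way around this, assuming the “employees” relation loaded above; the alias first_five is a hypothetical name chosen only for illustration.

```pig
-- Keep at most five tuples, then print only those (first_five is a hypothetical alias)
grunt> first_five = LIMIT employees 5;
grunt> DUMP first_five;
```

LIMIT does not guarantee which tuples are returned unless the relation has been ordered first, so this is best treated as a quick sanity check rather than a stable sample.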


2. DESCRIBE Operator

The DESCRIBE operator displays the schema of a relation.

Syntax:
grunt> DESCRIBE alias;

We will use the DESCRIBE operator to see the schema of the relation “employees”, which was created in the DUMP section.

Command:
grunt> DESCRIBE employees;

Output:
[Screenshot: output of DESCRIBE employees]
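
DESCRIBE is most useful on derived relations, where the schema is no longer the one written in the LOAD statement. A minimal sketch, assuming the “employees” relation from the DUMP section; the alias by_city is a hypothetical name used only for this example.

```pig
-- Group employees by city; the result has a 'group' field plus a bag of the grouped tuples
grunt> by_city = GROUP employees BY city;
-- Show the nested schema that GROUP produced
grunt> DESCRIBE by_city;
```

The reported schema should contain a field named group (the grouping key) and a bag named employees holding the original tuples, which is the shape later FOREACH statements have to work with.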


3. EXPLAIN Operator

The EXPLAIN operator displays the logical, physical, and MapReduce execution plans that are used to compute the specified relation.

Syntax:
grunt> EXPLAIN alias;

We will use the EXPLAIN operator to see the logical, physical, and MapReduce execution plans of the relation “employees”, which was created in the DUMP section.

Command:
grunt> EXPLAIN employees;

Output:
grunt> EXPLAIN employees;
2020-07-11 03:19:35,289 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2020-07-11 03:19:35,446 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
employees: (Name: LOStore Schema: empid#11:int,firstname#12:chararray,lastname#13:chararray,city#14:chararray,county#15:chararray)
|
|---employees: (Name: LOForEach Schema: empid#11:int,firstname#12:chararray,lastname#13:chararray,city#14:chararray,county#15:chararray)
    |   |
    |   (Name: LOGenerate[false,false,false,false,false] Schema: empid#11:int,firstname#12:chararray,lastname#13:chararray,city#14:chararray,county#15:chararray)ColumnPrune:OutputUids=[11, 12, 13, 14, 15]ColumnPrune:InputUids=[11, 12, 13, 14, 15]
    |   |   |
    |   |   (Name: Cast Type: int Uid: 11)
    |   |   |
    |   |   |---empid:(Name: Project Type: bytearray Uid: 11 Input: 0 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 12)
    |   |   |
    |   |   |---firstname:(Name: Project Type: bytearray Uid: 12 Input: 1 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 13)
    |   |   |
    |   |   |---lastname:(Name: Project Type: bytearray Uid: 13 Input: 2 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 14)
    |   |   |
    |   |   |---city:(Name: Project Type: bytearray Uid: 14 Input: 3 Column: (*))
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 15)
    |   |   |
    |   |   |---county:(Name: Project Type: bytearray Uid: 15 Input: 4 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: empid#11:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[1] Schema: firstname#12:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[2] Schema: lastname#13:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[3] Schema: city#14:bytearray)
    |   |
    |   |---(Name: LOInnerLoad[4] Schema: county#15:bytearray)
    |
    |---employees: (Name: LOLoad Schema: empid#11:bytearray,firstname#12:bytearray,lastname#13:bytearray,city#14:bytearray,county#15:bytearray)RequiredFields:null
2020-07-11 03:19:35,804 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (Tenured Gen) of size 699072512 to monitor. collectionUsageThreshold = 489350752, usageThreshold = 489350752
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
employees: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-17
|
|---employees: New For Each(false,false,false,false,false)[bag] - scope-16
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[chararray] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |   |
    |   Cast[chararray] - scope-11
    |   |
    |   |---Project[bytearray][3] - scope-10
    |   |
    |   Cast[chararray] - scope-14
    |   |
    |   |---Project[bytearray][4] - scope-13
    |
    |---employees: Load(/pigexample/employee.txt:PigStorage(',')) - scope-0
2020-07-11 03:19:36,167 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2020-07-11 03:19:36,298 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2020-07-11 03:19:36,299 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-18
Map Plan
employees: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-17
|
|---employees: New For Each(false,false,false,false,false)[bag] - scope-16
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[chararray] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |   |
    |   Cast[chararray] - scope-11
    |   |
    |   |---Project[bytearray][3] - scope-10
    |   |
    |   Cast[chararray] - scope-14
    |   |
    |   |---Project[bytearray][4] - scope-13
    |
    |---employees: Load(/pigexample/employee.txt:PigStorage(',')) - scope-0--------
Global sort: false
----------------
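
The full plan output is long even for a single LOAD statement. A minimal sketch of two options described in the Pig documentation for EXPLAIN, -brief and -out, assuming the “employees” relation from the DUMP section; the path /tmp/employees_plan is a hypothetical location chosen only for illustration.

```pig
-- Print a shorter overview that does not expand nested plans
grunt> EXPLAIN -brief employees;
-- Write the plans to a path instead of the terminal (/tmp/employees_plan is hypothetical)
grunt> EXPLAIN -out /tmp/employees_plan employees;
```

Saving the plans makes it easier to compare them before and after a change to the script. Exact file names and layout under the output path may differ between Pig versions.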


4. ILLUSTRATE Operator

The ILLUSTRATE operator displays the step-by-step execution of a sequence of Pig Latin statements on a small sample of the data.

Syntax:
grunt> ILLUSTRATE alias;

We will use the ILLUSTRATE operator to see the step-by-step execution of the statements that produce the relation “employees”, which was created in the DUMP section.

Command:
grunt> ILLUSTRATE employees;

Output:

The following output is shown when we run the “ILLUSTRATE employees;” command.

2020-07-11 03:28:11,993 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (Tenured Gen) of size 699072512 to monitor. collectionUsageThreshold = 489350752, usageThreshold = 489350752
2020-07-11 03:28:11,994 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2020-07-11 03:28:11,995 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: employees[1,12] C: R:
--------------------------------------------------------------------------------------------------------------------------------
| employees     | empid:int     | firstname:chararray     | lastname:chararray     | city:chararray     | county:chararray     |
--------------------------------------------------------------------------------------------------------------------------------
|               | 1008          | Leota                   | Dilliard               | San Jose           | Santa                |
--------------------------------------------------------------------------------------------------------------------------------

If we run “ILLUSTRATE employees;” again, we may get a different row, because ILLUSTRATE works on a small sample of the input data rather than on the whole file.

2020-07-11 03:28:19,644 [main] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (Tenured Gen) of size 699072512 to monitor. collectionUsageThreshold = 489350752, usageThreshold = 489350752
2020-07-11 03:28:19,655 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2020-07-11 03:28:19,656 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: employees[1,12] C: R:
--------------------------------------------------------------------------------------------------------------------------------
| employees     | empid:int     | firstname:chararray     | lastname:chararray     | city:chararray     | county:chararray     |
--------------------------------------------------------------------------------------------------------------------------------
|               | 1014          | Yuki                    | Whobrey                | Rockford           | Winnebago            |
--------------------------------------------------------------------------------------------------------------------------------
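
ILLUSTRATE becomes more informative when the relation is built from several statements, since it then shows sample rows at each step. A minimal sketch, assuming the “employees” relation from the DUMP section; the alias in_chicago is a hypothetical name used only for this example.

```pig
-- Keep only employees whose city is Chicago (in_chicago is a hypothetical alias)
grunt> in_chicago = FILTER employees BY city == 'Chicago';
-- Show sample rows for the LOAD step and for the FILTER step
grunt> ILLUSTRATE in_chicago;
```

The output should contain one table for “employees” and one for “in_chicago”, which makes it easy to check that the filter condition keeps and drops the rows you expect.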