Apache Pig Tuple, Bag, Map Built-In Functions

Tuple, Bag, Map Functions

Apache Pig provides built-in functions for building tuples, bags, and maps from expressions (TOTUPLE, TOBAG, and TOMAP), along with TOP, which returns the top N tuples of a bag.

The following is the list of Tuple, Bag, Map functions supported by Apache Pig.

Sr No   Function    Description
1       TOBAG()     Converts one or more expressions into a bag.
2       TOP()       Returns the top N tuples from a bag of tuples.
3       TOTUPLE()   Converts one or more expressions into a tuple.
4       TOMAP()     Converts key-value expression pairs into a map.

Let us see a couple of examples.


TOBAG()

The TOBAG function converts one or more expressions to individual tuples, which are then placed in a bag.

Syntax:
TOBAG(expression [, expression ...])

This example uses the “studentdata.txt” dataset, which we will copy from the local file system to the HDFS location “/pigexample/”.

Content of “studentdata.txt”:

1,Chanel,Shawnee,KS,39
2,Ezekiel,Easton,MD,37
3,Willow,New York,NY,40
4,Bernardo,Conroe,TX,38
5,Ammie,Columbus,OH,38
6,Francine,Las Cruces,NM,38
7,Ernie,Ridgefield Park,NJ,38
8,Albina,Dunellen,NJ,56
9,Alishia,New York,NY,34
10,Solange,Metairie,LA,54

Copy “studentdata.txt” from the local file system into the HDFS directory “/pigexample/” using the command below.

Command:
$ hadoop fs -copyFromLocal /home/cloudduggu/pig/tutorial/studentdata.txt /pigexample/

Now we will create the relation “studentdata” by loading the data from HDFS into Pig.

Command:
grunt> studentdata = LOAD '/pigexample/studentdata.txt' USING PigStorage(',')
   as (studentid:int,firstname:chararray,lastname:chararray,city:chararray,gpa:int);
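
Before applying any functions, it can help to confirm that the load worked as intended. The following is a minimal sketch, assuming the LOAD statement above succeeded; the alias first_rows is illustrative and not part of the original tutorial. Note that the schema labels the third and fourth columns lastname and city, while the sample values in those positions look like a city and a state code; the field names are kept here as they appear in the LOAD statement.

```pig
-- Print the schema Pig has recorded for the relation.
DESCRIBE studentdata;

-- Inspect a few records without dumping the whole relation.
-- first_rows is an illustrative alias, not part of the original tutorial.
first_rows = LIMIT studentdata 3;
DUMP first_rows;
```

LIMIT without a preceding ORDER BY does not guarantee which records are returned, so the three rows shown may vary between runs.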


Now we will convert the fields of each record (studentid, firstname, lastname, city, gpa) into individual tuples placed in a bag, and print the output using the DUMP operator.

Command:
grunt> tobagdata = FOREACH studentdata GENERATE TOBAG (studentid,firstname,lastname,city,gpa);
grunt> DUMP tobagdata;

Output:
[Image: output of the TOBAG command]
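
To make the shape of the result concrete, the sketch below notes what the TOBAG command above should print for the first record, and then shows one common follow-up: un-nesting the bag with FLATTEN so that each tuple becomes its own row. It assumes the studentdata relation loaded earlier; the alias flatdata is illustrative and not part of the original tutorial.

```pig
-- TOBAG wraps each non-tuple argument in its own single-field tuple.
-- For the first record, the DUMP output should look like:
--   ({(1),(Chanel),(Shawnee),(KS),(39)})

-- FLATTEN un-nests the bag, so each tuple in it becomes a separate output row.
-- flatdata is an illustrative alias, not part of the original tutorial.
flatdata = FOREACH studentdata GENERATE studentid, FLATTEN(TOBAG(firstname, lastname));
DUMP flatdata;
-- For studentid 1 this should yield two rows: (1,Chanel) and (1,Shawnee)
```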


TOTUPLE()

The TOTUPLE function is used to convert one or more expressions to a tuple.

Syntax:
TOTUPLE(expression [, expression ...])

We will use the relation “studentdata” created in the TOBAG section, convert the fields of each record (studentid, firstname, lastname, city, gpa) into a single tuple, and print the output using the DUMP operator.

Command:
grunt> totupledata = FOREACH studentdata GENERATE TOTUPLE (studentid,firstname,lastname,city,gpa);
grunt> DUMP totupledata;

Output:
[Image: output of the TOTUPLE command]
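
In contrast to TOBAG, TOTUPLE packs all of its expressions into one tuple per record. The sketch below notes the expected shape for the first record and then shows a typical use: grouping two fields into a nested tuple and reading one of them back by position. It assumes the studentdata relation loaded earlier; the aliases nameddata, firstnames, and fullname are illustrative and not part of the original tutorial.

```pig
-- For the first record, the TOTUPLE command above should print:
--   ((1,Chanel,Shawnee,KS,39))

-- Build a nested tuple from two fields and keep studentid alongside it.
-- nameddata and fullname are illustrative names.
nameddata = FOREACH studentdata GENERATE studentid, TOTUPLE(firstname, lastname) AS fullname;

-- Project the first field of the nested tuple by position.
firstnames = FOREACH nameddata GENERATE studentid, fullname.$0;
DUMP firstnames;
-- For studentid 1 this should yield: (1,Chanel)
```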