Pig Latin – Filtering:
FOREACH – GENERATE:
In this example Pig will validate, but not execute, the LOAD and FOREACH statements.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;
In this example, Pig will validate and then execute the LOAD, FOREACH, and DUMP statements.
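A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);
B = FOREACH A GENERATE name;
DUMP B;
Other forms of FOREACH. The last two operate on a grouped relation (C below), whose tuples have a group field and a bag named A; SUM requires a numeric field such as gpa.
X = FOREACH A GENERATE *;
C = GROUP A BY age;
X = FOREACH C GENERATE group, A.name;
X = FOREACH C GENERATE group, SUM(A.gpa);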
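As a rough sketch of how FOREACH is typically combined with GROUP, the script below computes one summary row per age. It reuses the 'student' schema from the examples above; the aliases G and S and the output field names n and avg_gpa are chosen here for illustration only.

```pig
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- GROUP produces tuples of the form (group, A): the grouping key plus a bag
-- holding every tuple of A that shares that key.
G = GROUP A BY age;

-- One output row per age: the key, the number of students, and their mean gpa.
S = FOREACH G GENERATE group AS age, COUNT(A) AS n, AVG(A.gpa) AS avg_gpa;

-- DESCRIBE prints the schema of S without running a job; DUMP triggers execution.
DESCRIBE S;
DUMP S;
```

The field called group exists only because G was built with GROUP, which is why it cannot be referenced in a FOREACH over a plain projection.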
DISTINCT:
Removes duplicate tuples from a relation.
X = DISTINCT A;
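A minimal sketch of DISTINCT, assuming a hypothetical tab-delimited file named 'data' with three integer fields. The file name, its contents, and the field names a1, a2, a3 are illustrative assumptions.

```pig
-- Hypothetical contents of 'data' (tab-delimited):
--   8  3  4
--   1  2  3
--   4  3  3
--   4  3  3
--   1  2  3
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Whole tuples are compared, so the repeated (1,2,3) and (4,3,3) rows
-- each collapse to a single tuple. Output order is not guaranteed.
X = DISTINCT A;

DUMP X;
```

DISTINCT always works on entire tuples. To de-duplicate on a subset of fields, project those fields with FOREACH first and apply DISTINCT to the result.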
FILTER:
Selects the tuples of a relation that satisfy a condition.
X = FILTER A BY f3 == 3;
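For comparison, here is a sketch of FILTER against the 'student' schema used earlier in these notes. The thresholds and the alias Y are arbitrary, illustrative choices.

```pig
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- Keep only the tuples whose gpa is at least 3.0.
X = FILTER A BY gpa >= 3.0f;

-- Conditions can be combined with AND, OR, and NOT.
Y = FILTER A BY (age >= 18 AND age <= 25) OR NOT (gpa < 2.0f);

DUMP X;
```

The condition may only refer to fields that exist in the relation being filtered, so a name such as f3 is valid only if the relation was loaded with a field of that name.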
STREAM:
Sends data to an external script or program.
B = STREAM A THROUGH `stream.pl -n 5`;
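The command can also be given a name with DEFINE and reused, as sketched below. The alias mycmd and the output field names f1 and f2 are assumptions made for illustration, and stream.pl is assumed to be an executable script available to the job that reads tuples on standard input and writes tab-delimited tuples to standard output.

```pig
A = LOAD 'data';

-- Bind the command (written in backticks) to an alias.
DEFINE mycmd `stream.pl -n 5`;

-- Send each tuple of A to the command; each line it writes becomes a tuple of B.
B = STREAM A THROUGH mycmd;

-- An optional AS clause declares the schema of the streamed output.
C = STREAM A THROUGH mycmd AS (f1:chararray, f2:int);
```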
SAMPLE:
Selects a random sample of a relation based on the specified sample size, given as a fraction between 0 and 1. SAMPLE is probabilistic: there is no guarantee that the exact same number of tuples will be returned for a particular sample size each time the operator is used.
X = SAMPLE A 0.01;
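One way to observe this variation is to count the sampled tuples, as in the sketch below. It reuses the 'student' schema from earlier in these notes; the 0.1 fraction and the aliases G and N are illustrative choices.

```pig
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- Each tuple is kept with probability 0.1, so X holds roughly 10% of A.
X = SAMPLE A 0.1;

-- Collect the whole sample into a single group and count its tuples.
G = GROUP X ALL;
N = FOREACH G GENERATE COUNT_STAR(X);

-- Running the script repeatedly will usually print slightly different counts.
DUMP N;
```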