
Pig Latin – Filtering:

FOREACH – GENERATE:

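FOREACH ... GENERATE produces a new relation by applying expressions to each tuple of an existing one. In this example Pig will validate, but not execute, the LOAD and FOREACH statements, because no DUMP or STORE statement requests their output.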

A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

B = FOREACH A GENERATE name;

In this example, Pig will validate and then execute the LOAD, FOREACH, and DUMP statements.

A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

B = FOREACH A GENERATE name;

DUMP B;

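To project every field of a relation, use the asterisk:

X = FOREACH A GENERATE *;

The next two statements apply only when B is a grouped relation (for example, B = GROUP A BY age;), because only GROUP produces the group field. SUM requires a numeric field, so it is applied to age here rather than to name:

X = FOREACH B GENERATE group, A.name;

X = FOREACH B GENERATE group, SUM(A.age);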
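
The group field and the nested reference A.name only make sense after a GROUP. The following is a minimal sketch of the full sequence, assuming the student schema from the earlier examples. The grouping key age and the alias Y are illustrative choices, not part of the original example.

```pig
-- Assumes a tab-delimited input named 'student' with the schema used above.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- GROUP yields one tuple per distinct age: (group, {bag of matching A tuples}).
B = GROUP A BY age;

-- For each age, emit the age and the bag of student names.
X = FOREACH B GENERATE group, A.name;

-- For each age, emit the age and the average GPA of that group.
Y = FOREACH B GENERATE group, AVG(A.gpa);

DUMP Y;
```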

DISTINCT:

Removes duplicate tuples in a relation.

X = DISTINCT A;
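
DISTINCT compares whole tuples, so finding the unique values of a single field requires projecting that field first. A small sketch, assuming the student schema from the FOREACH examples; the aliases ages and unique_ages are hypothetical names chosen for illustration.

```pig
-- Assumes a tab-delimited input named 'student' with the schema used earlier.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- Keep only the age field, so that tuples are compared on age alone.
ages = FOREACH A GENERATE age;

-- Remove duplicate single-field tuples, leaving one tuple per distinct age.
unique_ages = DISTINCT ages;

DUMP unique_ages;
```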

 

FILTER:

Selects tuples from a relation based on a condition. The example below assumes that A has an integer field named f3.

X = FILTER A BY f3 == 3;
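
Conditions can be combined with AND, OR, and NOT, and can test for nulls. A hedged sketch, assuming the student schema from the FOREACH examples; the thresholds and the aliases adults and named are illustrative only.

```pig
-- Assumes a tab-delimited input named 'student' with the schema used earlier.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- Keep students who are at least 18 and have a GPA above 3.0.
adults = FILTER A BY (age >= 18) AND (gpa > 3.0f);

-- Drop tuples whose name field is null.
named = FILTER A BY name IS NOT NULL;

DUMP adults;
```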

 

STREAM:

Sends data to an external script or program.

B = STREAM A THROUGH `stream.pl -n 5`;

SAMPLE:

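Selects a random sample of a relation based on the specified sample size, given as a fraction between 0 and 1. Sampling is probabilistic, so there is no guarantee that the exact same number of tuples will be returned for a particular sample size each time the operator is used. In the example below, X will contain about 1% of the tuples in A.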

X = SAMPLE A 0.01;
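
One way to observe the varying sample size is to count the sampled tuples on each run. A minimal sketch, assuming the student schema from the FOREACH examples; the 0.1 fraction and the aliases G and N are illustrative choices.

```pig
-- Assumes a tab-delimited input named 'student' with the schema used earlier.
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

-- Keep each tuple with probability 0.1, so roughly 10% of A.
X = SAMPLE A 0.1;

-- GROUP ... ALL collects every sampled tuple into a single bag.
G = GROUP X ALL;

-- COUNT_STAR counts all tuples in the bag, including those with null fields.
N = FOREACH G GENERATE COUNT_STAR(X);

DUMP N;
```

Running the script several times on the same input will typically print slightly different counts.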