Pig – Schema:

A schema assigns a name to each field and declares its data type. Schemas are optional in Pig, but declaring them is recommended because typed fields give better parse-time error checking and more efficient execution.

As we saw with the LOAD operator, a data type can be declared for every field. The DESCRIBE operator then displays the schema of a relation.

In this example the LOAD statement includes a schema definition for simple data types.

A = LOAD 'student' AS (name:chararray, age:int, gpa:float);

DESCRIBE A;

A: {name: chararray,age: int,gpa: float}
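
To see what the type declarations add, the sketch below loads the same 'student' file with field names only, and then with no schema at all. The aliases B, C and D are introduced here purely for illustration and are not part of the original example.

```pig
-- Names only: the fields are named, but each type defaults to bytearray.
B = LOAD 'student' AS (name, age, gpa);
DESCRIBE B;

-- No schema at all: fields have no names and are referenced by position ($0, $1, ...).
C = LOAD 'student';
D = FOREACH C GENERATE $0, $1;
```

Running DESCRIBE on each relation is a quick way to compare what Pig knows about the data in each case.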

In this example the FOREACH statement includes FLATTEN and a schema for simple data types.

X = FOREACH C GENERATE FLATTEN(B) AS (f1:int, f2:int, f3:int);
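
The statement above assumes that relations B and C already exist. One way they could arise, sketched here with a hypothetical input file 'data' and hypothetical field names a1, a2 and a3, is to group B so that C holds a bag named B for each key:

```pig
-- Hypothetical input: three integer columns per row.
B = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Grouping yields one row per key, with a bag named B holding the matching tuples.
C = GROUP B BY a1;

-- FLATTEN un-nests the bag; the AS clause names and types the resulting fields.
X = FOREACH C GENERATE FLATTEN(B) AS (f1:int, f2:int, f3:int);
DESCRIBE X;
```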

In this example the schema defines one tuple. The two LOAD statements are equivalent because the tuple keyword is optional.

A = LOAD 'data' AS (T: tuple (f1:int, f2:int, f3:int));

A = LOAD 'data' AS (T: (f1:int, f2:int, f3:int));

DESCRIBE A;

A: {T: (f1: int,f2: int,f3: int)}
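
Once a tuple has a schema, its fields can be referenced by name with the dot operator. A minimal sketch, reusing the LOAD statement above; the alias B is introduced only for illustration:

```pig
A = LOAD 'data' AS (T: (f1:int, f2:int, f3:int));

-- Project two fields out of the tuple T by name.
B = FOREACH A GENERATE T.f1, T.f3;
DESCRIBE B;
```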