Pig – Execution:
As we have seen in the architecture, when a Pig Latin program is executed, each statement is parsed in turn. If there are syntax errors or other problems, the interpreter halts and displays an error message.
For every relational operation in the program, the interpreter builds a logical plan, adds it to the logical plan for the program so far, and then moves on to the next statement. Once all statements have been parsed, Pig has a logical plan for the complete program. It is important to remember that no data is processed while this logical plan is being constructed.
When the interpreter sees the first line containing the LOAD statement, it confirms that it is syntactically and semantically correct and adds it to the logical plan, but it does not load the data from the file or even check whether the file exists.
Loading the data at this point would be premature, because the program applies several more operators to it; Pig waits until the full data flow has been defined before any execution or processing starts. Similarly, Pig validates the GROUP and FOREACH…GENERATE statements and adds them to the logical plan without executing them.
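To make this deferred evaluation concrete, here is a minimal Python sketch of a toy logical plan. It is an illustration under stated assumptions, not Pig's implementation: the names `Node`, `Relation`, `load`, `group_by`, `foreach` and `dump` are hypothetical, and the input is assumed to be tab-separated text. Each call only adds an operator node to the plan; the input file is opened only when `dump()` walks the plan, mirroring how LOAD, GROUP and FOREACH…GENERATE are added to the logical plan while nothing runs until DUMP.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """One operator in a toy logical plan (hypothetical, not Pig's API)."""
    op: str                   # operator name, e.g. "LOAD", "GROUP", "FOREACH"
    inputs: List["Node"]      # upstream operators in the plan
    run: Callable[..., list]  # computes this operator's output, only at execution time


class Relation:
    """A handle on a plan node; building relations never touches data."""

    def __init__(self, node: Node) -> None:
        self.node = node

    def group_by(self, key: Callable) -> "Relation":
        def run(rows: list) -> list:
            groups: dict = {}
            for row in rows:
                groups.setdefault(key(row), []).append(row)
            return list(groups.items())  # [(key, [rows...]), ...]
        # Only a plan node is created here; no rows are read or grouped yet.
        return Relation(Node("GROUP", [self.node], run))

    def foreach(self, fn: Callable) -> "Relation":
        # Likewise deferred: fn is stored in the plan, not applied.
        return Relation(Node("FOREACH", [self.node], lambda rows: [fn(r) for r in rows]))

    def dump(self) -> list:
        # The only method that triggers execution, playing the role of DUMP.
        return _execute(self.node)


def _execute(node: Node) -> list:
    # Evaluate upstream operators first, then this one.
    upstream = [_execute(n) for n in node.inputs]
    return node.run(*upstream)


def load(path: str) -> Relation:
    def run() -> list:
        # The file is opened (and its existence checked) only at execution time.
        with open(path) as f:
            return [tuple(line.rstrip("\n").split("\t")) for line in f]
    return Relation(Node("LOAD", [], run))
```

For example, `load("missing.txt").group_by(lambda r: r[0])` returns without error even though the file does not exist; the `FileNotFoundError` surfaces only when `dump()` is called, just as a missing input file in Pig is only noticed once execution starts.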
Execution starts when the DUMP statement is issued (a STORE statement also triggers it). At this point the logical plan is compiled into a physical plan and executed. The physical plan is a series of MapReduce jobs, which Pig runs in the local JVM in local mode and on a Hadoop cluster in MapReduce mode.
- We can see the logical and physical plans created by Pig using the EXPLAIN command on a relation.
- EXPLAIN also prints the MapReduce plan, which shows how the physical operators are grouped into MapReduce jobs. This is a good way to find out how many MapReduce jobs Pig will run for your query.
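The idea of grouping operators into jobs can be illustrated with a deliberately simplified Python sketch. It assumes a linear plan given as a list of operator names and a hypothetical rule that each shuffle operator (GROUP, COGROUP, JOIN, DISTINCT) needs a reduce phase, so a second shuffle forces a new job. `compile_to_jobs` and `explain` are made-up names. Pig's real compiler is more involved (combiners, map-side replicated joins, the extra sampling job for ORDER BY, multi-query optimization), so treat the counts as an approximation; in Pig itself you would run EXPLAIN on a relation.

```python
from typing import Dict, List

# Assumption for this toy model only: these operators need a shuffle,
# i.e. a reduce phase. Real Pig has more cases and exceptions.
SHUFFLE_OPS = {"GROUP", "COGROUP", "JOIN", "DISTINCT"}


def compile_to_jobs(logical_plan: List[str]) -> List[Dict[str, List[str]]]:
    """Split a linear list of operator names into toy MapReduce jobs."""
    jobs: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {"map": [], "reduce": []}
    in_reduce = False
    for op in logical_plan:
        if op in SHUFFLE_OPS:
            if in_reduce:
                # A second shuffle cannot share this job's reduce phase:
                # close the job and start a new one that reads its output.
                jobs.append(current)
                current = {"map": [], "reduce": []}
            current["reduce"].append(op)
            in_reduce = True
        elif in_reduce:
            current["reduce"].append(op)  # runs after the shuffle
        else:
            current["map"].append(op)     # runs before any shuffle
    jobs.append(current)
    return jobs


def explain(logical_plan: List[str]) -> str:
    """Render an EXPLAIN-like summary of the toy plan (not Pig's output format)."""
    jobs = compile_to_jobs(logical_plan)
    lines = [
        "Logical plan: " + " -> ".join(logical_plan),
        f"MapReduce jobs: {len(jobs)}",
    ]
    for i, job in enumerate(jobs, 1):
        lines.append(f"  job {i}: map={job['map']} reduce={job['reduce']}")
    return "\n".join(lines)
```

For instance, `explain(["LOAD", "FILTER", "GROUP", "FOREACH", "STORE"])` reports one job, with LOAD and FILTER on the map side and the rest on the reduce side, while inserting a DISTINCT after the FOREACH pushes the count to two.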