MapReduce Tutorial:

MapReduce is a programming model and software framework, introduced by Google, for processing large amounts of data in parallel on large clusters of commodity hardware. It divides the work into a set of independent tasks, which are processed by Map and Reduce tasks.

The Hadoop framework takes care of scheduling the tasks, monitoring them, and re-executing any task that fails.

MapReduce always works on data in the form of key-value pairs, and MapReduce programs can be written in many languages, such as Java and Python.

It is the responsibility of the framework to convert the unstructured input data into key-value pairs. In MapReduce, the input is a list of data elements, and MapReduce transforms that list of input elements into a list of output elements.
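As a rough illustration of how raw text might be turned into key-value pairs, the sketch below uses the byte offset of each line as the key and the line's content as the value, which is the convention Hadoop's default text input follows. The function name to_records is hypothetical and not part of any Hadoop API; it only shows the shape of the data.

```python
def to_records(text):
    """Yield (byte_offset, line) pairs for each line of the input text.

    Hypothetical helper for illustration only, not a Hadoop API.
    """
    offset = 0
    for line in text.splitlines(keepends=True):
        # Key: where the line starts; value: the line without its terminator.
        yield offset, line.rstrip("\r\n")
        offset += len(line.encode("utf-8"))


records = list(to_records("hello world\nhello hadoop\n"))
for key, value in records:
    print(key, value)
```

Each element of records is one key-value pair that a map task could then consume.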

This transformation is done by Map and Reduce together. The framework divides the work into smaller parts and executes them on different nodes in the cluster. The Map phase runs first, followed by the Reduce phase.
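The two phases can be sketched in a single process with a word count, the usual introductory example. This is a minimal in-memory simulation, not how Hadoop itself is implemented: the names map_phase, shuffle, reduce_phase and run_job are assumptions made for illustration, and the grouping step between the phases (commonly called the shuffle) is done here with a dictionary instead of across cluster nodes.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every input record and collect the emitted pairs."""
    pairs = []
    for key, value in records:
        pairs.extend(map_fn(key, value))
    return pairs


def shuffle(pairs):
    """Group all emitted values by key, as happens between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Apply reduce_fn to each key and its list of values, in key order."""
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


def word_count_map(_offset, line):
    # Emit (word, 1) for every word in the line; the input key is unused.
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    # Sum the partial counts collected for one word.
    yield word, sum(counts)


def run_job(records, map_fn, reduce_fn):
    """Run Map first, then group by key, then Reduce."""
    return reduce_phase(shuffle(map_phase(records, map_fn)), reduce_fn)


records = [(0, "the quick fox"), (14, "the lazy dog")]
result = run_job(records, word_count_map, word_count_reduce)
print(result)
```

On a real cluster, each call to the map function could run on a different node, because the records are independent of one another; only the grouping step needs to bring values with the same key together.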

In a MapReduce program, the Map function implements the mapper and the Reduce function implements the reducer.
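To make the two roles concrete, here is a small sketch of a mapper and a reducer that find the highest temperature per year. The input format ("year,temperature" lines) and the sample values are made up for illustration, and the sorted() call stands in for the sorting and grouping that the framework would normally perform between the two functions.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit (year, temperature) for each 'year,temperature' input line."""
    for line in lines:
        year, temp = line.strip().split(",")
        yield year, int(temp)


def reducer(sorted_pairs):
    """Emit (year, max temperature); expects pairs already sorted by key."""
    for year, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield year, max(temp for _, temp in group)


# Made-up sample input, for illustration only.
lines = ["1950,22", "1949,31", "1950,35", "1949,18"]
result = list(reducer(sorted(mapper(lines))))
print(result)
```

The mapper looks at one record at a time and knows nothing about other records, while the reducer sees all values for a given key together.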

We will see this in more detail in the analysis of a MapReduce program.