MapReduce – Components:
A split is a logical representation of a block. In MapReduce, one mapper processes one split at a time.
We have seen in HDFS that the default block size is 64 MB or 128 MB. If the file size is 1280 MB and the block size is 128 MB, we get 10 splits, so 10 mappers will run for the input file.
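The arithmetic above can be sketched in a few lines of Python. This is an illustration, not Hadoop's actual split logic, and it assumes the split size equals the HDFS block size:

```python
import math

def num_splits(file_size_mb: int, block_size_mb: int) -> int:
    """Each block becomes one logical split, and one mapper runs per split."""
    return math.ceil(file_size_mb / block_size_mb)

# 1280 MB file with 128 MB blocks -> 10 splits, so 10 mappers
print(num_splits(1280, 128))  # -> 10
```

Note the `ceil`: a file whose size is not an exact multiple of the block size still gets a (smaller) final split, e.g. a 1300 MB file would need 11 mappers.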
We have seen in HDFS that we have a client, a master, and slaves. The client configures the job and submits it to the master. A job is the program in which we execute the mapper and reducer.
A task is a sub-division of a job: the master divides the job into smaller tasks and assigns them to the slaves, and the slaves do the actual work.
In other words, the client submits the job to the Resource Manager, which runs on the master; the master then breaks that job into tasks and hands them to the slaves. These tasks all run in parallel, independent of each other.
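The flow above (one job divided into independent tasks that run in parallel, then a merge of their results) can be simulated with a toy word-count job in plain Python. This is only a sketch of the idea, not Hadoop's API; the splits and function names are made up for illustration:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy "job": count words. Each input split is an independent task.
splits = [
    "the quick brown fox",
    "the lazy dog",
    "the fox and the dog",
]

def map_task(split: str) -> Counter:
    """One task: process one split on its own (the map phase)."""
    return Counter(split.split())

# The "master" hands each task to a worker; because the tasks do not
# depend on each other, they can all run at the same time.
with ThreadPoolExecutor(max_workers=3) as workers:
    partials = list(workers.map(map_task, splits))

# "Reduce" phase: merge the independent partial results into one answer.
totals = sum(partials, Counter())
print(totals["the"])  # 'the' occurs 4 times across all splits
```

The key property being demonstrated is independence: no task reads another task's output, which is what lets the master schedule them in parallel across slaves.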
Resource Manager: a daemon that runs on the master node.
Node Manager: a daemon that runs on the slaves.