HDFS – Nodes:
There are two types of nodes in HDFS.
- Master Node
- Slave Node
The Master node is also called the Name node, because the Name node is the daemon that runs on the master. In the basic setup, an HDFS cluster has a single Name node and multiple Data nodes.
The Name node stores metadata (data about data, such as file names, permissions, and the locations of each file's blocks) in memory for fast retrieval. It manages the file system namespace, where all of this metadata is kept, and regulates client access to files.
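To make the idea of in-memory metadata concrete, here is a minimal sketch in Python. It is a toy model, not Hadoop code: the class `ToyNameNode`, its methods, and the block and node names are all hypothetical and chosen only for illustration. It shows the two lookups the Name node needs to answer: which blocks make up a file, and which Data nodes hold each block.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy stand-in for a Name node: all metadata lives in in-memory dicts."""

    # file path -> ordered list of block IDs that make up the file
    namespace: dict = field(default_factory=dict)
    # block ID -> names of the Data nodes that hold a copy of the block
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, blocks):
        """Record a file; `blocks` maps block ID -> list of Data node names."""
        if path in self.namespace:
            raise FileExistsError(path)
        self.namespace[path] = list(blocks)
        for block_id, nodes in blocks.items():
            self.block_locations[block_id] = list(nodes)

    def locate(self, path):
        """Return [(block_id, [datanode, ...]), ...] in block order."""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


# Hypothetical file, block IDs, and Data node names.
nn = ToyNameNode()
nn.add_file("/logs/app.log", {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]})
print(nn.locate("/logs/app.log"))
```

Note that the toy Name node never stores file contents, only information about where the contents live. That is the sense in which the master holds "data about data".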
Up to Hadoop 1 there was only a single master, which made it a single point of failure. From Hadoop 2 onward, a cluster can have multiple masters as well as standby masters. The master should be deployed on reliable hardware, not on commodity hardware.
The Slave node is also called the Data node. The Data node is the daemon that generally runs on the slaves and carries out the tasks given to it by the Name node. A cluster has many Data nodes.
The Data node's tasks include reading, writing, and processing data, as well as creating and replicating blocks. In other words, the actual work is done by the Data nodes on the slaves. Slaves can be deployed on commodity hardware.
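Block creation and replication can be sketched with a small toy model in Python. Everything here is an assumption made for illustration: the class `ToyDataNode`, the helper functions, the tiny 4-byte block size, the replication factor of 2, and the simple round-robin placement. Real HDFS uses much larger blocks and its own placement policy, so treat this only as a picture of the idea.

```python
class ToyDataNode:
    """Toy stand-in for a Data node: stores block contents by block ID."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block ID -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


def split_into_blocks(data, block_size):
    """Cut the data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_with_replication(data, datanodes, block_size=4, replication=2):
    """Write every block to `replication` different nodes (round-robin)."""
    if replication > len(datanodes):
        raise ValueError("not enough Data nodes for the requested replication")
    placement = {}
    for index, chunk in enumerate(split_into_blocks(data, block_size)):
        block_id = f"blk_{index}"
        targets = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        for node in targets:
            node.write_block(block_id, chunk)
        placement[block_id] = [node.name for node in targets]
    return placement


nodes = [ToyDataNode(f"dn{i}") for i in range(1, 4)]
placement = store_with_replication(b"hello hdfs!", nodes)
print(placement)
```

Because every block is written to two different nodes in this sketch, losing any single toy Data node still leaves one copy of each block available. This is why the slaves can run on commodity hardware.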
Let us now see how the master and slave nodes work together.
Whenever a client wants to interact with the HDFS file system, for example to read or write data, it must first contact the Master node. The Master node manages all the slave nodes and assigns work to them; the work itself is done by the Slave nodes.
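The interaction described above can be pictured with the following toy sketch in Python. It is not the Hadoop client API: `ToyMaster`, `ToySlave`, `ToyClient`, the 4-byte block size, and the round-robin allocation are all hypothetical names and simplifications. The point it illustrates is the order of the steps: the client first asks the master where blocks should go or where they live, and the slaves then store or return the actual data.

```python
class ToySlave:
    """Toy stand-in for a slave (Data node): holds block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block ID -> bytes


class ToyMaster:
    """Toy stand-in for the master (Name node): holds only metadata."""

    def __init__(self, slave_names):
        self.slave_names = list(slave_names)
        self.files = {}  # path -> [(block ID, slave name), ...]

    def allocate(self, path, num_blocks):
        """Decide which slave should store each block of a new file."""
        plan = [
            (f"{path}#blk_{i}", self.slave_names[i % len(self.slave_names)])
            for i in range(num_blocks)
        ]
        self.files[path] = plan
        return plan

    def locate(self, path):
        """Tell the client where each block of an existing file lives."""
        return self.files[path]


class ToyClient:
    def __init__(self, master, slaves, block_size=4):
        self.master = master
        self.slaves = {s.name: s for s in slaves}
        self.block_size = block_size

    def write(self, path, data):
        size = self.block_size
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        plan = self.master.allocate(path, len(chunks))  # step 1: ask the master
        for (block_id, slave_name), chunk in zip(plan, chunks):
            self.slaves[slave_name].blocks[block_id] = chunk  # step 2: slaves store

    def read(self, path):
        plan = self.master.locate(path)  # step 1: ask the master
        # step 2: fetch each block from the slave that holds it
        return b"".join(self.slaves[name].blocks[bid] for bid, name in plan)


slaves = [ToySlave("dn1"), ToySlave("dn2")]
master = ToyMaster([s.name for s in slaves])
client = ToyClient(master, slaves)
client.write("/demo.txt", b"hello hdfs")
print(client.read("/demo.txt"))
```

In this sketch the master never touches the file contents. It only hands out and remembers the plan, which mirrors the division of labour described above.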