HDFS – Writing a File:
The write operation is also distributed. The name node only tells the client which data nodes should store each block; it does not handle the file data itself. The data nodes replicate each block among themselves. The data is streamed through a pipeline of data nodes, so the replicas of a block are written concurrently rather than one copy after another.
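Because the name node assigns data nodes per block, it helps to see how a file is cut into blocks in the first place. The following is a minimal sketch, not HDFS code: it assumes the 128 MiB default block size used by Hadoop 2.x and later (`dfs.blocksize`), and the function name `split_into_blocks` is hypothetical.

```python
# Hypothetical helper: shows how a file's bytes map onto fixed-size blocks.
# Assumption: 128 MiB block size (the dfs.blocksize default in Hadoop 2.x+).
from typing import List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) for each block; only the last block may be shorter."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    MiB = 1024 * 1024
    for i, (off, length) in enumerate(split_into_blocks(300 * MiB)):
        print(f"block {i}: offset={off // MiB} MiB, length={length // MiB} MiB")
```

For a 300 MiB file this yields two full 128 MiB blocks and a final 44 MiB block; each of those blocks is then replicated on the data nodes the name node chooses.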
- To write a file into HDFS, the client first contacts the name node by sending it a create request.
- The name node checks whether the client has permission to create the file and that the file does not already exist.
- The name node returns the addresses of the data nodes that should store the block, so the client can write the data directly to those data nodes.
- The client starts writing the block to the first data node in that list.
- The client does not send 3 copies; it sends only 1 copy, and the data nodes replicate it among themselves.
- As the first data node receives the data, it forwards it to the second data node, which in turn forwards it to the third. The client is not involved in this copying.
- Once the required replicas have been written, an acknowledgement is sent back to the client.
- As you can see in the diagram, the acknowledgement travels in reverse order: the 3rd data node acknowledges to the 2nd, the 2nd to the 1st, and finally the 1st data node acknowledges to the client.
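The steps above can be simulated in a few lines to show the data flowing forward and the acknowledgements flowing backward. This is a simplified sketch under stated assumptions: `DataNode` and `write_block` are hypothetical names, the tiny `packet_size` is chosen only for readability, and real HDFS acknowledges each packet asynchronously while the client keeps sending, whereas this toy version handles one packet at a time.

```python
# Toy model of an HDFS-style replication pipeline (not the real HDFS client).
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataNode:
    """Hypothetical stand-in for a data node: it just keeps the packets it receives."""
    name: str
    packets: List[bytes] = field(default_factory=list)


def write_block(block: bytes, pipeline: List[DataNode], packet_size: int = 4) -> List[str]:
    """Stream one block through the pipeline and return an event log.

    The client sends each packet only to pipeline[0]; each data node stores the
    packet and forwards it downstream. Acks then travel back upstream.
    """
    events = []
    for start in range(0, len(block), packet_size):
        packet = block[start:start + packet_size]
        events.append(f"client -> {pipeline[0].name}: packet@{start}")
        # Forward downstream: each node stores the packet, then passes it on.
        for i, node in enumerate(pipeline):
            node.packets.append(packet)
            if i + 1 < len(pipeline):
                events.append(f"{node.name} -> {pipeline[i + 1].name}: packet@{start}")
        # Acks flow upstream in reverse order, ending at the client.
        for i in range(len(pipeline) - 1, 0, -1):
            events.append(f"{pipeline[i].name} ack -> {pipeline[i - 1].name}")
        events.append(f"{pipeline[0].name} ack -> client")
    return events


if __name__ == "__main__":
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    for line in write_block(b"HDFSDATA", nodes):
        print(line)
    # Every data node ends up with a full copy, although the client sent only one.
    print([b"".join(n.packets) for n in nodes])
```

The event log shows the client talking only to `dn1`, the copies moving `dn1 -> dn2 -> dn3`, and the acks returning `dn3 -> dn2 -> dn1 -> client`.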
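All the data nodes stay in constant communication with the name node through periodic heartbeats. If a data node goes down during a write, the write does not fail: the client continues with the remaining data nodes in the pipeline, and the name node either supplies a replacement data node or later re-replicates the block elsewhere, so the required number of replicas is restored.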
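The failure handling can be sketched with two small functions. This is a simplified model, not HDFS code: `write_with_failure` and `plan_rereplication` are hypothetical names, a replication factor of 3 is assumed (the usual `dfs.replication` default), and the choice of replacement node ignores rack awareness and disk usage, which the real name node takes into account.

```python
# Toy model of pipeline recovery followed by re-replication (not real HDFS code).
from typing import List, Set

REPLICATION = 3  # assumed target replica count


def write_with_failure(pipeline: List[str], failed: Set[str]) -> List[str]:
    """Drop failed data nodes from the pipeline and finish the write on the survivors."""
    survivors = [dn for dn in pipeline if dn not in failed]
    if not survivors:
        raise OSError("all data nodes in the pipeline failed")
    return survivors


def plan_rereplication(replicas: List[str], live_nodes: List[str],
                       target: int = REPLICATION) -> List[str]:
    """Pick extra live data nodes so the block gets back to `target` replicas."""
    candidates = [dn for dn in live_nodes if dn not in replicas]
    missing = max(0, target - len(replicas))
    return candidates[:missing]


if __name__ == "__main__":
    replicas = write_with_failure(["dn1", "dn2", "dn3"], failed={"dn2"})
    print("block written to", replicas)  # ['dn1', 'dn3']
    extra = plan_rereplication(replicas, live_nodes=["dn1", "dn3", "dn4", "dn5"])
    print("name node schedules a copy on", extra)  # ['dn4']
```

Splitting the logic this way mirrors the idea that the write itself keeps going on the surviving nodes, while restoring the missing replica is a separate step driven by the name node.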