HDFS – Reading a File:
Reads in HDFS are efficient because the blocks of a file are spread across many data nodes, so different blocks can be served by different machines at the same time. A JVM runs on the client node, and the first class that comes into play is the HDFS client. Let us walk through the read process step by step.
- The HDFS client sends an open request to the Hadoop Distributed File System.
- The filesystem contacts the name node directly to get the block-level information for the file.
- The name node checks whether the client is authorized to read the file (user authorization).
- The name node returns the blocks of the file and, for each block, the addresses of the data nodes that hold a copy.
- The client then sends read requests to those data nodes and talks to them directly.
- The actual reading happens on the slaves (the data nodes); file data does not pass through the name node.
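The steps above can be sketched as a small in-memory simulation. This is only a toy model, not the Hadoop API: the classes `NameNode` and `DataNode`, the function `read_file`, the path `/demo.txt`, the block ids and the user names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical data node: stores block contents by block id."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Hypothetical name node: holds metadata only, never file data."""
    block_map: dict  # path -> ordered list of (block_id, [data node names])
    readers: dict    # path -> set of users allowed to read the file

    def get_block_locations(self, path, user):
        # Authorization check happens before any location is handed out.
        if user not in self.readers.get(path, set()):
            raise PermissionError(f"{user} may not read {path}")
        return self.block_map[path]


def read_file(namenode, datanodes, path, user):
    # Ask the name node where the blocks live (metadata only).
    locations = namenode.get_block_locations(path, user)
    data = b""
    # Fetch each block directly from a data node that holds it.
    for block_id, holders in locations:
        data += datanodes[holders[0]].read_block(block_id)
    return data


datanodes = {
    "dn1": DataNode("dn1", {"blk_1": b"hello "}),
    "dn2": DataNode("dn2", {"blk_1": b"hello ", "blk_2": b"world"}),
}
namenode = NameNode(
    block_map={"/demo.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2"])]},
    readers={"/demo.txt": {"alice"}},
)

print(read_file(namenode, datanodes, "/demo.txt", "alice"))  # b'hello world'
```

Note that the name node object in this sketch never touches the bytes of the file; it only answers the question of which data node holds which block, which mirrors the division of work described in the list.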
Suppose a data node goes down at some point during the read. There is no need to worry: because blocks are replicated, the block is read from another data node that holds a copy, using the replica locations that the name node provides. Blocks are transferred in smaller units called packets, so a packet can be thought of as a subset of a block.
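Both ideas, falling back to a replica and splitting a block into packets, can be illustrated with a short sketch. It is a simplified model under stated assumptions: the helper names `read_block` and `split_into_packets`, the data node names, and the tiny block and packet sizes are hypothetical and chosen only to keep the example readable.

```python
def read_block(block_id, replica_holders, datanodes, down):
    """Return the block from the first replica holder that is still up."""
    for name in replica_holders:
        if name in down:
            continue  # this data node is unreachable, try the next replica
        return datanodes[name][block_id]
    raise IOError(f"no live replica for {block_id}")


def split_into_packets(block, packet_size):
    """Cut a block into consecutive packets of at most packet_size bytes."""
    return [block[i:i + packet_size] for i in range(0, len(block), packet_size)]


# Three data nodes each hold a replica of the same block (toy data).
datanodes = {
    "dn1": {"blk_1": b"abcdefghij"},
    "dn2": {"blk_1": b"abcdefghij"},
    "dn3": {"blk_1": b"abcdefghij"},
}

# dn1 is down, so the read is served by dn2 instead.
block = read_block("blk_1", ["dn1", "dn2", "dn3"], datanodes, down={"dn1"})
print(split_into_packets(block, 4))  # [b'abcd', b'efgh', b'ij']
```

Joining the packets back together in order yields the original block, which is the sense in which a packet is a subset of a block.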