This blog post captures my notes on how data is read from and written to HDFS. Please feel free to suggest corrections if anything here is inaccurate.
The following steps describe how clients read data from HDFS (a code sketch follows the list):
- The client asks the Namenode for the blocks of a file.
- The Namenode returns the Datanode information (IDs/locations) where the file's data blocks are stored.
- The client reads the block data directly from those Datanodes.
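As a rough illustration, here is a minimal sketch of the read path using the HDFS Java client API; the Namenode URI and file path are placeholders for your own cluster and data. The `FileSystem.open()` call performs the Namenode lookup, and the returned stream fetches block data directly from the Datanodes.

```java
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder Namenode URI; normally taken from fs.defaultFS in core-site.xml
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // open() contacts the Namenode for the file's block locations;
        // the stream then reads the block data directly from the Datanodes
        try (InputStream in = fs.open(new Path("/user/demo/sample.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}
```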
The following steps describe how data is written to HDFS (a code sketch follows the list):
- The client tells the Namenode that it wants to write one or more data blocks belonging to a file.
- The Namenode returns information about the Datanodes to which these data blocks need to be written.
- The client writes each data block to the suggested Datanodes.
- Those Datanodes then replicate the data block to other Datanodes.
- The Datanodes inform the Namenode about the completed write.
- The Namenode commits the change to its EditLog.
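Similarly, here is a minimal sketch of the write path using the HDFS Java client API (again, the Namenode URI and file path are placeholders). The `FileSystem.create()` call asks the Namenode to allocate blocks, and the bytes written to the stream are pipelined to the chosen Datanodes, which replicate each block.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder Namenode URI; normally taken from fs.defaultFS in core-site.xml
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // create() asks the Namenode to allocate blocks on Datanodes; bytes
        // written to the stream are pipelined to those Datanodes, which
        // replicate each block before the write is acknowledged
        try (FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"))) {
            out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```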
[Diagram: how data is read from / written to HDFS]
[Diagram: how files are written to HDFS]