Big Data – How Data is Retrieved and Written from/to HDFS?

This post collects my notes on how data is read from and written to HDFS. Please feel free to suggest corrections if it works otherwise.

The following steps describe how a client retrieves data from HDFS:

  1. The client asks the NameNode for the locations of the data blocks that make up a file.
  2. The NameNode returns, for each block, the DataNodes (by ID) on which that block is located.
  3. The client retrieves each block directly from a DataNode; the data itself does not pass through the NameNode.
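The read steps above can be sketched with a toy in-memory model. This is only an illustration, not the Hadoop client API: the `NameNode`, `DataNode` and `read_file` names, the block IDs and the `/demo.txt` path are all made up for the example. It shows the NameNode answering with block locations only, while the bytes come straight from a DataNode.

```python
# Toy in-memory model of the HDFS read path; NOT the Hadoop client API.
# Every class, method and identifier here is hypothetical.

class DataNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}  # block_id -> bytes stored on this node

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self):
        self.block_map = {}  # path -> list of (block_id, [node_id, ...])

    def get_block_locations(self, path):
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """datanodes maps node_id -> DataNode for the nodes that are reachable."""
    data = b""
    # Steps 1-2: ask the NameNode where each block of the file lives.
    for block_id, node_ids in namenode.get_block_locations(path):
        # Step 3: fetch the block directly from the first reachable replica.
        for node_id in node_ids:
            node = datanodes.get(node_id)
            if node is not None and block_id in node.blocks:
                data += node.read_block(block_id)
                break
        else:
            raise IOError(f"no reachable replica for block {block_id}")
    return data


# Example cluster: two DataNodes, one file split into two blocks.
dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_1"] = b"hello "
dn2.blocks["blk_2"] = b"world"

nn = NameNode()
nn.block_map["/demo.txt"] = [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2"])]

cluster = {"dn1": dn1, "dn2": dn2}
print(read_file("/demo.txt", nn, cluster))  # b'hello world'
```

Because the NameNode returns every replica location, the sketch can fall back to `dn2` for `blk_1` if `dn1` is left out of the `cluster` dictionary.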

The following steps describe how data is written to HDFS:

  1. The client tells the NameNode that it wants to write one or more data blocks belonging to a file.
  2. The NameNode returns the DataNodes to which these data blocks need to be written.
  3. The client writes each data block to the suggested DataNodes.
  4. The DataNodes then replicate the data block to the other DataNodes.
  5. The DataNodes inform the NameNode about the write.
  6. The NameNode commits the change to its EditLog.
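The write steps above can be sketched in the same toy style. Again, this is an illustration under stated assumptions rather than the Hadoop API: all class and method names are invented, the block size and replication factor are shrunk to tiny values so the example is easy to follow, placement is a simple round-robin, and the EditLog is reduced to a single record written when the file is completed.

```python
# Toy in-memory model of the HDFS write path; NOT the Hadoop client API.
# Every name is hypothetical; BLOCK_SIZE and REPLICATION are demo values.

BLOCK_SIZE = 4    # bytes, far smaller than a real HDFS block
REPLICATION = 2   # number of DataNodes that receive each block


class DataNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}  # block_id -> bytes stored on this node

    def write_block(self, block_id, data, pipeline, namenode):
        self.blocks[block_id] = data
        if pipeline:
            # Step 4: forward the block to the next DataNode in the pipeline.
            pipeline[0].write_block(block_id, data, pipeline[1:], namenode)
        # Step 5: tell the NameNode that this replica has been stored.
        namenode.block_received(block_id, self.node_id)


class NameNode:
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.block_map = {}   # path -> [block_id, ...]
        self.locations = {}   # block_id -> [node_id, ...]
        self.edit_log = []
        self.next_index = 0

    def allocate_block(self, path):
        # Steps 1-2: register a new block and pick target DataNodes
        # (round-robin here, purely for illustration).
        index = self.next_index
        self.next_index += 1
        block_id = f"blk_{index}"
        self.block_map.setdefault(path, []).append(block_id)
        n = len(self.datanodes)
        targets = [self.datanodes[(index + i) % n] for i in range(REPLICATION)]
        return block_id, targets

    def block_received(self, block_id, node_id):
        self.locations.setdefault(block_id, []).append(node_id)

    def complete(self, path):
        # Step 6: record the finished file in the EditLog.
        self.edit_log.append(("CLOSE", path, list(self.block_map[path])))


def write_file(path, data, namenode):
    for offset in range(0, len(data), BLOCK_SIZE):
        chunk = data[offset:offset + BLOCK_SIZE]
        block_id, targets = namenode.allocate_block(path)
        # Step 3: the client sends the block to the first target only.
        targets[0].write_block(block_id, chunk, targets[1:], namenode)
    namenode.complete(path)


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn = NameNode(nodes)
write_file("/demo.txt", b"hello world", nn)
print(nn.edit_log)
print(nn.locations)
```

Note that the client hands each block to one DataNode and the DataNodes pass it along among themselves, so the client does not upload the same bytes once per replica.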

The following diagram represents how data is read from and written to HDFS.

HDFS read-write architecture

The following diagram depicts how files are written to HDFS.

writing_files_to_HDFS

Ajitesh Kumar

Ajitesh is passionate about a range of technologies, including programming languages such as Java/JEE, JavaScript, PHP and C/C++, mobile programming languages, and computing fundamentals related to cloud-native technologies, application security, cloud computing platforms, mobile apps and big data.

He has also authored the book, Building Web Apps with Spring 5 and Angular.
