Big Data – How Data is Retrieved and Written from/to HDFS?

These are my notes on how data is read from and written to HDFS. Please feel free to suggest corrections if it works differently.

Clients retrieve data from HDFS in the following steps:

  1. The client asks the NameNode for the locations of a file's data blocks.
  2. The NameNode returns the DataNodes (IDs) where those blocks are located.
  3. The client retrieves the data directly from the DataNodes.
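
The read steps above can be sketched with a toy in-memory model. This is not the Hadoop client API: the `NameNode`, `DataNode`, and `read_file` names, the block IDs, and the file path are all made up for illustration. The point is only to show that the name node hands out locations while the block contents come straight from a data node.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy data node (hypothetical): stores block contents keyed by block ID."""
    node_id: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Toy name node (hypothetical): holds metadata only, never file contents."""
    # file path -> ordered list of (block_id, [IDs of data nodes holding a replica])
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path: str) -> list:
        return self.block_map[path]


def read_file(path: str, name_node: NameNode, data_nodes: dict) -> bytes:
    """Client-side read following the three steps above."""
    content = b""
    # Steps 1 and 2: ask the name node where each block of the file lives.
    for block_id, node_ids in name_node.get_block_locations(path):
        # Step 3: fetch the block directly from a data node holding a replica.
        # Assumption: we simply pick the first listed replica.
        node = data_nodes[node_ids[0]]
        content += node.read_block(block_id)
    return content


dn1 = DataNode("dn1", {"blk_1": b"hello "})
dn2 = DataNode("dn2", {"blk_1": b"hello ", "blk_2": b"hdfs"})
nn = NameNode({"/notes.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2"])]})

data = read_file("/notes.txt", nn, {"dn1": dn1, "dn2": dn2})
print(data)  # b'hello hdfs'
```

Note that the file bytes never pass through the toy name node; it only answers the "where" question.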

Data is written to HDFS in the following steps:

  1. The client tells the NameNode that it wants to write one or more data blocks of a file.
  2. The NameNode returns the DataNodes to which these blocks should be written.
  3. The client writes each data block to the suggested DataNodes.
  4. The DataNodes then replicate the data block to other DataNodes.
  5. The NameNode is informed about the write.
  6. The NameNode commits the change to its EditLog.
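
The write steps above can be sketched with a toy in-memory model. This is not the Hadoop client API: every class, method, and parameter name here (`allocate_block`, `commit_block`, `block_size`, and so on) is hypothetical. The sketch also makes simplifying assumptions: the name node always picks the first data nodes in its list, the client is the one that reports the finished write, and there is no acknowledgement or failure handling.

```python
class DataNode:
    """Toy data node (hypothetical): stores blocks and forwards them on."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}

    def write_block(self, block_id, data, pipeline):
        # Step 3: store the block locally.
        self.blocks[block_id] = data
        # Step 4: replicate by forwarding to the next data node, if any.
        if pipeline:
            pipeline[0].write_block(block_id, data, pipeline[1:])


class NameNode:
    """Toy name node (hypothetical): picks targets and records the write."""

    def __init__(self, data_nodes, replication=3):
        self.data_nodes = data_nodes
        self.replication = replication
        self.edit_log = []
        self.next_block = 0

    def allocate_block(self, path):
        # Steps 1 and 2: hand back a new block ID and the target data nodes.
        # Assumption: naive placement, always the first `replication` nodes.
        self.next_block += 1
        block_id = f"blk_{self.next_block}"
        return block_id, self.data_nodes[: self.replication]

    def commit_block(self, path, block_id, targets):
        # Steps 5 and 6: the write is reported and recorded in the edit log.
        self.edit_log.append((path, block_id, [n.node_id for n in targets]))


def write_file(path, data, name_node, block_size=4):
    """Client-side write: split the data into blocks and write each one."""
    for offset in range(0, len(data), block_size):
        chunk = data[offset: offset + block_size]
        block_id, targets = name_node.allocate_block(path)
        targets[0].write_block(block_id, chunk, targets[1:])
        name_node.commit_block(path, block_id, targets)


nodes = [DataNode(f"dn{i}") for i in range(1, 4)]
nn = NameNode(nodes, replication=2)
write_file("/notes.txt", b"hello hdfs", nn)

for entry in nn.edit_log:
    print(entry)
```

With a 4-byte block size, the 10-byte payload is split into three blocks, each stored on `dn1` and forwarded to `dn2`, while `dn3` stays empty because the replication factor is 2.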

The following diagram shows how data is read from and written to HDFS.

[Diagram: data read from and written to HDFS]

The following diagram shows how files are written to HDFS.

[Diagram: files written to HDFS]

Ajitesh Kumar

