Tag Archives: HDFS

Spark – How Does Apache Spark Work?

This blog post explains how Apache Spark works, with the help of diagrams. It covers the following key aspects of Apache Spark: basic concepts, Apache Spark with YARN & HDFS/HBase, and Apache Spark with Mesos & HDFS/HBase.

Apache Spark – Basic Concepts

The following represents the basic concepts of Spark:

Apache Spark with YARN & HBase/HDFS

Following are some of the key architectural building blocks that show how Apache Spark works with YARN and HDFS/HBase. The Spark driver program runs on the client node, as it does in YARN client deploy mode. YARN is used as the cluster manager. As part of the YARN setup, there would be multiple nodes running …
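
To make these roles concrete, here is a toy, pure-Python model of a driver, a cluster manager, and executors. It is a sketch under stated assumptions, not Spark's or YARN's API: the classes `WorkerNode`, `Executor`, `ClusterManager`, and `Driver` are hypothetical. Real Spark also splits jobs into stages, runs tasks in parallel, and, under YARN, obtains executor containers through an ApplicationMaster. Here the driver is assumed to run on the client node, as in client deploy mode.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """A cluster machine; under YARN this role is played by a NodeManager."""
    name: str
    free_slots: int  # executor containers this node can still host


@dataclass
class Executor:
    """A process running in a container; it executes tasks sent by the driver."""
    executor_id: int
    node: str

    def run_task(self, func, partition):
        return func(partition)


class ClusterManager:
    """Hypothetical stand-in for YARN: it only hands out containers."""

    def __init__(self, workers):
        self.workers = workers

    def request_executors(self, count):
        executors = []
        for worker in self.workers:
            # Fill each worker's free slots before moving on to the next one.
            while worker.free_slots > 0 and len(executors) < count:
                worker.free_slots -= 1
                executors.append(Executor(len(executors), worker.name))
        return executors


class Driver:
    """Plays the Spark driver, running on the client node in this sketch."""

    def __init__(self, cluster_manager, num_executors):
        self.executors = cluster_manager.request_executors(num_executors)
        if not self.executors:
            raise RuntimeError("the cluster manager granted no executors")

    def run_job(self, data, num_partitions, task_func, combine):
        # Split the input into strided partitions; each one becomes a task.
        partitions = [data[i::num_partitions] for i in range(num_partitions)]
        results = []
        for task_id, partition in enumerate(partitions):
            # Assign tasks to executors round-robin (sequentially here,
            # whereas real executors run their tasks in parallel).
            executor = self.executors[task_id % len(self.executors)]
            results.append(executor.run_task(task_func, partition))
        # Task results come back to the driver, which combines them.
        return combine(results)


workers = [WorkerNode("node-1", 2), WorkerNode("node-2", 2)]
driver = Driver(ClusterManager(workers), num_executors=3)
total = driver.run_job(list(range(1, 101)), 4, sum, sum)
print(total)  # 5050
```

The design point worth noticing is that the cluster manager only allocates resources, while the driver decides which task runs on which executor; Spark divides the work between the cluster manager and the driver in the same way.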


Posted in Big Data.

Big Data – How Is Data Retrieved from and Written to HDFS?

This blog post presents my notes on how data is read from and written to HDFS. Please feel free to suggest corrections if it works differently.

Following are the steps clients use to retrieve data from HDFS:

1. The client asks the NameNode for the locations of a file's data blocks.
2. The NameNode returns information (IDs) about the DataNodes where the file's data blocks are located.
3. The client retrieves the data directly from those DataNodes.

Following are the steps in which data is written to HDFS:

1. The client tells the NameNode that it wants to write one or more data blocks belonging to a file.
2. The NameNode returns information about the DataNodes to which these data blocks need to be written.
3. The client writes each data block to the suggested DataNodes. The …
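
The read and write steps above can be sketched as a small in-memory simulation. This is a toy model, not the Hadoop client API: `NameNode`, `DataNode`, `Client`, `BLOCK_SIZE`, and `REPLICATION` are hypothetical names, and the tiny block size and replication factor are chosen only for illustration. The point it shows is that the NameNode holds only metadata (which blocks make up a file and which DataNode IDs hold them), while block bytes flow directly between the client and the DataNodes. The forwarding from one DataNode to the next is a simplified model of HDFS's replication pipeline. Unlike real HDFS, which requests new blocks one at a time as a file is written, this sketch allocates all blocks up front.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4   # bytes per block in this toy (HDFS defaults to 128 MB)
REPLICATION = 2  # replicas per block in this toy (HDFS defaults to 3)


@dataclass
class DataNode:
    node_id: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes

    def write_block(self, block_id, data, pipeline, cluster):
        """Store the block, then forward it to the next DataNode in the pipeline."""
        self.blocks[block_id] = data
        if pipeline:
            cluster[pipeline[0]].write_block(block_id, data, pipeline[1:], cluster)

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Holds metadata only: which blocks make up a file and where they live."""
    node_ids: list
    namespace: dict = field(default_factory=dict)  # path -> [(block_id, [node_ids])]
    next_block_id: int = 0

    def allocate_blocks(self, path, num_blocks):
        """Write step 2: pick target DataNodes for each new block."""
        plan = []
        for _ in range(num_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Round-robin placement; real HDFS uses a rack-aware policy.
            start = block_id % len(self.node_ids)
            targets = [self.node_ids[(start + i) % len(self.node_ids)]
                       for i in range(REPLICATION)]
            plan.append((block_id, targets))
        self.namespace[path] = plan
        return plan

    def get_block_locations(self, path):
        """Read step 2: return the DataNode IDs holding each block of the file."""
        return self.namespace[path]


class Client:
    def __init__(self, namenode, datanodes):
        self.namenode = namenode
        self.datanodes = datanodes  # node_id -> DataNode, reachable directly

    def write(self, path, data):
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        plan = self.namenode.allocate_blocks(path, len(chunks))  # write steps 1-2
        for (block_id, targets), chunk in zip(plan, chunks):
            # Write step 3: send the block to the first DataNode, which forwards it.
            self.datanodes[targets[0]].write_block(
                block_id, chunk, targets[1:], self.datanodes)

    def read(self, path):
        locations = self.namenode.get_block_locations(path)  # read steps 1-2
        # Read step 3: fetch each block directly from the first listed DataNode.
        return b"".join(self.datanodes[ids[0]].read_block(block_id)
                        for block_id, ids in locations)


datanodes = {f"dn{i}": DataNode(f"dn{i}") for i in range(3)}
namenode = NameNode(node_ids=sorted(datanodes))
client = Client(namenode, datanodes)
client.write("/logs/app.txt", b"hello hdfs!")
print(client.read("/logs/app.txt"))  # b'hello hdfs!'
```

Because the NameNode never handles block contents, it does not become a bottleneck for data transfer; this is the main reason the client talks to the DataNodes directly in both the read and the write paths.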


Posted in Big Data.